Bigger data is better for molecular diagnosis tests based on decision trees

Alexandru G. Floares; George A. Calin; Florin B. Manolache

doi:10.1007/978-3-319-40973-3_29

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Most molecular diagnosis tests are based on small studies with about twenty patients, and use classical statistics. The prevailing conception is that such studies can indeed yield accurate tests with just one or two predictors, especially when using informative molecules like microRNA in cancer diagnosis. We investigated the relationship between accuracy, the number of microRNA predictors, and the sample size of the dataset used in developing cancer diagnosis tests. The generalization capability of the tests was also investigated. One of the largest existing free breast cancer datasets was used in a binary classification (cancer versus normal) using C5 and CART decision trees. The results show that diagnosis tests with a good compromise between accuracy and the number of predictors (related to costs) can be obtained with C5 or CART on a sample size of more than 100 patients. These tests generalize well.

Original language	English (US)
Pages (from-to)	288-295
Number of pages	8
Journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	9714 LNCS
DOIs	https://doi.org/10.1007/978-3-319-40973-3_29
State	Published - 2016

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Bigger data is better for molecular diagnosis tests based on decision trees. / Floares, Alexandru G.; Calin, George A.; Manolache, Florin B.
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9714 LNCS, 2016, p. 288-295.

@article{cfc0b4c1a54d4fb489843bc473397c72,

title = "Bigger data is better for molecular diagnosis tests based on decision trees",

abstract = "Most molecular diagnosis tests are based on small studies with about twenty patients, and use classical statistics. The prevailing conception is that such studies can indeed yield accurate tests with just one or two predictors, especially when using informative molecules like microRNA in cancer diagnosis. We investigated the relationship between accuracy, the number of microRNA predictors, and the sample size of the dataset used in developing cancer diagnosis tests. The generalization capability of the tests was also investigated. One of the largest existing free breast cancer datasets was used in a binary classification (cancer versus normal) using C5 and CART decision trees. The results show that diagnosis tests with a good compromise between accuracy and the number of predictors (related to costs) can be obtained with C5 or CART on a sample size of more than 100 patients. These tests generalize well.",

author = "Floares, {Alexandru G.} and Calin, {George A.} and Manolache, {Florin B.}",

note = "Funding Information: This work was supported by the research grants UEFISCDI PN-II-PT-PCCA-2013-4-1959 INTELCOR and UEFISCDI PN-II-PT-PCCA-2011-3.1-1221 IntelUro, financed by Romanian Ministry of Education and Scientific Research. Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2016.",

year = "2016",

doi = "10.1007/978-3-319-40973-3_29",

language = "English (US)",

volume = "9714 LNCS",

pages = "288--295",

journal = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

issn = "0302-9743",

publisher = "Springer Verlag",

}

TY - JOUR

T1 - Bigger data is better for molecular diagnosis tests based on decision trees

AU - Floares, Alexandru G.

AU - Calin, George A.

AU - Manolache, Florin B.

N1 - Funding Information: This work was supported by the research grants UEFISCDI PN-II-PT-PCCA-2013-4-1959 INTELCOR and UEFISCDI PN-II-PT-PCCA-2011-3.1-1221 IntelUro, financed by Romanian Ministry of Education and Scientific Research. Publisher Copyright: © Springer International Publishing Switzerland 2016.

PY - 2016

Y1 - 2016

N2 - Most molecular diagnosis tests are based on small studies with about twenty patients, and use classical statistics. The prevailing conception is that such studies can indeed yield accurate tests with just one or two predictors, especially when using informative molecules like microRNA in cancer diagnosis. We investigated the relationship between accuracy, the number of microRNA predictors, and the sample size of the dataset used in developing cancer diagnosis tests. The generalization capability of the tests was also investigated. One of the largest existing free breast cancer datasets was used in a binary classification (cancer versus normal) using C5 and CART decision trees. The results show that diagnosis tests with a good compromise between accuracy and the number of predictors (related to costs) can be obtained with C5 or CART on a sample size of more than 100 patients. These tests generalize well.

AB - Most molecular diagnosis tests are based on small studies with about twenty patients, and use classical statistics. The prevailing conception is that such studies can indeed yield accurate tests with just one or two predictors, especially when using informative molecules like microRNA in cancer diagnosis. We investigated the relationship between accuracy, the number of microRNA predictors, and the sample size of the dataset used in developing cancer diagnosis tests. The generalization capability of the tests was also investigated. One of the largest existing free breast cancer datasets was used in a binary classification (cancer versus normal) using C5 and CART decision trees. The results show that diagnosis tests with a good compromise between accuracy and the number of predictors (related to costs) can be obtained with C5 or CART on a sample size of more than 100 patients. These tests generalize well.

UR - http://www.scopus.com/inward/record.url?scp=85007557218&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85007557218&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-40973-3_29

DO - 10.1007/978-3-319-40973-3_29

M3 - Article

AN - SCOPUS:85007557218

SN - 0302-9743

VL - 9714 LNCS

SP - 288

EP - 295

JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -
