Bigger data is better for molecular diagnosis tests based on decision trees

Alexandru G. Floares, George A. Calin, Florin B. Manolache

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Most molecular diagnosis tests are based on small studies with about twenty patients, and use classical statistics. The prevailing conception is that such studies can indeed yield accurate tests with just one or two predictors, especially when using informative molecules like microRNA in cancer diagnosis. We investigated the relationship between accuracy, the number of microRNA predictors, and the sample size of the dataset used in developing cancer diagnosis tests. The generalization capability of the tests was also investigated. One of the largest existing free breast cancer datasets was used in a binary classification (cancer versus normal) using C5 and CART decision trees. The results show that diagnosis tests with a good compromise between accuracy and the number of predictors (related to costs) can be obtained with C5 or CART on a sample size of more than 100 patients. These tests generalize well.
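The kind of experiment the abstract describes (held-out accuracy of a small decision tree as a function of training sample size) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses a hand-written CART-style tree with Gini impurity rather than C5 or a production CART implementation, and the data are synthetic stand-ins rather than the breast cancer dataset. All function names, the number of features, the effect sizes, and the sample sizes are assumptions chosen for illustration only.

```python
import random

def make_data(n, n_features, rng):
    """Synthetic stand-in for expression data (assumption): only
    features 0 and 1 carry class signal, the rest are pure noise."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)                      # 0 = normal, 1 = cancer
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        x[0] += 1.5 * y
        x[1] -= 1.0 * y
        rows.append(x)
        labels.append(y)
    return rows, labels

def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Exhaustive search for the (feature, threshold) pair with the
    lowest weighted Gini impurity; returns None if no split exists."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth):
    """Leaves are class labels; internal nodes are
    (feature, threshold, left_subtree, right_subtree) tuples."""
    majority = int(sum(labels) * 2 >= len(labels))
    if depth == 0 or len(set(labels)) < 2:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], depth - 1))

def predict(tree, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def used_features(tree):
    """Set of feature indices the tree actually tests (its predictors)."""
    if not isinstance(tree, tuple):
        return set()
    f, _, left, right = tree
    return {f} | used_features(left) | used_features(right)

def learning_curve(sizes, n_features=8, depth=2, repeats=3, seed=0):
    """Mean held-out accuracy for each training-set size."""
    rng = random.Random(seed)
    test_rows, test_labels = make_data(500, n_features, rng)
    curve = {}
    for n in sizes:
        accs = []
        for _ in range(repeats):
            rows, labels = make_data(n, n_features, rng)
            tree = build_tree(rows, labels, depth)
            hits = sum(predict(tree, x) == y
                       for x, y in zip(test_rows, test_labels))
            accs.append(hits / len(test_labels))
        curve[n] = sum(accs) / repeats
    return curve

curve = learning_curve((20, 50, 100, 200))
for n, acc in curve.items():
    print(f"train size {n:4d}: mean held-out accuracy {acc:.3f}")
```

Capping the depth at 2 limits each tree to at most three predictors, which mirrors the abstract's concern with keeping the number of predictors (and hence cost) low. Evaluating on a separate held-out set is one simple way to probe generalization; the numbers this sketch prints say nothing about the paper's actual results.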

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
