Ensemble stump classifiers and gene expression signatures in lung cancer

Lewis Frey, Mary Edgerton, Douglas Fisher, Shawn Levy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension, we build ensembles of very simple classifiers known as decision stumps (decision trees of one test each). The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and outperforms simple stump classifiers on two data sets.
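
The abstract does not spell out how ESC is constructed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that one stump is fitted per probe separately on each platform, so every threshold stays in that platform's own units and no cross-platform normalization is needed, and that the stumps are combined by simple majority vote. The probe names, class labels, and expression values are hypothetical toy data.

```python
from collections import Counter


def fit_stump(values, labels):
    """Fit a one-test decision tree on one probe.

    Tries every midpoint between consecutive distinct values as a threshold,
    in both orientations, and keeps the one with the fewest training errors.
    """
    lo, hi = sorted(set(labels))  # assumes exactly two classes
    pts = sorted(set(values))
    if len(pts) < 2:
        raise ValueError("a stump needs at least two distinct values")
    best = None
    for t in ((a + b) / 2 for a, b in zip(pts, pts[1:])):
        for below, above in ((lo, hi), (hi, lo)):
            errors = sum((below if v <= t else above) != y
                         for v, y in zip(values, labels))
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    return best[1:]  # (threshold, class if <= threshold, class if > threshold)


def fit_ensemble(samples, labels, probes):
    """One stump per probe, fitted on a single platform's data (assumption)."""
    return {p: fit_stump([s[p] for s in samples], labels) for p in probes}


def ensemble_predict(ensemble, sample):
    """Majority vote over the stumps (assumed combination rule)."""
    votes = Counter()
    for probe, (t, below, above) in ensemble.items():
        votes[below if sample[probe] <= t else above] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the same three probes measured on two platforms
# whose raw intensity scales differ by roughly two orders of magnitude.
PROBES = ["probe_a", "probe_b", "probe_c"]
LABELS = ["AD", "AD", "SCC", "SCC"]
platform_1 = [
    {"probe_a": 1.0, "probe_b": 8.0, "probe_c": 2.0},
    {"probe_a": 1.5, "probe_b": 7.0, "probe_c": 2.5},
    {"probe_a": 6.0, "probe_b": 2.0, "probe_c": 7.0},
    {"probe_a": 5.5, "probe_b": 3.0, "probe_c": 6.5},
]
platform_2 = [
    {"probe_a": 120, "probe_b": 900, "probe_c": 200},
    {"probe_a": 150, "probe_b": 850, "probe_c": 260},
    {"probe_a": 700, "probe_b": 300, "probe_c": 720},
    {"probe_a": 650, "probe_b": 250, "probe_c": 800},
]

ens_1 = fit_ensemble(platform_1, LABELS, PROBES)
ens_2 = fit_ensemble(platform_2, LABELS, PROBES)

# A new platform-2 sample whose third probe disagrees with the other two.
new_sample = {"probe_a": 140, "probe_b": 880, "probe_c": 750}
prediction = ensemble_predict(ens_2, new_sample)
```

In this toy setup the same three probes are used on both platforms while each platform learns its own cut-points, which is one way a shared signature could coexist with unnormalized data.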

Original language: English (US)
Title of host publication: MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics
Subtitle of host publication: Building Sustainable Health Systems
Publisher: IOS Press
Pages: 1255-1259
Number of pages: 5
ISBN (Print): 9781586037741
State: Published - 2007
Event: 12th World Congress on Medical Informatics, MEDINFO 2007 - Brisbane, QLD, Australia
Duration: Aug 20 2007 to Aug 24 2007

Publication series

Name: Studies in Health Technology and Informatics
Volume: 129
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Other

Other: 12th World Congress on Medical Informatics, MEDINFO 2007
Country/Territory: Australia
City: Brisbane, QLD
Period: 8/20/07 to 8/24/07

Keywords

  • decision trees
  • ensembles
  • microarray
  • stumps

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management
