Efficient gene selection for cancer prognostic biomarkers using swarm optimization and survival analysis

Raul Aguirre-Gamboa, Emmanuel Martinez-Ledesma, Hugo Gomez-Rueda, Rebeca Palacios, Isabel Fuentes-Hernandez, Emilio Sánchez-Canales, Rafael Chacolla-Huaringa, Servando Cardona-Huerta, Luis Villela, Sean Patrick Scott, Jose Tamez-Pena, Victor Trevino

Research output: Contribution to journal (Article, peer-reviewed)

1 Scopus citation

Abstract

The discovery of molecular prognostic cancer biomarkers is still a major scientific challenge. Several methodologies have been proposed to generate novel biomarker models for clinical outcome using gene expression as predictors, but they have drawbacks. For example, they (i) depend heavily on an initial univariate ranking of association with survival times, (ii) are unable to generate compact multivariate predictors, (iii) are based on survival models other than Cox, or (iv) use aggregations and transformations of expression values instead of the gene expression directly. These issues complicate the evaluation of biomarkers in clinical trials and their implementation in medical practice, and obscure their biological association with cancer. We propose a particle swarm optimization search engine coupled to multivariate Cox survival model fitting, constraining the number of genes while minimizing deviance residuals, to identify prognostic cancer biomarkers. By evaluating the concordance index, the log-rank test, correlation, the integrated discrimination improvement per feature, and the number of variables significantly associated with survival times, we show that many compact and highly predictive models can be found for six cancer datasets and a simulated cohort. We also show that our algorithm generates a competitive population of multivariate models with a wide variety of gene combinations, including genes that could not be found by a univariate methodology. In comparisons with other methods such as LASSO, Ridge, and Elastic Net, our algorithm shows similar or better results. We conclude that our algorithm generates highly predictive and compact models for clinical outcomes with unique gene content, and prediction superior or comparable to that of other current feature selection methods. R and Java code are available in the Supplementary Information and at http://bioinformatica.mty.itesm.mx/?q=coxswarm.
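The search strategy described in the abstract can be illustrated with a minimal sketch of a binary particle swarm that selects a size-limited gene subset. This is not the authors' R or Java implementation: the function `binary_pso`, its parameters, the repair step that enforces the gene limit, and `toy_fitness` are all hypothetical names and choices made for illustration. In particular, `toy_fitness` is a stand-in score; in the setting of the paper the fitness would come from fitting a multivariate Cox model on the selected genes.

```python
import math
import random
from typing import Callable, List, Tuple

Subset = Tuple[int, ...]


def binary_pso(n_features: int, fitness: Callable[[Subset], float],
               max_genes: int = 5, n_particles: int = 20, n_iter: int = 50,
               inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               seed: int = 0) -> Tuple[Subset, float]:
    """Binary PSO minimizing `fitness` over subsets of at most `max_genes` features."""
    rng = random.Random(seed)

    def repair(bits: List[int], vel: List[float]) -> List[int]:
        # Enforce the size constraint: keep the selected features with the
        # highest velocity, and never return an empty subset.
        chosen = [j for j in range(n_features) if bits[j]]
        if not chosen:
            chosen = [rng.randrange(n_features)]
        chosen.sort(key=lambda j: vel[j], reverse=True)
        keep = set(chosen[:max_genes])
        return [1 if j in keep else 0 for j in range(n_features)]

    def subset(bits: List[int]) -> Subset:
        return tuple(j for j, b in enumerate(bits) if b)

    # Random initial velocities and sparse initial positions.
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    pos = [repair([int(rng.random() < max_genes / n_features)
                   for _ in range(n_features)], v) for v in vel]

    pbest = [p[:] for p in pos]
    pbest_score = [fitness(subset(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp velocity
            # Sigmoid of the velocity gives the probability that a bit is set.
            bits = [int(rng.random() < 1.0 / (1.0 + math.exp(-v)))
                    for v in vel[i]]
            pos[i] = repair(bits, vel[i])
            score = fitness(subset(pos[i]))
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
            if score < gbest_score:
                gbest, gbest_score = pos[i][:], score
    return subset(gbest), gbest_score


# Hypothetical stand-in for a Cox-model-based score (lower is better):
# penalize missing "informative" genes and, more lightly, extra genes.
INFORMATIVE = {2, 7, 11}


def toy_fitness(genes: Subset) -> float:
    selected = set(genes)
    return len(INFORMATIVE - selected) + 0.1 * len(selected - INFORMATIVE)


best, score = binary_pso(30, toy_fitness, max_genes=3, seed=1)
print("selected genes:", best, "score:", score)
```

Replacing `toy_fitness` with a function that fits a Cox model on the selected columns and returns a deviance-based score would bring the sketch closer to the approach the abstract outlines; the swarm loop itself would not need to change.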

Original language: English (US)
Pages (from-to): 310-323
Number of pages: 14
Journal: Current Bioinformatics
Volume: 11
Issue number: 3
State: Published - Jul 1 2016

Keywords

  • Biomarkers
  • Clinical outcome
  • Feature selection
  • Gene expression
  • Microarrays

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Genetics
  • Computational Mathematics
