TY - JOUR
T1 - Efficient gene selection for cancer prognostic biomarkers using swarm optimization and survival analysis
AU - Aguirre-Gamboa, Raul
AU - Martinez-Ledesma, Emmanuel
AU - Gomez-Rueda, Hugo
AU - Palacios, Rebeca
AU - Fuentes-Hernandez, Isabel
AU - Sánchez-Canales, Emilio
AU - Chacolla-Huaringa, Rafael
AU - Cardona-Huerta, Servando
AU - Villela, Luis
AU - Scott, Sean Patrick
AU - Tamez-Pena, Jose
AU - Trevino, Victor
N1 - Publisher Copyright: © 2016 Bentham Science Publishers.
PY - 2016/7/1
Y1 - 2016/7/1
AB - The discovery of molecular prognostic cancer biomarkers is still a major scientific challenge. Several methodologies have been proposed to generate novel biomarker models for clinical outcome using gene expression as predictors, but they have drawbacks. For example, they (i) depend heavily on an initial ranking of univariate associations with survival times, (ii) are unable to generate compact multivariate predictors, (iii) are based on survival models other than Cox, or (iv) use aggregations and transformations of expression values instead of the gene expression values directly. These issues complicate the evaluation of biomarkers in clinical trials and their implementation in medical practice, and they obscure the biomarkers' biological association with cancer. We propose a particle swarm optimization search engine coupled to multivariate Cox survival model fitting, constraining the number of genes while minimizing deviance residuals, to identify prognostic cancer biomarkers. By evaluating the concordance index, the log-rank test, correlation, the integrated discrimination improvement per feature, and the number of variables significantly associated with survival times, we show that many compact and highly predictive models can be found for six cancer datasets and a simulated cohort. We also show that our algorithm generates a competitive population of multivariate models with a wide variety of gene combinations, including genes that could not be found by a univariate methodology. In comparisons with other methods such as LASSO, Ridge, and Elastic Net, our algorithm shows similar or better results. We conclude that our algorithm generates highly predictive and compact models for clinical outcomes with unique gene content, and that its predictions are superior or comparable to those of other current feature selection methods. R and Java code are available in the Supplementary Information and at http://bioinformatica.mty.itesm.mx/?q=coxswarm.
KW - Biomarkers
KW - Clinical outcome
KW - Feature selection
KW - Gene expression
KW - Microarrays
UR - http://www.scopus.com/inward/record.url?scp=84978196994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978196994&partnerID=8YFLogxK
U2 - 10.2174/1574893611999160610125628
DO - 10.2174/1574893611999160610125628
M3 - Article
AN - SCOPUS:84978196994
SN - 1574-8936
VL - 11
SP - 310
EP - 323
JO - Current Bioinformatics
JF - Current Bioinformatics
IS - 3
ER -