TY - GEN
T1 - Bayesian variable selection for linear regression in high dimensional microarray data
AU - Cabrera, Wellington
AU - Ordonez, Carlos
AU - Matusevich, David Sergio
AU - Baladandayuthapani, Veerabhadran
N1 - Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
AB - Variable selection is a fundamental problem in Bayesian statistics whose solution requires exploring a combinatorial search space. We study the solution of variable selection with a well-known MCMC method, which requires thousands of iterations. We present several algorithmic optimizations to accelerate the MCMC method to make it work efficiently inside a database system. Our optimizations include sufficient statistics, variable preselection, hash tables and calling a linear algebra library. We present experiments with very high dimensional microarray data sets to predict cancer survival time. We discuss encouraging findings, identifying specific genes likely to predict the survival time for brain cancer patients. We also show our DBMS-based algorithm is orders of magnitude faster than the R statistical package. Our work shows a DBMS is a promising platform to analyze microarray data.
KW - Algorithms
KW - DBMS
KW - MCMC
KW - Microarray
KW - Variable selection
UR - http://www.scopus.com/inward/record.url?scp=84889587726&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889587726&partnerID=8YFLogxK
U2 - 10.1145/2512089.2512094
DO - 10.1145/2512089.2512094
M3 - Conference contribution
AN - SCOPUS:84889587726
SN - 9781450324199
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 17
EP - 18
BT - DTMBIO 2013 - Proceedings of the 7th International Workshop on Data and Text Mining in Biomedical Informatics, Co-located with CIKM 2013
T2 - 7th ACM International Workshop on Data and Text Mining in Biomedical Informatics, DTMBIO 2013, in Conjunction with the 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Y2 - 1 November 2013
ER -