TY - JOUR
T1 - Incorporating prior knowledge into Gene Network Study
AU - Wang, Zixing
AU - Xu, Wenlong
AU - San Lucas, F. Anthony
AU - Liu, Yin
N1 - Funding Information:
Funding: This work has been supported in part by the National Institutes of Health grants [R01 LM010022, R01 GM097553] and the seed grant from the University of Texas Health Science Center at Houston.
PY - 2013/10/15
Y1 - 2013/10/15
N2 - Motivation: A major goal in genomic research is to identify genes that may jointly influence a biological response. From many years of intensive biomedical research, a large body of biological knowledge, or pathway information, has accumulated in available databases. There is a strong interest in leveraging these pathways to improve the statistical power and interpretability in studying gene networks associated with complex phenotypes. This prior information is a valuable complement to large-scale genomic data such as gene expression data generated from microarrays. However, it is a non-trivial task to effectively integrate available biological knowledge into gene expression data when reconstructing gene networks. Results: In this article, we developed and applied a Lasso method from a Bayesian perspective, a method we call prior Lasso (pLasso), for the reconstruction of gene networks. In this method, we partition edges between genes into two subsets: one subset of edges is present in known pathways, whereas the other has no associated prior information. Our method assigns different prior distributions to each subset according to a modified Bayesian information criterion that incorporates prior knowledge on both the network structure and the pathway information. Simulation studies have indicated that the method is more effective in recovering the underlying network than a traditional Lasso method that does not use the prior information. We applied pLasso to microarray gene expression datasets, where we used information from the Pathway Commons (PC) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as prior information for the network reconstruction, and successfully identified network hub genes associated with clinical outcome in cancer patients. Availability: The source code is available at http://nba.uth.tmc.edu/homepage/liu/pLasso. Contact: Yin.Liu@uth.tmc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=84885580971&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84885580971&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btt443
DO - 10.1093/bioinformatics/btt443
M3 - Article
C2 - 23956306
AN - SCOPUS:84885580971
SN - 1367-4803
VL - 29
SP - 2633
EP - 2640
JO - Bioinformatics
JF - Bioinformatics
IS - 20
ER -