TY - JOUR
T1 - Spectral clustering via sparse graph structure learning with application to proteomic signaling networks in cancer
AU - Banerjee, Sayantan
AU - Akbani, Rehan
AU - Baladandayuthapani, Veerabhadran
N1 - Funding Information:
S.B. and V.B. were partially supported by NIH , United States, grant R01 CA160736 and NSF , United States, grant 1463233 . S.B. was supported by DST INSPIRE Faculty Award , India, Grant No. 04/2015/002165 . V.B. was also partially supported by NIH , United States, grants P30-CA016672 and R01-CA194391 . R.A. was supported in part by U.S. National Cancer Institute (NCI; MD Anderson TCGA Genome Data Analysis Center) Grant Number CA143883 , the Cancer Prevention Research Institute of Texas (CPRIT) , United States, Grant Number RP130397 , the Mary K. Chapman Foundation , United States, the Michael & Susan Dell Foundation , United States (honoring Lorraine Dell). All authors were also partially supported by MD Anderson Cancer Center Support Grant , United States, P30 CA016672 (the Bioinformatics Shared Resource). The authors would like to express their sincere thanks to the Associate Editor and two other anonymous reviewers for their careful reading and comments which has lead to substantial improvements in the paper. The authors would also like to thank Lee Ann Chastain for assistance in proofreading the manuscript.
Funding Information:
S.B. and V.B. were partially supported by NIH, United States, grant R01 CA160736 and NSF, United States, grant 1463233. S.B. was supported by DST INSPIRE Faculty Award, India, Grant No. 04/2015/002165. V.B. was also partially supported by NIH, United States, grants P30-CA016672 and R01-CA194391. R.A. was supported in part by U.S. National Cancer Institute (NCI; MD Anderson TCGA Genome Data Analysis Center) Grant Number CA143883, the Cancer Prevention Research Institute of Texas (CPRIT), United States, Grant Number RP130397, the Mary K. Chapman Foundation, United States, the Michael & Susan Dell Foundation, United States (honoring Lorraine Dell). All authors were also partially supported by MD Anderson Cancer Center Support Grant, United States, P30 CA016672 (the Bioinformatics Shared Resource). The authors would like to express their sincere thanks to the Associate Editor and two other anonymous reviewers for their careful reading and comments which has lead to substantial improvements in the paper. The authors would also like to thank Lee Ann Chastain for assistance in proofreading the manuscript.
Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2019/4
Y1 - 2019/4
N2 - Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple different cancer types.
AB - Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple different cancer types.
KW - Graph clustering
KW - Graph structure learning
KW - Proteomic data
KW - Spectral clustering
UR - http://www.scopus.com/inward/record.url?scp=85052893431&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052893431&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2018.08.009
DO - 10.1016/j.csda.2018.08.009
M3 - Article
AN - SCOPUS:85052893431
SN - 0167-9473
VL - 132
SP - 46
EP - 69
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
ER -