Spectral clustering via sparse graph structure learning with application to proteomic signaling networks in cancer

Sayantan Banerjee; Rehan Akbani; Veerabhadran Baladandayuthapani

doi:10.1016/j.csda.2018.08.009

Spectral clustering via sparse graph structure learning with application to proteomic signaling networks in cancer

Sayantan Banerjee, Rehan Akbani, Veerabhadran Baladandayuthapani

Bioinformatics & Computational Biology

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple different cancer types.

Original language	English (US)
Pages (from-to)	46-69
Number of pages	24
Journal	Computational Statistics and Data Analysis
Volume	132
DOIs	https://doi.org/10.1016/j.csda.2018.08.009
State	Published - Apr 2019

Keywords

Graph clustering
Graph structure learning
Proteomic data
Spectral clustering

ASJC Scopus subject areas

Statistics and Probability
Computational Mathematics
Computational Theory and Mathematics
Applied Mathematics

MD Anderson CCSG core facilities

Bioinformatics Shared Resource
Functional Proteomics Reverse Phase Protein Array Core

Access to Document

10.1016/j.csda.2018.08.009

Cite this

@article{a51cb4771daa436d989c6e7d3cf7a637,

title = "Spectral clustering via sparse graph structure learning with application to proteomic signaling networks in cancer",

abstract = "Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple different cancer types.",

keywords = "Graph clustering, Graph structure learning, Proteomic data, Spectral clustering",

author = "Sayantan Banerjee and Rehan Akbani and Veerabhadran Baladandayuthapani",

note = "Funding Information: S.B. and V.B. were partially supported by NIH , United States, grant R01 CA160736 and NSF , United States, grant 1463233 . S.B. was supported by DST INSPIRE Faculty Award , India, Grant No. 04/2015/002165 . V.B. was also partially supported by NIH , United States, grants P30-CA016672 and R01-CA194391 . R.A. was supported in part by U.S. National Cancer Institute (NCI; MD Anderson TCGA Genome Data Analysis Center) Grant Number CA143883 , the Cancer Prevention Research Institute of Texas (CPRIT) , United States, Grant Number RP130397 , the Mary K. Chapman Foundation , United States, the Michael & Susan Dell Foundation , United States (honoring Lorraine Dell). All authors were also partially supported by MD Anderson Cancer Center Support Grant , United States, P30 CA016672 (the Bioinformatics Shared Resource). The authors would like to express their sincere thanks to the Associate Editor and two other anonymous reviewers for their careful reading and comments which has lead to substantial improvements in the paper. The authors would also like to thank Lee Ann Chastain for assistance in proofreading the manuscript. Funding Information: S.B. and V.B. were partially supported by NIH, United States, grant R01 CA160736 and NSF, United States, grant 1463233. S.B. was supported by DST INSPIRE Faculty Award, India, Grant No. 04/2015/002165. V.B. was also partially supported by NIH, United States, grants P30-CA016672 and R01-CA194391. R.A. was supported in part by U.S. National Cancer Institute (NCI; MD Anderson TCGA Genome Data Analysis Center) Grant Number CA143883, the Cancer Prevention Research Institute of Texas (CPRIT), United States, Grant Number RP130397, the Mary K. Chapman Foundation, United States, the Michael & Susan Dell Foundation, United States (honoring Lorraine Dell). All authors were also partially supported by MD Anderson Cancer Center Support Grant, United States, P30 CA016672 (the Bioinformatics Shared Resource). The authors would like to express their sincere thanks to the Associate Editor and two other anonymous reviewers for their careful reading and comments which has lead to substantial improvements in the paper. The authors would also like to thank Lee Ann Chastain for assistance in proofreading the manuscript. Publisher Copyright: {\textcopyright} 2018 Elsevier B.V.",

year = "2019",

month = apr,

doi = "10.1016/j.csda.2018.08.009",

language = "English (US)",

volume = "132",

pages = "46--69",

journal = "Computational Statistics and Data Analysis",

issn = "0167-9473",

publisher = "Elsevier",

}

TY - JOUR

T1 - Spectral clustering via sparse graph structure learning with application to proteomic signaling networks in cancer

AU - Banerjee, Sayantan

AU - Akbani, Rehan

AU - Baladandayuthapani, Veerabhadran

N1 - Funding Information: S.B. and V.B. were partially supported by NIH , United States, grant R01 CA160736 and NSF , United States, grant 1463233 . S.B. was supported by DST INSPIRE Faculty Award , India, Grant No. 04/2015/002165 . V.B. was also partially supported by NIH , United States, grants P30-CA016672 and R01-CA194391 . R.A. was supported in part by U.S. National Cancer Institute (NCI; MD Anderson TCGA Genome Data Analysis Center) Grant Number CA143883 , the Cancer Prevention Research Institute of Texas (CPRIT) , United States, Grant Number RP130397 , the Mary K. Chapman Foundation , United States, the Michael & Susan Dell Foundation , United States (honoring Lorraine Dell). All authors were also partially supported by MD Anderson Cancer Center Support Grant , United States, P30 CA016672 (the Bioinformatics Shared Resource). The authors would like to express their sincere thanks to the Associate Editor and two other anonymous reviewers for their careful reading and comments which has lead to substantial improvements in the paper. The authors would also like to thank Lee Ann Chastain for assistance in proofreading the manuscript. Funding Information: S.B. and V.B. were partially supported by NIH, United States, grant R01 CA160736 and NSF, United States, grant 1463233. S.B. was supported by DST INSPIRE Faculty Award, India, Grant No. 04/2015/002165. V.B. was also partially supported by NIH, United States, grants P30-CA016672 and R01-CA194391. R.A. was supported in part by U.S. National Cancer Institute (NCI; MD Anderson TCGA Genome Data Analysis Center) Grant Number CA143883, the Cancer Prevention Research Institute of Texas (CPRIT), United States, Grant Number RP130397, the Mary K. Chapman Foundation, United States, the Michael & Susan Dell Foundation, United States (honoring Lorraine Dell). All authors were also partially supported by MD Anderson Cancer Center Support Grant, United States, P30 CA016672 (the Bioinformatics Shared Resource). The authors would like to express their sincere thanks to the Associate Editor and two other anonymous reviewers for their careful reading and comments which has lead to substantial improvements in the paper. The authors would also like to thank Lee Ann Chastain for assistance in proofreading the manuscript. Publisher Copyright: © 2018 Elsevier B.V.

PY - 2019/4

Y1 - 2019/4

N2 - Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple different cancer types.

AB - Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple different cancer types.

KW - Graph clustering

KW - Graph structure learning

KW - Proteomic data

KW - Spectral clustering

UR - http://www.scopus.com/inward/record.url?scp=85052893431&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052893431&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2018.08.009

DO - 10.1016/j.csda.2018.08.009

M3 - Article

AN - SCOPUS:85052893431

SN - 0167-9473

VL - 132

SP - 46

EP - 69

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

ER -

Spectral clustering via sparse graph structure learning with application to proteomic signaling networks in cancer

Abstract

Keywords

ASJC Scopus subject areas

MD Anderson CCSG core facilities

Access to Document

Other files and links

Fingerprint

Cite this