Scalable network estimation with L0 penalty

Junghi Kim; Hongtu Zhu; Xiao Wang; Kim Anh Do

doi:10.1002/sam.11483

Biostatistics

Research output: Contribution to journal › Article › peer-review

Abstract

With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L₀ penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.
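Two quantities named in the abstract can be illustrated on a toy example: an L₀ penalty, which counts nonzero entries, and the Frobenius loss between an estimate and the true precision matrix. The sketch below is not the scalnetL0 algorithm, which this record does not describe. The 3×3 matrices, the helper names, and the hard-thresholding step are all hypothetical stand-ins chosen only to show how the two measures behave.

```python
import math


def l0_penalty(matrix, tol=0.0):
    """Count off-diagonal entries whose magnitude exceeds tol."""
    return sum(
        1
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j and abs(value) > tol
    )


def frobenius_loss(estimate, truth):
    """Square root of the summed squared entrywise differences."""
    return math.sqrt(
        sum(
            (e - t) ** 2
            for est_row, true_row in zip(estimate, truth)
            for e, t in zip(est_row, true_row)
        )
    )


def hard_threshold(matrix, cutoff):
    """Return a copy with off-diagonal entries below cutoff set to zero."""
    return [
        [v if i == j or abs(v) >= cutoff else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


# Hypothetical data: a sparse "true" precision matrix and a noisy dense estimate.
truth = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.4], [0.0, 0.4, 1.0]]
noisy = [[1.1, 0.35, 0.05], [0.35, 0.9, 0.45], [0.05, 0.45, 1.05]]

# Hard thresholding is a simple stand-in for sparsification, not the paper's method.
sparse = hard_threshold(noisy, cutoff=0.1)

print("nonzero off-diagonals:", l0_penalty(noisy), "->", l0_penalty(sparse))
print("Frobenius loss:", frobenius_loss(noisy, truth), "->", frobenius_loss(sparse, truth))
```

In this toy case, zeroing the two spurious small entries reduces both the nonzero count and the Frobenius loss. Whether sparsification helps in general depends on the cutoff and on how sparse the true matrix is.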

Original language	English (US)
Pages (from-to)	18-30
Number of pages	13
Journal	Statistical Analysis and Data Mining
Volume	14
Issue number	1
DOIs	https://doi.org/10.1002/sam.11483
State	Published - Feb 2021

Keywords

L₀ penalty
genomics
network
scalable

ASJC Scopus subject areas

Analysis
Information Systems
Computer Science Applications

MD Anderson CCSG core facilities

Biostatistics Resource Group

Cite this

@article{3a26cf41a26047e5bba8dd718d387ede,

title = "Scalable network estimation with L0 penalty",

abstract = "With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L0 penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.",

keywords = "L0 penalty, genomics, network, scalable",

author = "Junghi Kim and Hongtu Zhu and Xiao Wang and Do, {Kim Anh}",

note = "Publisher Copyright: {\textcopyright} 2020 Wiley Periodicals LLC.",

year = "2021",

month = feb,

doi = "10.1002/sam.11483",

language = "English (US)",

volume = "14",

pages = "18--30",

journal = "Statistical Analysis and Data Mining",

issn = "1932-1864",

publisher = "John Wiley and Sons Inc.",

number = "1",

}

TY - JOUR

T1 - Scalable network estimation with L0 penalty

AU - Kim, Junghi

AU - Zhu, Hongtu

AU - Wang, Xiao

AU - Do, Kim Anh

PY - 2021/2

Y1 - 2021/2

N2 - With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L0 penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.

AB - With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L0 penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.

KW - L0 penalty

KW - genomics

KW - network

KW - scalable

UR - http://www.scopus.com/inward/record.url?scp=85092934274&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85092934274&partnerID=8YFLogxK

U2 - 10.1002/sam.11483

DO - 10.1002/sam.11483

M3 - Article

C2 - 35027990

AN - SCOPUS:85092934274

SN - 1932-1864

VL - 14

SP - 18

EP - 30

JO - Statistical Analysis and Data Mining

JF - Statistical Analysis and Data Mining

IS - 1

ER -