TY - JOUR
T1 - A comprehensive assessment of cell type-specific differential expression methods in bulk data
AU - Meng, Guanqun
AU - Tang, Wen
AU - Huang, Emina
AU - Li, Ziyi
AU - Feng, Hao
N1 - Funding Information:
National Institutes of Health [U01CA214300, R01CA237304 to E.H., R03CA270725 to Z.L.]; American Cancer Society Institutional Research Grant (ACS IRG) [#IRG-16-186-21 to H.F.] through Case Comprehensive Cancer Center; Corinne L. Dodero Foundation for the Arts and Sciences; Case Western Reserve University (CWRU) Program for Autism Education and Research to H.F.
Publisher Copyright:
© 2022 The Author(s). Published by Oxford University Press.
PY - 2023/1/1
Y1 - 2023/1/1
AB - Accounting for cell type composition has been very successful in analyzing high-throughput data from heterogeneous tissues. Differential gene expression analysis at the cell type level is becoming increasingly popular, yielding biomarker discovery at a finer granularity within a particular cell type. Although several computational methods have been developed to identify cell type-specific differentially expressed genes (csDEG) from RNA-seq data, a systematic evaluation has yet to be performed. Here, we thoroughly benchmark six recently published methods, CellDMC, CARseq, TOAST, LRCDE, CeDAR and TCA, together with two classical methods, csSAM and DESeq2, for a comprehensive comparison. We aim to systematically evaluate the performance of popular csDEG detection methods and provide guidance to researchers. In simulation studies, we benchmark available methods under various scenarios of baseline expression levels, sample sizes, cell type compositions, expression level alterations, technical noise and biological dispersion. Real data analyses of three large datasets on inflammatory bowel disease, lung cancer and autism provide evaluation at both the gene level and the pathway level. We find that csDEG calling is strongly affected by effect size, baseline expression level and cell type composition. The results imply that csDEG discovery is a challenging task in itself, with room for improvement in handling low signal-to-noise ratios and lowly expressed genes.
KW - cell type-specific signal
KW - deconvolution
KW - differentially expressed genes
KW - heterogeneous samples
KW - RNA-seq
UR - http://www.scopus.com/inward/record.url?scp=85147044834&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147044834&partnerID=8YFLogxK
U2 - 10.1093/bib/bbac516
DO - 10.1093/bib/bbac516
M3 - Review article
C2 - 36472568
AN - SCOPUS:85147044834
SN - 1467-5463
VL - 24
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 1
M1 - bbac516
ER -