Stratified Test Accurately Identifies Differentially Expressed Genes under Batch Effects in Single-Cell Data

Shaoheng Liang; Qingnan Liang; Rui Chen; Ken Chen

doi:10.1109/TCBB.2021.3094650


Bioinformatics & Computational Biology

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries in differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show that the Van Elteren test controls both false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets than nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.
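To make the idea of a stratified rank test concrete, here is a minimal sketch in plain Python; it is an illustration under stated assumptions, not the authors' implementation. Cells are ranked only within their own batch, each batch's Wilcoxon rank sum is weighted by 1/(N_k + 1) as proposed by Van Elteren, and a tie-corrected normal approximation gives the p-value. The returned `cles` is an assumed stratified variant of the common language effect size, P(X > Y) + 0.5 P(X = Y) counted over within-batch pairs only; the paper's modified effect size may be defined differently. The function names `van_elteren`, `midranks`, and `tie_correction` are hypothetical.

```python
import math
from collections import defaultdict


def midranks(values):
    """1-based ranks of `values`; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def tie_correction(values):
    """Sum of t^3 - t over groups of t tied values."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    return sum(t ** 3 - t for t in counts.values())


def van_elteren(x, x_strata, y, y_strata):
    """Van Elteren test of x vs y, stratified by batch labels (hypothetical helper).

    Returns (z, two-sided normal-approximation p-value, within-batch CLES).
    """
    if len(x) != len(x_strata) or len(y) != len(y_strata):
        raise ValueError("each value needs exactly one stratum label")
    strata = defaultdict(lambda: ([], []))
    for v, s in zip(x, x_strata):
        strata[s][0].append(v)
    for v, s in zip(y, y_strata):
        strata[s][1].append(v)

    stat = mean = var = 0.0
    u_sum = pairs = 0
    for xs, ys in strata.values():
        n, m = len(xs), len(ys)
        if n == 0 or m == 0:
            continue  # a batch holding only one group offers no within-batch contrast
        pooled = xs + ys
        big_n = n + m
        rank_sum = sum(midranks(pooled)[:n])  # Wilcoxon rank sum of x in this batch
        weight = 1.0 / (big_n + 1)  # weight proposed by Van Elteren
        var_w = n * m / 12.0 * (big_n + 1 - tie_correction(pooled) / (big_n * (big_n - 1)))
        stat += weight * rank_sum
        mean += weight * n * (big_n + 1) / 2.0
        var += weight ** 2 * var_w
        u_sum += rank_sum - n * (n + 1) / 2.0  # Mann-Whitney U of x in this batch
        pairs += n * m

    if pairs == 0:
        raise ValueError("no stratum contains both groups")
    cles = u_sum / pairs  # P(X > Y) + 0.5 * P(X = Y), within-batch pairs only (assumed variant)
    if var == 0.0:
        return 0.0, 1.0, cles  # every informative stratum is fully tied
    z = (stat - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, cles
```

A call such as `van_elteren(expr_a, batch_a, expr_b, batch_b)` compares two cell groups for one gene, with one batch label per cell. Because ranks are computed only within each batch, adding a constant to every cell in one batch (a simple additive batch effect) leaves the statistic unchanged, which is the property that lets a stratified test sidestep such shifts.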

Original language	English (US)
Pages (from-to)	2072-2079
Number of pages	8
Journal	IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume	18
Issue number	6
DOIs	https://doi.org/10.1109/TCBB.2021.3094650
State	Published - 2021

Keywords

Van Elteren test
Wilcoxon rank-sum test
batch effect
differential expression analysis
scRNA-seq analysis

ASJC Scopus subject areas

Biotechnology
Genetics
Applied Mathematics

MD Anderson CCSG core facilities

Bioinformatics Shared Resource


Cite this

@article{639eb6b1ddb64912a2d9f5316ce80f2c,

title = "Stratified Test Accurately Identifies Differentially Expressed Genes under Batch Effects in Single-Cell Data",

abstract = "Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries in differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show that the Van Elteren test controls both false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets than nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.",

keywords = "Van Elteren test, Wilcoxon rank-sum test, batch effect, differential expression analysis, scRNA-seq analysis",

author = "Shaoheng Liang and Qingnan Liang and Rui Chen and Ken Chen",

note = "Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",

year = "2021",

doi = "10.1109/TCBB.2021.3094650",

language = "English (US)",

volume = "18",

pages = "2072--2079",

journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",

issn = "1545-5963",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "6",

}

TY - JOUR

T1 - Stratified Test Accurately Identifies Differentially Expressed Genes under Batch Effects in Single-Cell Data

AU - Liang, Shaoheng

AU - Liang, Qingnan

AU - Chen, Rui

AU - Chen, Ken

PY - 2021

Y1 - 2021

N2 - Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries in differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show that the Van Elteren test controls both false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets than nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.

AB - Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries in differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show that the Van Elteren test controls both false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets than nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.

KW - Van Elteren test

KW - Wilcoxon rank-sum test

KW - batch effect

KW - differential expression analysis

KW - scRNA-seq analysis

UR - http://www.scopus.com/inward/record.url?scp=85112628092&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85112628092&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2021.3094650

DO - 10.1109/TCBB.2021.3094650

M3 - Article

C2 - 34232885

AN - SCOPUS:85112628092

SN - 1545-5963

VL - 18

SP - 2072

EP - 2079

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

IS - 6

ER -
