Abstract
Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries among differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show that the Van Elteren test controls for false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that, on simulated datasets, the Van Elteren test achieves higher sensitivity and specificity than nine state-of-the-art differential expression analysis methods. The modified effect size also estimates the differences between cell types more accurately.
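To make the stratification idea concrete, here is a minimal sketch of a Van Elteren test for one gene, not the authors' implementation. It assumes each stratum is a batch or participant, uses the standard weights 1/(N_k + 1), applies the usual tie correction to the rank-sum variance, and relies on a normal approximation for the p-value. The function names (`midranks`, `van_elteren`, `stratified_cles`) are hypothetical. The abstract does not say how the common language effect size was modified, so `stratified_cles` is only one plausible stratified variant, offered as an assumption.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _stratum_stats(x, y):
    """Rank sum of x, its null mean, and its tie-corrected null variance."""
    m, n = len(x), len(y)
    N = m + n
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:m])
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = m * n / 12.0 * ((N + 1) - ties / (N * (N - 1)))
    return w, m * (N + 1) / 2.0, var


def van_elteren(strata):
    """strata: list of (x, y) expression vectors, one pair per stratum.

    Ranks are computed within each stratum, so shifts between strata
    (for example batch effects) do not enter the statistic.
    Returns (z, two-sided p) under a normal approximation.
    """
    num, var = 0.0, 0.0
    for x, y in strata:
        if not x or not y:
            continue  # a stratum missing one group carries no information
        w, mean, var_k = _stratum_stats(x, y)
        weight = 1.0 / (len(x) + len(y) + 1)
        num += weight * (w - mean)
        var += weight ** 2 * var_k
    if var == 0.0:
        return 0.0, 1.0
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


def stratified_cles(strata):
    """Assumed variant: weighted mean of per-stratum P(X > Y) + 0.5 * P(X = Y),
    weighted by m_k * n_k / (N_k + 1) to match the test statistic."""
    total, weight_sum = 0.0, 0.0
    for x, y in strata:
        if not x or not y:
            continue
        m, n = len(x), len(y)
        w, _, _ = _stratum_stats(x, y)
        a_k = (w - m * (m + 1) / 2.0) / (m * n)
        weight = m * n / (m + n + 1)
        total += weight * a_k
        weight_sum += weight
    return total / weight_sum if weight_sum else float("nan")
```

For example, `van_elteren([(batch1_a, batch1_b), (batch2_a, batch2_b)])` compares two cell groups while ranking cells only against others from the same batch. An effect size of 0.5 from `stratified_cles` indicates no tendency for either group to have higher expression.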
Original language | English (US) |
---|---|
Pages (from-to) | 2072-2079 |
Number of pages | 8 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 18 |
Issue number | 6 |
State | Published - 2021 |
Keywords
- Van Elteren test
- Wilcoxon rank-sum test
- batch effect
- differential expression analysis
- scRNA-seq analysis
ASJC Scopus subject areas
- Biotechnology
- Genetics
- Applied Mathematics
MD Anderson CCSG core facilities
- Bioinformatics Shared Resource