EDClust: An EM-MM hybrid method for cell clustering in multiple-subject single-cell RNA sequencing

Xin Wei, Ziyi Li, Hongkai Ji, Hao Wu

Research output: Contribution to journal, Article, peer-review

2 Scopus citations

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological research by enabling the measurement of transcriptomic profiles at the single-cell level. With the increasing application of scRNA-seq in larger-scale studies, the problem of appropriately clustering cells emerges when the scRNA-seq data are from multiple subjects. One challenge is the subject-specific variation; systematic heterogeneity from multiple subjects may have a significant impact on clustering accuracy. Existing methods seeking to address such effects suffer from several limitations.

Results: We develop a novel statistical method, EDClust, for multi-subject scRNA-seq cell clustering. EDClust models the sequence read counts by a mixture of Dirichlet-multinomial distributions and explicitly accounts for cell-type heterogeneity, subject heterogeneity and clustering uncertainty. An EM-MM hybrid algorithm is derived for maximizing the data likelihood and clustering the cells. We perform a series of simulation studies to evaluate the proposed method and demonstrate the outstanding performance of EDClust. Comprehensive benchmarking on four real scRNA-seq datasets with various tissue types and species demonstrates the substantial accuracy improvement of EDClust compared to existing methods.
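As a rough illustration of the modeling idea named in the abstract (read counts drawn from a mixture of Dirichlet-multinomial distributions, with soft cluster assignments), the sketch below computes a Dirichlet-multinomial log-probability and the posterior cluster probabilities for a single cell, which is the kind of quantity an EM-style E-step would produce. It is not the EDClust implementation: the function names, the toy parameters, and the omission of subject-specific effects and of the MM update are all assumptions made for illustration.

```python
import math

def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer read counts, one per gene.
    alpha:  positive concentration parameters, one per gene.
    """
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out

def responsibilities(counts, weights, alphas):
    """Posterior probability of each cluster for one cell (E-step-like).

    weights: mixing proportions, one per cluster.
    alphas:  one concentration vector per cluster (hypothetical values here).
    """
    logs = [math.log(w) + dm_logpmf(counts, al)
            for w, al in zip(weights, alphas)]
    m = max(logs)                       # log-sum-exp for numerical stability
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: two clusters over three genes (made-up parameters).
weights = [0.5, 0.5]
alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
cell = [20, 2, 1]                       # counts concentrated on the first gene
post = responsibilities(cell, weights, alphas)
```

In this toy setup the cell's reads fall mostly on the first gene, so the first cluster receives the larger posterior probability. The entries of `post` sum to one and express the clustering uncertainty that the abstract says the model accounts for.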

Original language: English (US)
Pages (from-to): 2692-2699
Number of pages: 8
Journal: Bioinformatics
Volume: 38
Issue number: 10
State: Published - May 15 2022

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

MD Anderson CCSG core facilities

  • Biostatistics Resource Group
