A comprehensive comparison of supervised and unsupervised methods for cell type identification in single-cell RNA-seq

Xiaobo Sun, Xiaochu Lin, Ziyi Li, Hao Wu

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

Cell type identification is among the most important tasks in single-cell RNA-sequencing (scRNA-seq) analysis. Many in silico methods have been developed, and they can be roughly categorized as either supervised or unsupervised. In this study, we investigated the performance of 8 supervised and 10 unsupervised cell type identification methods using 14 public scRNA-seq datasets from different tissues, sequencing protocols and species. We investigated the impacts of a number of factors, including the total number of cells, number of cell types, sequencing depth, batch effects, reference bias, cell population imbalance, unknown/novel cell types, and computational efficiency and scalability. Instead of merely comparing individual methods, we focused on the factors' impacts on the general categories of supervised and unsupervised methods. We found that in most scenarios, the supervised methods outperformed the unsupervised methods, except for the identification of unknown cell types. This is particularly true when the supervised methods use a reference dataset with high informational sufficiency, low complexity and high similarity to the query dataset. However, this advantage could be undermined by some of the undesired dataset properties investigated in this study, which lead to uninformative and biased reference datasets. In these scenarios, unsupervised methods could be comparable to supervised methods. Our study not only explained the behavior of cell typing methods under different experimental settings but also provided a general guideline for choosing a method according to the scientific goal and dataset properties. Finally, our evaluation workflow is implemented as a modularized R pipeline that allows future evaluation of new methods. Availability: All source code is available at https://github.com/xsun28/scRNAIdent.

Original language: English (US)
Article number: bbab567
Journal: Briefings in Bioinformatics
Volume: 23
Issue number: 2
State: Published - Mar 1 2022

Keywords

  • cell type identification
  • scRNA-seq
  • supervised learning
  • unsupervised clustering

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

MD Anderson CCSG core facilities

  • Biostatistics Resource Group
