Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data

Shaoheng Liang, Jinzhuang Dou, Ramiz Iqbal, Ken Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered at different times or from different tissues and patients, introduces large between-group distances and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (Lad), a metric that uses the temporal/spatial locality of the batch effect to control for such factors. We validate Lad on simulated data and apply it to a mouse retina development dataset and a lung dataset. We also demonstrate the utility of our approach in understanding the progression of Coronavirus Disease 2019 (COVID-19). Lad provides better cell embedding than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings.

Original language: English (US)
Article number: 326
Journal: Communications Biology
Volume: 7
Issue number: 1
DOIs
State: Published - Dec 2024

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • General Biochemistry, Genetics and Molecular Biology
  • General Agricultural and Biological Sciences
