TY - JOUR
T1 - Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data
AU - Liang, Shaoheng
AU - Dou, Jinzhuang
AU - Iqbal, Ramiz
AU - Chen, Ken
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
AB - Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (Lad), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate Lad on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). Lad provides better cell embedding than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings.
UR - http://www.scopus.com/inward/record.url?scp=85187911031&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85187911031&partnerID=8YFLogxK
U2 - 10.1038/s42003-024-05988-y
DO - 10.1038/s42003-024-05988-y
M3 - Article
C2 - 38486077
AN - SCOPUS:85187911031
SN - 2399-3642
VL - 7
JO - Communications Biology
JF - Communications Biology
IS - 1
M1 - 326
ER -