CondiS: A conditional survival distribution-based method for censored data imputation overcoming the hurdle in machine learning-based survival analysis

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Data analyses by machine learning (ML) algorithms are gaining popularity in biomedical research. When time-to-event data are of interest, censoring is common and needs to be properly addressed. Most ML methods cannot conveniently and appropriately take the censoring information into consideration, potentially leading to inaccurate or biased results. We aim to develop a general-purpose method for imputing censored survival data, facilitating downstream ML analysis. In this study, we propose a novel method of imputing the survival times for censored observations. The proposal is based on their conditional survival distributions (CondiS) derived from Kaplan-Meier estimators. CondiS can replace censored observations with their best approximations from the statistical model, allowing for direct application of ML methods. When covariates are available, we extend CondiS by incorporating the covariate information through ML modeling (CondiS-X), which further improves the accuracy of the imputed survival time. Compared with existing methods with similar purposes, the proposed methods achieved smaller prediction errors and higher concordance with the underlying true survival times in extensive simulation studies. We also demonstrated the usage and advantages of the proposed methods through two real-world cancer datasets. The major advantage of CondiS is that it allows for the direct application of standard ML techniques for analysis once the censored survival times are imputed. We present a user-friendly R package to implement our method, which is a useful tool for ML-based biomedical research in this era of big data.
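The core idea of the abstract, imputing a censored time from its conditional survival distribution derived from the Kaplan-Meier estimator, can be illustrated with a minimal sketch. The abstract does not spell out the exact imputation rule, so the code below assumes one plausible choice: replace a time censored at c with the conditional expected survival time, c + (1 / S(c)) * integral of S(u) from c to tau, where S is the Kaplan-Meier curve and tau is the largest observed time. The function names (`kaplan_meier`, `surv_at`, `impute_censored`) are hypothetical and are not the API of the authors' R package; the covariate-aware extension (CondiS-X) is not covered.

```python
def kaplan_meier(times, events):
    """Return distinct event times and the KM survival value just after each."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    event_times, surv_probs = [], []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group all observations sharing the same time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            event_times.append(t)
            surv_probs.append(surv)
        n_at_risk -= removed
    return event_times, surv_probs


def surv_at(t, event_times, surv_probs):
    """Evaluate the right-continuous KM step function at time t."""
    s = 1.0
    for et, sp in zip(event_times, surv_probs):
        if et > t:
            break
        s = sp
    return s


def impute_censored(times, events, tau=None):
    """Impute censored times (event == 0) by their conditional mean.

    Assumption for illustration only: the mean is restricted to tau,
    which defaults to the largest observed time.
    """
    event_times, surv_probs = kaplan_meier(times, events)
    tau = max(times) if tau is None else tau
    imputed = []
    for t, e in zip(times, events):
        s_c = surv_at(t, event_times, surv_probs)
        if e == 1 or t >= tau or s_c <= 0.0:
            imputed.append(float(t))  # observed event, or nothing to add
            continue
        # Integrate the step function S(u) exactly over [t, tau].
        grid = [t] + [et for et in event_times if t < et < tau] + [tau]
        area = sum(
            surv_at(a, event_times, surv_probs) * (b - a)
            for a, b in zip(grid, grid[1:])
        )
        imputed.append(t + area / s_c)
    return imputed


# Toy data: the second subject is censored at time 3.
result = impute_censored([2, 3, 4, 6], [1, 0, 1, 1])
print(result)  # [2.0, 5.0, 4.0, 6.0]
```

In the toy data, the subject censored at time 3 is still at risk when the events at times 4 and 6 occur, and the Kaplan-Meier curve gives each of them conditional probability 1/2, so the imputed time is 5. Once every censored time is replaced in this way, the resulting vector can be passed to a standard regression learner as an ordinary numeric outcome.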

Original language: English (US)
Article number: 104117
Journal: Journal of Biomedical Informatics
Volume: 131
State: Published - Jul 2022

Keywords

  • Data censoring
  • Imputation
  • Kaplan Meier
  • Machine learning
  • Survival analysis

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

MD Anderson CCSG core facilities

  • Biostatistics Resource Group
