eClock: An ensemble-based method to accurately predict ages with a biased distribution from DNA methylation data

Research output: Contribution to journal, Article, peer-review

Abstract

DNA methylation is closely related to senescence, so it has been used to build statistical models, called clock models, that predict chronological age accurately. However, because the training data always have a biased age distribution, model performance degrades for samples from age ranges that are sparsely represented. To solve this problem, we developed the R package eClock, which uses a bagging-SMOTE method to adjust the biased distribution and predicts age with an ensemble model. It also provides a bootstrapped model based on bagging only and a traditional clock model. Evaluation on three datasets showed that the bagging-SMOTE model significantly improved age prediction for rare samples. In addition to model construction, the package provides other functions, such as data visualization and methylation feature conversion, to facilitate research in related areas.
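
The bagging-SMOTE idea described in the abstract can be illustrated with a minimal sketch. This is not the eClock implementation (eClock is an R package, and the abstract does not spell out its internals); it is a Python toy under stated assumptions. The function names (`fit_line`, `smote_oversample`, `fit_bagging_smote`, `predict_age`), the single-feature least-squares base learner, the age threshold that defines "rare", and the toy data are all hypothetical. The oversampling step interpolates between random pairs of rare samples rather than between nearest neighbours, which is a simplification of SMOTE.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept).

    Stand-in for the base clock model (assumption: a real clock would use
    many CpG sites and a different regression method).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def smote_oversample(samples, is_rare, rng):
    """Add synthetic rare samples until rare and common counts match.

    Each synthetic point is a random interpolation between two rare samples,
    applied to the feature and the age together (SMOTE-style, simplified:
    pairs are drawn at random instead of from k nearest neighbours).
    """
    rare = [s for s in samples if is_rare(s[1])]
    n_common = len(samples) - len(rare)
    out = list(samples)
    if len(rare) < 2:
        return out  # nothing to interpolate between
    for _ in range(max(0, n_common - len(rare))):
        (x1, y1), (x2, y2) = rng.sample(rare, 2)
        t = rng.random()
        out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return out


def fit_bagging_smote(samples, is_rare, n_models=25, seed=0):
    """Fit one base model per bootstrap resample, oversampling rare ages first."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(samples) for _ in samples]  # bootstrap resample
        balanced = smote_oversample(boot, is_rare, rng)
        models.append(fit_line(*zip(*balanced)))
    return models


def predict_age(models, x):
    """Ensemble prediction: average of the base models' predictions."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)


# Toy data (made up for illustration): (methylation level, age) pairs,
# with ages above 80 heavily under-represented.
ages = list(range(20, 60, 2)) + [82, 85, 90]
toy = [(0.2 + 0.005 * a, float(a)) for a in ages]
ensemble = fit_bagging_smote(toy, is_rare=lambda age: age >= 80)
print(round(predict_age(ensemble, 0.2 + 0.005 * 88), 2))
```

Dropping the `smote_oversample` call inside `fit_bagging_smote` gives a bagging-only ensemble, and a single `fit_line` call on the full data corresponds to a traditional single-model clock, mirroring the three model types the abstract lists.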

Original language: English (US)
Article number: e0267349
Journal: PLoS ONE
Volume: 17
Issue number: 5 (May)
State: Published - May 2022
Externally published: Yes

ASJC Scopus subject areas

  • General
