Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring

Ella Petter, Yi Ding, Kangcheng Hou, Arjun Bhattacharya, Alexander Gusev, Noah Zaitlen, Bogdan Pasaniuc

Research output: Contribution to journal, Article, peer-review

Abstract

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced by genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low-coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implications for PGS-based risk stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error, especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
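
To make the idea of propagating genotype uncertainty into a PGS concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each SNP comes with posterior probabilities for genotypes 0, 1, and 2 (as an imputation tool might report), treats SNPs as independent, and takes effect sizes as fixed. The function name `pgs_credible_interval` and all parameter names are hypothetical. It computes the expected-dosage PGS as a point estimate and draws genotypes from the per-SNP posteriors to obtain a Monte Carlo credible interval.

```python
import numpy as np


def pgs_credible_interval(geno_probs, betas, n_draws=2000, level=0.95, seed=0):
    """Point estimate and Monte Carlo credible interval for a PGS.

    geno_probs: (n_snps, 3) posterior probabilities of genotypes 0, 1, 2.
    betas:      (n_snps,) effect sizes, assumed known without error here.
    Assumption: SNPs are sampled independently (no linkage modelled).
    """
    rng = np.random.default_rng(seed)
    geno_probs = np.asarray(geno_probs, dtype=float)
    betas = np.asarray(betas, dtype=float)

    # Normalise each row so it is a proper probability distribution.
    geno_probs = geno_probs / geno_probs.sum(axis=1, keepdims=True)

    # Point estimate: PGS computed from the expected dosage of each SNP.
    dosage = geno_probs @ np.array([0.0, 1.0, 2.0])
    point = float(dosage @ betas)

    # Draw genotypes by inverse-CDF sampling: count how many of the
    # first two cumulative thresholds each uniform draw exceeds.
    cdf = np.cumsum(geno_probs, axis=1)
    u = rng.random((n_draws, geno_probs.shape[0]))
    draws = (u[:, :, None] > cdf[None, :, :-1]).sum(axis=2)

    # One PGS per draw, then take the central quantiles.
    scores = draws @ betas
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(scores, [alpha, 1.0 - alpha])
    return point, float(lo), float(hi)
```

When every genotype is known with certainty, the interval collapses onto the point estimate; the flatter the genotype posteriors (as expected at lower sequencing depth), the wider the interval becomes.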

Original language: English (US)
Pages (from-to): 1319-1329
Number of pages: 11
Journal: American Journal of Human Genetics
Volume: 110
Issue number: 8
State: Published - Aug 3 2023
Externally published: Yes

Keywords

  • effect sizes
  • genotype error
  • lcWGS
  • PGS
  • PGS error
  • risk stratification
  • uncertainty

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)
