Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring

Ella Petter; Yi Ding; Kangcheng Hou; Arjun Bhattacharya; Alexander Gusev; Noah Zaitlen; Bogdan Pasaniuc

doi:10.1016/j.ajhg.2023.06.015

Research output: Contribution to journal › Article › peer-review

Abstract

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
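As an illustration of the idea of carrying genotype uncertainty into a PGS, the following is a minimal sketch and not the authors' implementation. It assumes each SNP comes with posterior probabilities for genotypes 0, 1, and 2, treats SNPs as independent (ignoring linkage disequilibrium), treats effect sizes as fixed, and uses a normal approximation for the interval. The function name `pgs_with_uncertainty` and its parameters are hypothetical.

```python
import math


def pgs_with_uncertainty(probs, betas, z=1.96):
    """Return (expected PGS, (lower, upper)) under genotype uncertainty.

    probs: list of (p0, p1, p2) genotype probabilities, one tuple per SNP.
    betas: list of per-SNP effect sizes, assumed known without error.
    z:     normal quantile for the interval (1.96 gives roughly 95%).
    """
    mean = 0.0
    var = 0.0
    for (p0, p1, p2), beta in zip(probs, betas):
        dosage = p1 + 2.0 * p2           # E[g] for g in {0, 1, 2}
        second_moment = p1 + 4.0 * p2    # E[g^2]
        mean += beta * dosage
        # Var[beta * g] = beta^2 * (E[g^2] - E[g]^2); summed under independence
        var += beta ** 2 * (second_moment - dosage ** 2)
    sd = math.sqrt(max(var, 0.0))
    return mean, (mean - z * sd, mean + z * sd)


if __name__ == "__main__":
    # Two confidently called SNPs and one uncertain SNP (made-up values).
    example_probs = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.25, 0.5, 0.25)]
    example_betas = [0.5, -0.2, 1.0]
    print(pgs_with_uncertainty(example_probs, example_betas))
```

A traditional PGS would use only a single called genotype per SNP; here the uncertain SNP widens the interval while the confidently called SNPs contribute no variance.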

Original language	English (US)
Pages (from-to)	1319-1329
Number of pages	11
Journal	American Journal of Human Genetics
Volume	110
Issue number	8
DOIs	https://doi.org/10.1016/j.ajhg.2023.06.015
State	Published - Aug 3 2023
Externally published	Yes

Keywords

effect sizes
genotype error
lcWGS
PGS
PGS error
risk stratification
uncertainty

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{b77c2edee2f6421c8092898c275ef79b,
title = "Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring",
abstract = "Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.",
keywords = "effect sizes, genotype error, lcWGS, PGS, PGS error, risk stratification, uncertainty",
author = "Ella Petter and Yi Ding and Kangcheng Hou and Arjun Bhattacharya and Alexander Gusev and Noah Zaitlen and Bogdan Pasaniuc",
note = "Publisher Copyright: {\textcopyright} 2023 American Society of Human Genetics",
year = "2023",
month = aug,
day = "3",
doi = "10.1016/j.ajhg.2023.06.015",
language = "English (US)",
volume = "110",
pages = "1319--1329",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "8",
}

TY  - JOUR
T1  - Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring
AU  - Petter, Ella
AU  - Ding, Yi
AU  - Hou, Kangcheng
AU  - Bhattacharya, Arjun
AU  - Gusev, Alexander
AU  - Zaitlen, Noah
AU  - Pasaniuc, Bogdan
PY  - 2023/8/3
Y1  - 2023/8/3
N2  - Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
AB  - Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
KW  - effect sizes
KW  - genotype error
KW  - lcWGS
KW  - PGS
KW  - PGS error
KW  - risk stratification
KW  - uncertainty
UR  - http://www.scopus.com/inward/record.url?scp=85166473871&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85166473871&partnerID=8YFLogxK
U2  - 10.1016/j.ajhg.2023.06.015
DO  - 10.1016/j.ajhg.2023.06.015
M3  - Article
C2  - 37490908
AN  - SCOPUS:85166473871
SN  - 0002-9297
VL  - 110
SP  - 1319
EP  - 1329
JO  - American Journal of Human Genetics
JF  - American Journal of Human Genetics
IS  - 8
ER  - 
