TY - JOUR
T1 - XPAT
T2 - a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets
AU - Yu, Yao
AU - Hu, Hao
AU - Bohlender, Ryan J.
AU - Hu, Fulan
AU - Chen, Jiun Sheng
AU - Holt, Carson
AU - Fowler, Jerry
AU - Guthery, Stephen L.
AU - Scheet, Paul
AU - Hildebrandt, Michelle A.T.
AU - Yandell, Mark
AU - Huff, Chad D.
N1 - Publisher Copyright: © The Author(s) 2017
PY - 2018/4/6
Y1 - 2018/4/6
N2 - High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduces strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
UR - http://www.scopus.com/inward/record.url?scp=85046254275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046254275&partnerID=8YFLogxK
U2 - 10.1093/NAR/GKX1280
DO - 10.1093/NAR/GKX1280
M3 - Article
C2 - 29294048
AN - SCOPUS:85046254275
SN - 0305-1048
VL - 46
SP - E32
JO - Nucleic acids research
JF - Nucleic acids research
IS - 6
ER -