TY - JOUR
T1 - Integrating DNA sequencing and transcriptomic data for association analyses of low-frequency variants and lipid traits
AU - Yang, Tianzhong
AU - Wu, Chong
AU - Wei, Peng
AU - Pan, Wei
N1 - Funding Information:
We thank the Minnesota Supercomputing Institute and Texas Advanced Computing Center for providing computing resources. The FHS is conducted and supported by the NHLBI in collaboration with Boston University (Contract No. N01-HC-25195 and HHSN268201500001I). This manuscript was not prepared in collaboration with investigators of the FHS and does not necessarily reflect the opinions or views of the FHS, Boston University or NHLBI. WGS for TOPMed program was supported by the NHLBI. WGS for ‘NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study’ (phs000974.v1.p1) was performed at the Broad Institute of MIT and Harvard (HHSN268201500014C). Centralized read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Phenotype harmonization, data management, sample-identity QC, and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1; contract HHSN268201800001I). This study makes use of data generated by the UK10K Consortium, including samples from the TwinsUK and ALSPAC cohorts. A full list of the investigators who contributed to the generation of the data is available from www.UK10K.org. Funding for UK10K was provided by the Wellcome Trust under award WT091310. Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc. and Takeda Pharmaceutical Company. The Canadian Institutes of Rev December 5, 2013 Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Publisher Copyright:
© 2020 The Author(s) 2020. Published by Oxford University Press. All rights reserved.
PY - 2020/2/1
Y1 - 2020/2/1
N2 - Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and transcriptomic data to showcase their improved statistical power of identifying gene-trait associations while, importantly, offering further biological insights. TWAS have thus far focused on common variants as available from GWAS. Compared with common variants, the findings for or even applications to low-frequency variants are limited and their underlying role in regulating gene expression is less clear. To fill this gap, we extend TWAS to integrating whole genome sequencing data with transcriptomic data for low-frequency variants. Using the data from the Framingham Heart Study, we demonstrate that low-frequency variants play an important and universal role in predicting gene expression, which is not completely due to linkage disequilibrium with the nearby common variants. By including low-frequency variants, in addition to common variants, we increase the predictivity of gene expression for 79% of the examined genes. Incorporating this piece of functional genomic information, we perform association testing for five lipid traits in two UK10K whole genome sequencing cohorts, hypothesizing that cis-expression quantitative trait loci, including low-frequency variants, are more likely to be trait-associated. We discover that two genes, LDLR and TTC22, are genome-wide significantly associated with low-density lipoprotein cholesterol based on 3203 subjects and that the association signals are largely independent of common variants. We further demonstrate that a joint analysis of both common and low-frequency variants identifies association signals that would be missed by testing on either common variants or low-frequency variants alone.
AB - Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and transcriptomic data to showcase their improved statistical power of identifying gene-trait associations while, importantly, offering further biological insights. TWAS have thus far focused on common variants as available from GWAS. Compared with common variants, the findings for or even applications to low-frequency variants are limited and their underlying role in regulating gene expression is less clear. To fill this gap, we extend TWAS to integrating whole genome sequencing data with transcriptomic data for low-frequency variants. Using the data from the Framingham Heart Study, we demonstrate that low-frequency variants play an important and universal role in predicting gene expression, which is not completely due to linkage disequilibrium with the nearby common variants. By including low-frequency variants, in addition to common variants, we increase the predictivity of gene expression for 79% of the examined genes. Incorporating this piece of functional genomic information, we perform association testing for five lipid traits in two UK10K whole genome sequencing cohorts, hypothesizing that cis-expression quantitative trait loci, including low-frequency variants, are more likely to be trait-associated. We discover that two genes, LDLR and TTC22, are genome-wide significantly associated with low-density lipoprotein cholesterol based on 3203 subjects and that the association signals are largely independent of common variants. We further demonstrate that a joint analysis of both common and low-frequency variants identifies association signals that would be missed by testing on either common variants or low-frequency variants alone.
UR - http://www.scopus.com/inward/record.url?scp=85079345620&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079345620&partnerID=8YFLogxK
U2 - 10.1093/hmg/ddz314
DO - 10.1093/hmg/ddz314
M3 - Article
C2 - 31919517
AN - SCOPUS:85079345620
SN - 0964-6906
VL - 29
SP - 515
EP - 526
JO - Human molecular genetics
JF - Human molecular genetics
IS - 3
ER -