Incorporating ENCODE information into association analysis of whole genome sequencing data

Research output: Contribution to journal, Article, peer-review

7 Scopus citations

Abstract

With the rapidly decreasing cost of next-generation sequencing technology, a large number of whole genomes have been sequenced, enabling researchers to survey rare variants in both the protein-coding and regulatory regions of the genome. However, identifying functional variants associated with complex diseases from whole genome sequencing (WGS) data remains a daunting task, because millions of candidate variants must be tested with only moderate sample sizes. We propose incorporating Encyclopedia of DNA Elements (ENCODE) information into the association analysis of WGS data to boost statistical power. Specifically, we use RegulomeDB and PolyPhen2 scores as external weights in existing rare-variant association tests. We demonstrate the proposed framework using the WGS data and blood pressure phenotype from the San Antonio Family Studies, provided by Genetic Analysis Workshop 19. We identified a genome-wide significant locus in the gene SNUPN on chromosome 15 that harbors a rare nonsynonymous variant. This locus was not detected by benchmark methods that do not incorporate biological information, including the T5 burden test and the sequence kernel association test.
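The idea of using functional scores as external weights can be illustrated with a minimal weighted burden sketch. This is not the authors' implementation: the weight values, the toy genotypes and phenotypes, and the helper names `weighted_burden` and `slope_t_statistic` are all hypothetical, and the abstract does not specify how RegulomeDB or PolyPhen2 scores are mapped to weights. The sketch also treats individuals as unrelated, which would not hold for family data.

```python
import math


def weighted_burden(genotypes, weights):
    """Collapse each individual's rare-variant genotypes into one weighted score.

    genotypes: per-individual lists of minor-allele counts (0, 1, or 2)
    weights:   one external functional weight per variant (hypothetical values)
    """
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def slope_t_statistic(x, y):
    """t statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the slope's standard error.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope / standard_error


# Toy data: 6 individuals, 3 rare variants in one region (all values invented).
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
]
# Assumed weights: larger means stronger external evidence of function.
weights = [1.0, 0.2, 0.6]
# Invented quantitative phenotype, e.g. a blood pressure measurement.
phenotype = [118.0, 131.0, 121.0, 126.0, 139.0, 120.0]

burden = weighted_burden(genotypes, weights)
t_stat = slope_t_statistic(burden, phenotype)
print("weighted burden scores:", burden)
print("slope t statistic:", t_stat)
```

Setting every weight to 1 reduces this to an unweighted burden score, which is the kind of benchmark the abstract contrasts against; the external weights simply let variants with stronger functional evidence contribute more to the regional score.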

Original language: English (US)
Article number: 9
Journal: BMC Proceedings
Volume: 10
State: Published - 2016

ASJC Scopus subject areas

  • General Biochemistry, Genetics and Molecular Biology
