A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data

Francesco C. Stingo, Michael D. Swartz, Marina Vannucci

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analyzing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association study (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple-SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the association of the scores with the observed phenotype. The basic idea behind the definition of gene-level scores is to weight the SNPs within a gene according to their rarity, based on the genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores that give more importance to unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results with those of existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
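
The abstract does not give the exact definition of the gene-level scores, so the following is only a minimal sketch of the general idea of rarity weighting under Hardy-Weinberg equilibrium, not the authors' formula. It assumes biallelic SNPs, genotypes coded as the number of minor-allele copies (0, 1 or 2), and a negative-log weighting chosen purely for illustration; the function names `hwe_genotype_freqs` and `gene_score` are hypothetical.

```python
import math


def hwe_genotype_freqs(p):
    """Expected genotype frequencies (AA, Aa, aa) under Hardy-Weinberg
    equilibrium, where p is the frequency of allele A and q = 1 - p."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def gene_score(genotypes, major_allele_freqs):
    """Illustrative gene-level score for one subject (an assumption, not the
    paper's definition): sum over the SNPs in a gene of -log(expected HWE
    frequency of the observed genotype), so rarer genotypes weigh more.

    genotypes: copies of the minor allele a at each SNP (0, 1 or 2).
    major_allele_freqs: frequency p of the major allele A at each SNP.
    """
    if len(genotypes) != len(major_allele_freqs):
        raise ValueError("one allele frequency is needed per SNP")
    score = 0.0
    for g, p in zip(genotypes, major_allele_freqs):
        expected = hwe_genotype_freqs(p)  # index 0 -> AA, 1 -> Aa, 2 -> aa
        score += -math.log(expected[g])
    return score


if __name__ == "__main__":
    freqs = [0.9, 0.8, 0.7]  # made-up allele frequencies for three SNPs
    print(gene_score([0, 0, 0], freqs))  # common genotypes only
    print(gene_score([2, 1, 0], freqs))  # includes a rare homozygote
```

Under this weighting, a subject carrying a rare homozygous genotype receives a larger score than one carrying only common genotypes, which mirrors the emphasis on unusually low frequencies described above. Scores of this kind could then enter a linear model as predictors, with the selection of genes and SNPs handled separately by the Bayesian machinery.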

Original language: English (US)
Pages (from-to): 137-151
Number of pages: 15
Journal: Statistics and its Interface
Volume: 8
Issue number: 2
State: Published - 2015

Keywords

  • Bayesian variable selection
  • Hardy-Weinberg equilibrium law
  • Linear models
  • Linkage disequilibrium
  • Markov random field
  • SNP data

ASJC Scopus subject areas

  • Statistics and Probability
  • Applied Mathematics
