TY - JOUR
T1 - BM-SNP
T2 - A bayesian model for SNP calling using high throughput sequencing data
AU - Xu, Yanxun
AU - Zheng, Xiaofeng
AU - Yuan, Yuan
AU - Estecio, Marcos R.
AU - Issa, Jean Pierre
AU - Qiu, Peng
AU - Ji, Yuan
AU - Liang, Shoudan
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/11/1
Y1 - 2014/11/1
N2 - A single-nucleotide polymorphism (SNP) is a single-base change in the DNA sequence and is the most common type of polymorphism. Detection and annotation of SNPs are among the central topics in biomedical research, as SNPs are believed to play important roles in the manifestation of phenotypic events such as disease susceptibility. To take full advantage of next-generation sequencing (NGS) technology, we propose BM-SNP, a Bayesian approach that identifies SNPs through posterior inference on NGS data. In particular, BM-SNP computes the posterior probability of nucleotide variation at each covered genomic position using the contents and frequency of the mapped short reads. A position with a high posterior probability of nucleotide variation is flagged as a potential SNP. We apply BM-SNP to NGS data from two cell lines, and the results show a high overlap (> 95 percent) with the dbSNP database. Compared with MAQ, BM-SNP identifies more SNPs that are in dbSNP, and with higher quality. SNPs called only by BM-SNP and absent from dbSNP may represent new discoveries. The proposed BM-SNP method integrates information from multiple aspects of NGS data and therefore achieves high detection power. BM-SNP is fast: it can process whole-genome data at 20-fold average coverage in a short amount of time.
AB - A single-nucleotide polymorphism (SNP) is a single-base change in the DNA sequence and is the most common type of polymorphism. Detection and annotation of SNPs are among the central topics in biomedical research, as SNPs are believed to play important roles in the manifestation of phenotypic events such as disease susceptibility. To take full advantage of next-generation sequencing (NGS) technology, we propose BM-SNP, a Bayesian approach that identifies SNPs through posterior inference on NGS data. In particular, BM-SNP computes the posterior probability of nucleotide variation at each covered genomic position using the contents and frequency of the mapped short reads. A position with a high posterior probability of nucleotide variation is flagged as a potential SNP. We apply BM-SNP to NGS data from two cell lines, and the results show a high overlap (> 95 percent) with the dbSNP database. Compared with MAQ, BM-SNP identifies more SNPs that are in dbSNP, and with higher quality. SNPs called only by BM-SNP and absent from dbSNP may represent new discoveries. The proposed BM-SNP method integrates information from multiple aspects of NGS data and therefore achieves high detection power. BM-SNP is fast: it can process whole-genome data at 20-fold average coverage in a short amount of time.
KW - Bayesian
KW - False discovery rate (FDR)
KW - Markov chain Monte Carlo (MCMC)
KW - Next-generation sequencing (NGS)
KW - Single-nucleotide polymorphism (SNP)
UR - http://www.scopus.com/inward/record.url?scp=84919491972&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919491972&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2014.2321407
DO - 10.1109/TCBB.2014.2321407
M3 - Article
C2 - 26357041
AN - SCOPUS:84919491972
SN - 1545-5963
VL - 11
SP - 1038
EP - 1044
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 6
M1 - 6809195
ER -