BM-map: Bayesian mapping of multireads for next-generation sequencing data

Yuan Ji, Yanxun Xu, Qiong Zhang, Kam Wah Tsui, Yuan Yuan, Clift Norris, Shoudan Liang, Han Liang

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real-life data that the Bayesian method yields better mapping than the current leading methods. We provide a downloadable C++ program, which is being packaged into user-friendly software.
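
To make the idea of a posterior mapping probability concrete, here is a minimal Python sketch. It is not the BM-map model described in the article. It assumes a much simpler setup: each base is an independent sequencing error with a fixed rate, and each candidate location has a prior weight (uniform by default). The function name `multiread_posterior` and the parameters `error_rate` and `prior` are hypothetical, chosen only for this illustration.

```python
def multiread_posterior(mismatches, read_length, error_rate=0.01, prior=None):
    """Posterior probability that one multiread came from each candidate location.

    mismatches  -- number of mismatches at each candidate location
    read_length -- length of the read in bases
    error_rate  -- assumed per-base error probability (illustrative value)
    prior       -- optional prior weight per location; uniform if omitted
    """
    n = len(mismatches)
    if prior is None:
        prior = [1.0 / n] * n

    # Unnormalized posterior: prior times the likelihood of seeing m
    # mismatches and (read_length - m) matches under independent errors.
    weights = [
        p * (error_rate ** m) * ((1.0 - error_rate) ** (read_length - m))
        for p, m in zip(prior, mismatches)
    ]

    # Bayes' rule: normalize over the competing locations.
    total = sum(weights)
    return [w / total for w in weights]


# A 36-base read that aligns to two locations, with 0 and 1 mismatches.
posterior = multiread_posterior([0, 1], read_length=36)
```

Under these assumptions, a downstream step such as expression quantification could add each multiread's posterior weight to every candidate location as a fractional count, rather than discarding the read.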

Original language: English (US)
Pages (from-to): 1215-1224
Number of pages: 10
Journal: Biometrics
Volume: 67
Issue number: 4
State: Published - Dec 2011

Keywords

  • Data augmentation
  • RNA-Seq
  • Read alignment
  • Short reads
  • Solexa sequencing
  • Transcriptome

ASJC Scopus subject areas

  • Statistics and Probability
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics

MD Anderson CCSG core facilities

  • Bioinformatics Shared Resource
  • Biostatistics Resource Group
