Abstract
Motivation: ChIP-seq is becoming the main approach to the genome-wide study of protein-DNA interactions and histone modifications. Existing informatics tools perform well at extracting strong ChIP-enriched sites. However, two questions remain to be answered: (i) to what extent is a ChIP-seq experiment able to reveal the weak ChIP-enriched sites? (ii) are the weak sites biologically meaningful? To answer these questions, it is necessary to distinguish the weak ChIP signals from background noise.

Results: We propose a linear signal-noise model, in which a noise rate is introduced to represent the fraction of noise in a ChIP library. We developed an iterative algorithm to estimate the noise rate using a control library, and derived a library-swapping strategy for false discovery rate estimation. These approaches were integrated into a general-purpose framework, named CCAT (Control-based ChIP-seq Analysis Tool), for the significance analysis of ChIP-seq. Applications to H3K4me3 and H3K36me3 datasets showed that CCAT predicted significantly more ChIP-enriched sites than previous methods did. With the high sensitivity of CCAT prediction, we revealed distinct chromatin features associated with the strong and weak H3K4me3 sites.

Availability: http://cmb.gis.a-star.edu.sg/ChIPSeq/tools.htm.

Contact: sungk@gis.a-star.edu.sg; asflin@ntu.edu.sg.

Supplementary Information: Supplementary data are available at Bioinformatics online.
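
The two ideas named in the Results (iterative noise-rate estimation against a control library, and false discovery rate estimation by swapping the libraries) can be illustrated with a small sketch. This is not the CCAT implementation: the abstract does not give the update rule, so the background-selection rule, the `fold` parameter, the per-bin count representation, and the function names `estimate_noise_rate` and `swap_fdr` are all assumptions made for illustration only.

```python
from typing import Sequence


def estimate_noise_rate(chip: Sequence[float], control: Sequence[float],
                        fold: float = 2.0, max_iter: int = 100,
                        tol: float = 1e-9) -> float:
    """Illustrative iterative estimate of the noise fraction of a ChIP library.

    chip, control: read counts per genomic bin (same bins, same order).
    fold: hypothetical cutoff; a bin counts as background when its ChIP count
          is at most `fold` times the noise expected from the control.
    """
    n_chip = sum(chip)
    n_ctrl = sum(control)
    alpha = 1.0  # start by assuming the whole ChIP library is noise
    for _ in range(max_iter):
        # Expected ChIP noise in a bin = alpha * (n_chip / n_ctrl) * control count.
        scale = alpha * n_chip / n_ctrl
        background = [i for i, (c, k) in enumerate(zip(chip, control))
                      if c <= fold * scale * k]
        chip_bg = sum(chip[i] for i in background)
        ctrl_bg = sum(control[i] for i in background)
        if ctrl_bg == 0:
            break  # no usable background bins; keep the current estimate
        # In background bins all ChIP reads are assumed to be noise, so the
        # ratio of library fractions estimates the noise rate.
        new_alpha = min(1.0, (chip_bg / n_chip) / (ctrl_bg / n_ctrl))
        converged = abs(new_alpha - alpha) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


def swap_fdr(chip_scores: Sequence[float], swapped_scores: Sequence[float],
             threshold: float) -> float:
    """Illustrative library-swapping FDR.

    chip_scores: region scores from calling ChIP against control.
    swapped_scores: region scores from calling control against ChIP;
                    regions passing the threshold here approximate false calls.
    """
    called = sum(1 for s in chip_scores if s >= threshold)
    false_calls = sum(1 for s in swapped_scores if s >= threshold)
    if called == 0:
        return 0.0
    return min(1.0, false_calls / called)


# Toy data: eight background bins and two enriched bins (made-up numbers).
toy_control = [10] * 10
toy_chip = [5] * 8 + [50, 45]
toy_alpha = estimate_noise_rate(toy_chip, toy_control)
toy_fdr = swap_fdr([9, 8, 7, 3, 2], [4, 3, 1], threshold=3)
print(toy_alpha, toy_fdr)
```

In the toy data the noise contributes 5 reads per bin, so 50 of the 135 ChIP reads are noise; the sketch recovers that fraction because the two enriched bins are excluded from the background set after the first pass. Real data are noisy, and a selection rule this simple can bias the estimate, which is one reason to treat it as a teaching aid rather than a substitute for the published tool.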
Original language | English (US) |
---|---|
Article number | btq128 |
Pages (from-to) | 1199-1204 |
Number of pages | 6 |
Journal | Bioinformatics |
Volume | 26 |
Issue number | 9 |
State | Published - Apr 5 2010 |
Externally published | Yes |
ASJC Scopus subject areas
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics