Abstract
Motivation: In contrasting levels of gene expression between groups of SAGE libraries, the libraries within each group are often combined and the counts for the tag of interest summed, and inference is made on the basis of these larger 'pseudolibraries'. While this captures the sampling variability inherent in the procedure, it fails to allow for normal variation in levels of the gene between individuals within the same group, and can consequently overstate the significance of the results. The effect is not slight: between-library variation can be hundreds of times the within-library variation.

Results: We introduce a beta-binomial sampling model that correctly incorporates both sources of variation. We show how to fit the parameters of this model, and introduce a test statistic for differential expression similar to a two-sample t-test.
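
The contrast between pooling and per-library inference can be illustrated with a minimal sketch. This is not the paper's fitting procedure or its test statistic: the function names and the counts are hypothetical, and a plain unweighted Welch-style t statistic on per-library proportions stands in for the beta-binomial statistic described above. The last function uses the standard beta-binomial variance formula to show how far a count's variance can exceed the purely binomial (within-library) variance.

```python
from math import sqrt


def pooled_z(counts_a, sizes_a, counts_b, sizes_b):
    """Pseudolibrary approach: sum counts within each group, then compute
    a two-proportion z statistic that reflects sampling variability only."""
    xa, na = sum(counts_a), sum(sizes_a)
    xb, nb = sum(counts_b), sum(sizes_b)
    p = (xa + xb) / (na + nb)  # pooled proportion under the null
    se = sqrt(p * (1 - p) * (1 / na + 1 / nb))
    return (xa / na - xb / nb) / se


def per_library_t(counts_a, sizes_a, counts_b, sizes_b):
    """Unweighted Welch-style t statistic on per-library proportions.
    An illustrative stand-in, NOT the paper's weighted statistic."""
    def mean_and_var_of_mean(counts, sizes):
        props = [x / n for x, n in zip(counts, sizes)]
        m = sum(props) / len(props)
        s2 = sum((p - m) ** 2 for p in props) / (len(props) - 1)
        return m, s2 / len(props)

    ma, va = mean_and_var_of_mean(counts_a, sizes_a)
    mb, vb = mean_and_var_of_mean(counts_b, sizes_b)
    return (ma - mb) / sqrt(va + vb)


def beta_binomial_variance_ratio(alpha, beta, n):
    """Var(X) for X ~ BetaBinomial(n, alpha, beta) divided by the variance
    of a Binomial(n, p) count with p = alpha / (alpha + beta)."""
    return (alpha + beta + n) / (alpha + beta + 1)


# Hypothetical tag counts in three libraries per group, 50,000 tags each.
sizes = [50_000, 50_000, 50_000]
group_a = [10, 60, 20]
group_b = [70, 15, 95]

z = pooled_z(group_a, sizes, group_b, sizes)
t = per_library_t(group_a, sizes, group_b, sizes)
ratio = beta_binomial_variance_ratio(50, 50, 50_000)
print(f"pooled z = {z:.2f}, per-library t = {t:.2f}, variance ratio = {ratio:.0f}")
```

With these made-up counts the pooled statistic is considerably larger in magnitude than the per-library one, because pooling ignores the spread of proportions among libraries in the same group. The variance ratio shows that, for library sizes in the tens of thousands, even moderate between-library variation can inflate the variance of a count by a factor in the hundreds.
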
Original language | English (US) |
---|---|
Pages (from-to) | 1477-1483 |
Number of pages | 7 |
Journal | Bioinformatics |
Volume | 19 |
Issue number | 12 |
State | Published - Aug 12 2003 |
ASJC Scopus subject areas
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics