Differential expression in SAGE: Accounting for normal between-library variation

Keith A. Baggerly; Li Deng; Jeffrey S. Morris; C. Marcelo Aldaz

doi:10.1093/bioinformatics/btg173

Research output: Contribution to journal › Article › peer-review

257 Scopus citations

Abstract

Motivation: In contrasting levels of gene expression between groups of SAGE libraries, the libraries within each group are often combined and the counts for the tag of interest summed, and inference is made on the basis of these larger 'pseudolibraries'. While this captures the sampling variability inherent in the procedure, it fails to allow for normal variation in levels of the gene between individuals within the same group, and can consequently overstate the significance of the results. The effect is not slight: between-library variation can be hundreds of times the within-library variation. Results: We introduce a beta-binomial sampling model that correctly incorporates both sources of variation. We show how to fit the parameters of this model, and introduce a test statistic for differential expression similar to a two-sample t-test.
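The gap between the two sources of variation can be illustrated with the standard beta-binomial variance formula. The sketch below is not the authors' fitting procedure or test statistic; it only compares the variance of an observed tag proportion under a pure binomial model with the variance under a beta-binomial model. The function names, the library size of 50,000 tags, and the Beta(1, 98) parameters are illustrative assumptions, not values from the paper.

```python
def binomial_proportion_variance(mu, n):
    """Variance of X/n when X ~ Binomial(n, mu): sampling variation only."""
    return mu * (1.0 - mu) / n


def variance_inflation(alpha, beta, n):
    """Factor by which the beta-binomial variance exceeds the binomial one.

    With p ~ Beta(alpha, beta) and X | p ~ Binomial(n, p), the standard result is
    Var(X/n) = mu(1-mu)/n * (1 + (n-1)*rho), where rho = 1/(alpha+beta+1).
    """
    rho = 1.0 / (alpha + beta + 1.0)
    return 1.0 + (n - 1) * rho


def beta_binomial_proportion_variance(alpha, beta, n):
    """Variance of X/n under the beta-binomial model (both sources of variation)."""
    mu = alpha / (alpha + beta)
    return binomial_proportion_variance(mu, n) * variance_inflation(alpha, beta, n)


if __name__ == "__main__":
    # Hypothetical values: a library of 50,000 tags and a Beta(1, 98) prior
    # on the true tag proportion (so rho = 0.01).
    n_tags = 50_000
    alpha, beta = 1.0, 98.0
    print(f"variance inflation: {variance_inflation(alpha, beta, n_tags):.2f}")
```

Under these assumed values the inflation factor is about 501, so even a small between-library correlation (rho = 0.01) yields a total variance hundreds of times the sampling variance alone, because the factor grows with library size.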

Original language	English (US)
Pages (from-to)	1477-1483
Number of pages	7
Journal	Bioinformatics
Volume	19
Issue number	12
DOIs	https://doi.org/10.1093/bioinformatics/btg173
State	Published - Aug 12 2003

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{b612729cdc7547a5bb036e2a12098a22,

title = "Differential expression in SAGE: Accounting for normal between-library variation",

abstract = "Motivation: In contrasting levels of gene expression between groups of SAGE libraries, the libraries within each group are often combined and the counts for the tag of interest summed, and inference is made on the basis of these larger 'pseudolibraries'. While this captures the sampling variability inherent in the procedure, it fails to allow for normal variation in levels of the gene between individuals within the same group, and can consequently overstate the significance of the results. The effect is not slight: between-library variation can be hundreds of times the within-library variation. Results: We introduce a beta-binomial sampling model that correctly incorporates both sources of variation. We show how to fit the parameters of this model, and introduce a test statistic for differential expression similar to a two-sample t-test.",

author = "Baggerly, {Keith A.} and Deng, Li and Morris, {Jeffrey S.} and Aldaz, {C. Marcelo}",

note = "Funding Information: The authors gratefully acknowledge support from NIH-NCI Grant 1U19 CA84978-1A1.",

year = "2003",

month = aug,

day = "12",

doi = "10.1093/bioinformatics/btg173",

language = "English (US)",

volume = "19",

pages = "1477--1483",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "12",

}

TY - JOUR

T1 - Differential expression in SAGE

T2 - Accounting for normal between-library variation

AU - Baggerly, Keith A.

AU - Deng, Li

AU - Morris, Jeffrey S.

AU - Aldaz, C. Marcelo

N1 - Funding Information: The authors gratefully acknowledge support from NIH-NCI Grant 1U19 CA84978-1A1.

PY - 2003/8/12

Y1 - 2003/8/12

AB - Motivation: In contrasting levels of gene expression between groups of SAGE libraries, the libraries within each group are often combined and the counts for the tag of interest summed, and inference is made on the basis of these larger 'pseudolibraries'. While this captures the sampling variability inherent in the procedure, it fails to allow for normal variation in levels of the gene between individuals within the same group, and can consequently overstate the significance of the results. The effect is not slight: between-library variation can be hundreds of times the within-library variation. Results: We introduce a beta-binomial sampling model that correctly incorporates both sources of variation. We show how to fit the parameters of this model, and introduce a test statistic for differential expression similar to a two-sample t-test.

UR - http://www.scopus.com/inward/record.url?scp=0043009767&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0043009767&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btg173

DO - 10.1093/bioinformatics/btg173

M3 - Article

C2 - 12912827

AN - SCOPUS:0043009767

SN - 1367-4803

VL - 19

SP - 1477

EP - 1483

JO - Bioinformatics

JF - Bioinformatics

IS - 12

ER -
