Identification of differentially expressed gene sets using the Generalized Berk-Jones statistic

Sheila M. Gaynor, Ryan Sun, Xihong Lin, John Quackenbush

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Motivation: Cancer genomics studies frequently aim to identify genes that are differentially expressed between clinically distinct patient subgroups, generally by testing single genes one at a time. However, the results of any individual transcriptomic study are often not fully reproducible. A particular challenge impeding statistical analysis is the difficulty of distinguishing between differential expression comprising part of the genomic disease etiology and that induced by downstream effects. More robust analytical approaches that are well-powered to detect potentially causative genes, are less prone to discovering spurious associations, and can deliver reproducible findings across different studies are needed.

Results: We propose a set-based procedure for testing of differential expression and show that this set-based approach can produce more robust results by aggregating information across multiple, correlated genomic markers. Specifically, we adapt the Generalized Berk-Jones statistic to test for the transcription factors that may contribute to the progression of estrogen receptor positive breast cancer. We demonstrate the ability of our method to produce reproducible findings by applying the same analysis to 21 publicly available datasets, producing a similar list of significant transcription factors across most studies. Our Generalized Berk-Jones approach produces results that show improved consistency over three set-based testing algorithms: Generalized Higher Criticism, Gene Set Analysis and Gene Set Enrichment Analysis.
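
To make the set-based idea in the abstract concrete, the following is a minimal sketch of the classical Berk-Jones statistic computed from the per-gene p-values of one gene set. It assumes the p-values are independent, so it is not the authors' Generalized Berk-Jones procedure, which additionally accounts for correlation among the markers. The function name `berk_jones` and the example p-values are hypothetical, and the formula is the commonly cited one-sided form that scans the smallest half of the ordered p-values.

```python
import math


def berk_jones(p_values):
    """One-sided Berk-Jones statistic for a set of p-values, assuming independence."""
    n = len(p_values)
    # Clamp away from 0 so the logarithms below stay finite.
    ordered = sorted(max(p, 1e-300) for p in p_values)
    best = 0.0
    # Scan the smallest half of the ordered p-values.
    for i in range(1, n // 2 + 1):
        x = i / n            # empirical fraction at the i-th smallest p-value
        t = ordered[i - 1]   # fraction expected there if p-values were uniform
        if t < x:            # only an excess of small p-values counts as evidence
            # Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(t)
            kl = x * math.log(x / t) + (1 - x) * math.log((1 - x) / (1 - t))
            best = max(best, n * kl)
    return best


# Hypothetical gene set: one strongly significant gene among four.
signal = berk_jones([0.001, 0.5, 0.6, 0.9])
# Hypothetical gene set with no unusually small p-values.
null_like = berk_jones([0.3, 0.6, 0.8, 0.95])
print(signal, null_like)
```

A larger value indicates more small p-values in the set than uniformly distributed p-values would produce. Converting the statistic into a set-level p-value, and adjusting for correlation as the generalized version does, is outside the scope of this sketch.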

Original language: English (US)
Pages (from-to): 4568-4576
Number of pages: 9
Journal: Bioinformatics
Volume: 35
Issue number: 22
State: Published - Nov 1 2019
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

MD Anderson CCSG core facilities

  • Biostatistics Resource Group
