Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data

The Genomic Data Analysis Network

Research output: Contribution to journal › Article › peer-review

91 Scopus citations

Abstract

We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of five molecular data platforms in The Cancer Genome Atlas (TCGA), namely mRNA and miRNA expression, single nucleotide variants, DNA methylation, and copy number alterations, comprehensive sample-, gene-, and probe-level studies were performed to quantify the degree of similarity between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve.

Gao et al. performed a systematic analysis of the effects of synchronizing the large-scale, widely used, multi-omic dataset of The Cancer Genome Atlas to the current human reference genome. For each of the five molecular data platforms assessed, they demonstrated a very high concordance between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons.

Original language: English (US)
Pages (from-to): 24-34.e10
Journal: Cell Systems
Volume: 9
Issue number: 1
State: Published - Jul 24 2019

Keywords

  • DNA methylation
  • The Cancer Genome Atlas
  • human reference genome
  • mRNA expression
  • microRNA expression
  • quality control
  • somatic copy number alteration
  • somatic mutation

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Histology
  • Cell Biology

MD Anderson CCSG core facilities

  • Bioinformatics Shared Resource
