An approach for normalization and quality control for NanoString RNA expression data

Arjun Bhattacharya; Alina M. Hamilton; Helena Furberg; Eugene Pietzak; Mark P. Purdue; Melissa A. Troester; Katherine A. Hoadley; Michael I. Love

doi:10.1093/bib/bbaa163

Research output: Contribution to journal › Article › peer-review

49 Scopus citations

Abstract

The NanoString RNA counting assay for formalin-fixed paraffin embedded samples is unique in its sensitivity, technical reproducibility and robustness for analysis of clinical and archival samples. While commercial normalization methods are provided by NanoString, they are not optimal for all settings, particularly when samples exhibit strong technical or biological variation or where housekeeping genes have variable performance across the cohort. Here, we develop and evaluate a more comprehensive normalization procedure for NanoString data with steps for quality control, selection of housekeeping targets, normalization and iterative data visualization and biological validation. The approach was evaluated using a large cohort ($N=1649$) from the Carolina Breast Cancer Study, two cohorts of moderate sample size ($N=359$ and $130$) and a small published dataset ($N=12$). The iterative process developed here eliminates technical variation (e.g. from different study phases or sites) more reliably than the three other methods, including NanoString's commercial package, without diminishing biological variation, especially in long-term longitudinal multiphase or multisite cohorts. We also find that probe sets validated for nCounter, such as the PAM50 gene signature, are impervious to batch issues. This work emphasizes that systematic quality control, normalization and visualization of NanoString nCounter data are an imperative component of study design that influences results in downstream analyses.

Original language	English (US)
Article number	bbaa163
Journal	Briefings in bioinformatics
Volume	22
Issue number	3
DOIs	https://doi.org/10.1093/bib/bbaa163
State	Published - May 1 2021
Externally published	Yes

Keywords

data visualization
gene expression normalization
NanoString nCounter expression
quality control

ASJC Scopus subject areas

Information Systems
Molecular Biology

Cite this

@article{c9fc009f6fb0479cb80616cbc6d7dfd7,

title = "An approach for normalization and quality control for NanoString RNA expression data",

abstract = "The NanoString RNA counting assay for formalin-fixed paraffin embedded samples is unique in its sensitivity, technical reproducibility and robustness for analysis of clinical and archival samples. While commercial normalization methods are provided by NanoString, they are not optimal for all settings, particularly when samples exhibit strong technical or biological variation or where housekeeping genes have variable performance across the cohort. Here, we develop and evaluate a more comprehensive normalization procedure for NanoString data with steps for quality control, selection of housekeeping targets, normalization and iterative data visualization and biological validation. The approach was evaluated using a large cohort ($N=1649$) from the Carolina Breast Cancer Study, two cohorts of moderate sample size ($N=359$ and $130$) and a small published dataset ($N=12$). The iterative process developed here eliminates technical variation (e.g. from different study phases or sites) more reliably than the three other methods, including NanoString's commercial package, without diminishing biological variation, especially in long-term longitudinal multiphase or multisite cohorts. We also find that probe sets validated for nCounter, such as the PAM50 gene signature, are impervious to batch issues. This work emphasizes that systematic quality control, normalization and visualization of NanoString nCounter data are an imperative component of study design that influences results in downstream analyses.",

keywords = "data visualization, gene expression normalization, NanoString nCounter expression, quality control",

author = "Arjun Bhattacharya and Hamilton, {Alina M.} and Helena Furberg and Eugene Pietzak and Purdue, {Mark P.} and Troester, {Melissa A.} and Hoadley, {Katherine A.} and Love, {Michael I.}",

year = "2021",

month = may,

day = "1",

doi = "10.1093/bib/bbaa163",

language = "English (US)",

volume = "22",

journal = "Briefings in bioinformatics",

issn = "1467-5463",

publisher = "Oxford University Press",

number = "3",

}

TY - JOUR

T1 - An approach for normalization and quality control for NanoString RNA expression data

AU - Bhattacharya, Arjun

AU - Hamilton, Alina M.

AU - Furberg, Helena

AU - Pietzak, Eugene

AU - Purdue, Mark P.

AU - Troester, Melissa A.

AU - Hoadley, Katherine A.

AU - Love, Michael I.

PY - 2021/5/1

Y1 - 2021/5/1

N2 - The NanoString RNA counting assay for formalin-fixed paraffin embedded samples is unique in its sensitivity, technical reproducibility and robustness for analysis of clinical and archival samples. While commercial normalization methods are provided by NanoString, they are not optimal for all settings, particularly when samples exhibit strong technical or biological variation or where housekeeping genes have variable performance across the cohort. Here, we develop and evaluate a more comprehensive normalization procedure for NanoString data with steps for quality control, selection of housekeeping targets, normalization and iterative data visualization and biological validation. The approach was evaluated using a large cohort ($N=1649$) from the Carolina Breast Cancer Study, two cohorts of moderate sample size ($N=359$ and $130$) and a small published dataset ($N=12$). The iterative process developed here eliminates technical variation (e.g. from different study phases or sites) more reliably than the three other methods, including NanoString's commercial package, without diminishing biological variation, especially in long-term longitudinal multiphase or multisite cohorts. We also find that probe sets validated for nCounter, such as the PAM50 gene signature, are impervious to batch issues. This work emphasizes that systematic quality control, normalization and visualization of NanoString nCounter data are an imperative component of study design that influences results in downstream analyses.

KW - data visualization

KW - gene expression normalization

KW - NanoString nCounter expression

KW - quality control

UR - http://www.scopus.com/inward/record.url?scp=85107088804&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85107088804&partnerID=8YFLogxK

U2 - 10.1093/bib/bbaa163

DO - 10.1093/bib/bbaa163

M3 - Article

C2 - 32789507

AN - SCOPUS:85107088804

SN - 1467-5463

VL - 22

JO - Briefings in bioinformatics

JF - Briefings in bioinformatics

IS - 3

M1 - bbaa163

ER -
