The Big Data Paradox in Clinical Practice

Pavlos Msaouel

doi:10.1080/07357907.2022.2084621

Genitourinary Medical Oncology

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

The big data paradox is a real-world phenomenon whereby as the number of patients enrolled in a study increases, the probability that the confidence intervals from that study will include the truth decreases. This occurs in both observational and experimental studies, including randomized clinical trials, and should always be considered when clinicians are interpreting research data. Furthermore, as data quantity continues to increase in today’s era of big data, the paradox is becoming more pernicious. Herein, I consider three mechanisms that underlie this paradox, as well as three potential strategies to mitigate it: (1) improving data quality; (2) anticipating and modeling patient heterogeneity; (3) including the systematic error, not just the variance, in the estimation of error intervals.
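The core effect described in the abstract can be illustrated with a short calculation. This is a minimal sketch, not the author's method: it assumes a simple mean estimate with a fixed systematic error (`bias`), normally distributed noise with known standard deviation (`sigma`), and a nominal 95% interval built from the variance alone. The function names and parameter values are illustrative assumptions. Under those assumptions the interval narrows as the sample size grows while the bias stays fixed, so the probability that the interval contains the true value falls.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def coverage(n: int, bias: float, sigma: float = 1.0, z: float = 1.96) -> float:
    """Probability that a nominal 95% interval for a mean contains the truth.

    Assumes the estimate is Normal(truth + bias, sigma**2 / n) and the
    interval is estimate +/- z * sigma / sqrt(n), i.e. it reflects the
    variance but ignores the systematic error.
    """
    # Bias expressed in units of the standard error; grows with sqrt(n).
    shift = bias * math.sqrt(n) / sigma
    return normal_cdf(z - shift) - normal_cdf(-z - shift)


if __name__ == "__main__":
    # Hypothetical bias of 0.05 standard deviations, held fixed as n grows.
    for n in (100, 1_000, 10_000, 100_000):
        print(f"n={n:>7}  coverage={coverage(n, bias=0.05):.3f}")
```

With `bias=0.0` the coverage stays at the nominal level for every sample size, which is why the third mitigation strategy listed in the abstract targets the systematic error rather than the variance.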

Original language	English (US)
Pages (from-to)	567-576
Number of pages	10
Journal	Cancer Investigation
Volume	40
Issue number	7
DOIs	https://doi.org/10.1080/07357907.2022.2084621
State	Published - 2022

Keywords

Bias-variance trade-off
big data
patient relevance
relevance-robustness trade-off
robustness

ASJC Scopus subject areas

Oncology
Cancer Research

Access to Document

10.1080/07357907.2022.2084621

Cite this

@article{c1cdd062bdd44524b84ceaa1cdc0982c,

title = "The Big Data Paradox in Clinical Practice",

abstract = "The big data paradox is a real-world phenomenon whereby as the number of patients enrolled in a study increases, the probability that the confidence intervals from that study will include the truth decreases. This occurs in both observational and experimental studies, including randomized clinical trials, and should always be considered when clinicians are interpreting research data. Furthermore, as data quantity continues to increase in today{\textquoteright}s era of big data, the paradox is becoming more pernicious. Herein, I consider three mechanisms that underlie this paradox, as well as three potential strategies to mitigate it: (1) improving data quality; (2) anticipating and modeling patient heterogeneity; (3) including the systematic error, not just the variance, in the estimation of error intervals.",

keywords = "Bias-variance trade-off, big data, patient relevance, relevance-robustness trade-off, robustness",

author = "Pavlos Msaouel",

note = "Funding Information: Pavlos Msaouel is supported by a Career Development Award from the American Society of Clinical Oncology, a Research Award from KCCure, the MD Anderson Khalifa Scholar Award, the MD Anderson Physician-Scientist Award, philanthropic donations by Mike and Mary Allen, and the Andrew Sabin Family Foundation Fellowship. The author thanks Drs Bora Lim (Associate Professor, Baylor College of Medicine, Houston, TX, USA) and Christopher Logothetis (Professor, The University of Texas MD Anderson Cancer Center, Houston, TX, USA) for helpful conversations, as well as Sarah Townsend (Senior Technical Writer; Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA) for editorial assistance. Funding Information: Pavlos Msaouel reports honoraria for scientific advisory board membership for Mirati Therapeutics, Bristol Myers Squibb, and Exelixis; consulting fees from Axiom Healthcare; non-branded educational programs supported by Exelixis and Pfizer; leadership or fiduciary roles as a Medical Steering Committee member for the Kidney Cancer Association and a Kidney Cancer Scientific Advisory Board member for KCCure; and research funding from Takeda, Bristol Myers Squibb, Mirati Therapeutics, and Gateway for Cancer Research. Publisher Copyright: {\textcopyright} 2022 The Author(s). Published by Informa UK Limited, trading as Taylor \& Francis Group.",

year = "2022",

doi = "10.1080/07357907.2022.2084621",

language = "English (US)",

volume = "40",

pages = "567--576",

journal = "Cancer Investigation",

issn = "0735-7907",

publisher = "Informa Healthcare",

number = "7",

}

TY - JOUR

T1 - The Big Data Paradox in Clinical Practice

AU - Msaouel, Pavlos

N1 - Funding Information: Pavlos Msaouel is supported by a Career Development Award from the American Society of Clinical Oncology, a Research Award from KCCure, the MD Anderson Khalifa Scholar Award, the MD Anderson Physician-Scientist Award, philanthropic donations by Mike and Mary Allen, and the Andrew Sabin Family Foundation Fellowship. The author thanks Drs Bora Lim (Associate Professor, Baylor College of Medicine, Houston, TX, USA) and Christopher Logothetis (Professor, The University of Texas MD Anderson Cancer Center, Houston, TX, USA) for helpful conversations, as well as Sarah Townsend (Senior Technical Writer; Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA) for editorial assistance. Funding Information: Pavlos Msaouel reports honoraria for scientific advisory board membership for Mirati Therapeutics, Bristol Myers Squibb, and Exelixis; consulting fees from Axiom Healthcare; non-branded educational programs supported by Exelixis and Pfizer; leadership or fiduciary roles as a Medical Steering Committee member for the Kidney Cancer Association and a Kidney Cancer Scientific Advisory Board member for KCCure; and research funding from Takeda, Bristol Myers Squibb, Mirati Therapeutics, and Gateway for Cancer Research. Publisher Copyright: © 2022 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

PY - 2022

Y1 - 2022

N2 - The big data paradox is a real-world phenomenon whereby as the number of patients enrolled in a study increases, the probability that the confidence intervals from that study will include the truth decreases. This occurs in both observational and experimental studies, including randomized clinical trials, and should always be considered when clinicians are interpreting research data. Furthermore, as data quantity continues to increase in today’s era of big data, the paradox is becoming more pernicious. Herein, I consider three mechanisms that underlie this paradox, as well as three potential strategies to mitigate it: (1) improving data quality; (2) anticipating and modeling patient heterogeneity; (3) including the systematic error, not just the variance, in the estimation of error intervals.

AB - The big data paradox is a real-world phenomenon whereby as the number of patients enrolled in a study increases, the probability that the confidence intervals from that study will include the truth decreases. This occurs in both observational and experimental studies, including randomized clinical trials, and should always be considered when clinicians are interpreting research data. Furthermore, as data quantity continues to increase in today’s era of big data, the paradox is becoming more pernicious. Herein, I consider three mechanisms that underlie this paradox, as well as three potential strategies to mitigate it: (1) improving data quality; (2) anticipating and modeling patient heterogeneity; (3) including the systematic error, not just the variance, in the estimation of error intervals.

KW - Bias-variance trade-off

KW - big data

KW - patient relevance

KW - relevance-robustness trade-off

KW - robustness

UR - http://www.scopus.com/inward/record.url?scp=85131699989&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85131699989&partnerID=8YFLogxK

U2 - 10.1080/07357907.2022.2084621

DO - 10.1080/07357907.2022.2084621

M3 - Article

C2 - 35671042

AN - SCOPUS:85131699989

SN - 0735-7907

VL - 40

SP - 567

EP - 576

JO - Cancer Investigation

JF - Cancer Investigation

IS - 7

ER -
