TY - JOUR
T1 - Associations Between Radiation Oncologist Demographic Factors and Segmentation Similarity Benchmarks
T2 - Insights From a Crowd-Sourced Challenge Using Bayesian Estimation
AU - Wahid, Kareem A.
AU - Sahin, Onur
AU - Kundu, Suprateek
AU - Lin, Diana
AU - Alanis, Anthony
AU - Tehami, Salik
AU - Kamel, Serageldin
AU - Duke, Simon
AU - Sherer, Michael V.
AU - Rasmussen, Mathis
AU - Korreman, Stine
AU - Fuentes, David
AU - Cislo, Michael
AU - Nelms, Benjamin E.
AU - Christodouleas, John P.
AU - Murphy, James D.
AU - Mohamed, Abdallah S.R.
AU - He, Renjie
AU - Naser, Mohammed A.
AU - Gillespie, Erin F.
AU - Fuller, Clifton D.
N1 - Publisher Copyright: © 2024 by American Society of Clinical Oncology.
PY - 2024
Y1 - 2024
AB - PURPOSE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors. METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists from the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations were derived from five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and GI. Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus, which served as a reference standard benchmark. The Dice similarity coefficient (DSC) was primarily used as a metric for the comparisons. DSC was stratified into binary groups on the basis of structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Bayesian estimation were used to investigate the association between demographic variables and the binarized DSC for each disease site. Variables with a highest density interval excluding zero were considered to substantially affect the outcome measure. RESULTS: Five hundred seventy-four, 110, 452, 112, and 48 segmentations were used for the breast, sarcoma, H&N, GYN, and GI cases, respectively. The median percentage of segmentations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumors, respectively. Regression analysis revealed that the structure being tumor-related had a substantial negative impact on binarized DSC for the breast, sarcoma, H&N, and GI cases. There were no recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations. CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=85196137760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196137760&partnerID=8YFLogxK
U2 - 10.1200/CCI.23.00174
DO - 10.1200/CCI.23.00174
M3 - Article
C2 - 38870441
AN - SCOPUS:85196137760
SN - 2473-4276
VL - 8
JO - JCO Clinical Cancer Informatics
JF - JCO Clinical Cancer Informatics
M1 - e2300174
ER -