Comparison of Machine-Learning and Deep-Learning Methods for the Prediction of Osteoradionecrosis Resulting From Head and Neck Cancer Radiation Therapy

Brandon Reber; Lisanne Van Dijk; Brian Anderson; Abdallah Sherif Radwan Mohamed; Clifton Fuller; Stephen Lai; Kristy Brock

doi:10.1016/j.adro.2022.101163

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Purpose: Deep-learning (DL) techniques have been successful in disease-prediction tasks and could improve the prediction of mandible osteoradionecrosis (ORN) resulting from head and neck cancer (HNC) radiation therapy. In this study, we retrospectively compared the performance of DL algorithms and traditional machine-learning (ML) techniques to predict mandible ORN binary outcome in an extensive cohort of patients with HNC. Methods and Materials: Patients who received HNC radiation therapy at the University of Texas MD Anderson Cancer Center from 2005 to 2015 were identified for the ML (n = 1259) and DL (n = 1236) studies. The subjects were followed for ORN development for at least 12 months, with 173 developing ORN and 1086 having no evidence of ORN. The ML models used dose-volume histogram parameters to predict ORN development. These models included logistic regression, random forest, support vector machine, and a random classifier reference. The DL models were based on ResNet, DenseNet, and autoencoder-based architectures. The DL models used each participant's dose cropped to the mandible. The effect of increasing the amount of available training data on the DL models’ prediction performance was evaluated by training the DL models using increasing ratios of the original training data. Results: The F1 score for the logistic regression model, the best-performing ML model, was 0.3. The best-performing ResNet, DenseNet, and autoencoder-based models had F1 scores of 0.07, 0.14, and 0.23, respectively, whereas the random classifier's F1 score was 0.17. No performance increase was apparent when we increased the amount of training data available for DL model training. Conclusions: The ML models had superior performance to their DL counterparts. The lack of improvement in DL performance with increased training data suggests that either more data are needed for appropriate DL model construction or that the image features used in DL models are not suitable for this task.
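As a rough illustration of the F1 metric reported above, the following Python sketch computes F1 from confusion counts and estimates the expected F1 of a classifier that guesses at random on a cohort with the abstract's class balance (173 ORN, 1086 no ORN). The function names and the guessing probability `p_positive` are assumptions made for this illustration. The abstract does not state how the study's random classifier drew its labels or how the data were split, so the sketch is not expected to reproduce the reported 0.17.

```python
def f1_score(tp: float, fp: float, fn: float) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def expected_random_f1(n_pos: int, n_neg: int, p_positive: float) -> float:
    """Approximate F1 of a classifier that labels each case positive with
    probability p_positive, computed from expected confusion counts.

    This is a hypothetical baseline, not the study's random classifier.
    """
    tp = n_pos * p_positive          # positives guessed positive
    fp = n_neg * p_positive          # negatives guessed positive
    fn = n_pos * (1 - p_positive)    # positives guessed negative
    return f1_score(tp, fp, fn)


if __name__ == "__main__":
    n_orn, n_no_orn = 173, 1086      # class counts quoted in the abstract
    prevalence = n_orn / (n_orn + n_no_orn)

    # Two assumed guessing strategies: a coin flip, and guessing at the prevalence.
    for p in (0.5, prevalence):
        print(f"p_positive={p:.3f}  expected F1={expected_random_f1(n_orn, n_no_orn, p):.3f}")
```

Under these assumptions the expected F1 is about 0.22 when guessing positive half the time and about 0.14 when guessing at the cohort prevalence, which suggests that F1 scores near 0.2 are close to chance on data this imbalanced.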

Original language	English (US)
Article number	101163
Journal	Advances in Radiation Oncology
Volume	8
Issue number	4
DOIs	https://doi.org/10.1016/j.adro.2022.101163
State	Published - Jul 1 2023

ASJC Scopus subject areas

Oncology
Radiology, Nuclear Medicine and Imaging

Cite this

@article{fc0b0b1f568a4c07ac186cb05e4b6301,

title = "Comparison of Machine-Learning and Deep-Learning Methods for the Prediction of Osteoradionecrosis Resulting From Head and Neck Cancer Radiation Therapy",

abstract = "Purpose: Deep-learning (DL) techniques have been successful in disease-prediction tasks and could improve the prediction of mandible osteoradionecrosis (ORN) resulting from head and neck cancer (HNC) radiation therapy. In this study, we retrospectively compared the performance of DL algorithms and traditional machine-learning (ML) techniques to predict mandible ORN binary outcome in an extensive cohort of patients with HNC. Methods and Materials: Patients who received HNC radiation therapy at the University of Texas MD Anderson Cancer Center from 2005 to 2015 were identified for the ML (n = 1259) and DL (n = 1236) studies. The subjects were followed for ORN development for at least 12 months, with 173 developing ORN and 1086 having no evidence of ORN. The ML models used dose-volume histogram parameters to predict ORN development. These models included logistic regression, random forest, support vector machine, and a random classifier reference. The DL models were based on ResNet, DenseNet, and autoencoder-based architectures. The DL models used each participant's dose cropped to the mandible. The effect of increasing the amount of available training data on the DL models{\textquoteright} prediction performance was evaluated by training the DL models using increasing ratios of the original training data. Results: The F1 score for the logistic regression model, the best-performing ML model, was 0.3. The best-performing ResNet, DenseNet, and autoencoder-based models had F1 scores of 0.07, 0.14, and 0.23, respectively, whereas the random classifier's F1 score was 0.17. No performance increase was apparent when we increased the amount of training data available for DL model training. Conclusions: The ML models had superior performance to their DL counterparts. 
The lack of improvement in DL performance with increased training data suggests that either more data are needed for appropriate DL model construction or that the image features used in DL models are not suitable for this task.",

author = "Brandon Reber and {Van Dijk}, Lisanne and Brian Anderson and Mohamed, {Abdallah Sherif Radwan} and Clifton Fuller and Stephen Lai and Kristy Brock",

note = "Funding Information: Sources of support: Research reported in this publication was supported by the National Institutes of Health (NIH)/National Cancer Institute (NCI) under award number P30CA016672, the Helen Black Image Guided Fund, resources from the Image Guided Cancer Therapy Research Program at the University of Texas MD Anderson Cancer Center, a generous gift from the Apache Corporation, and support from the Tumor Measurement Initiative through the MD Anderson Strategic Initiative Development Program. Funding Information: Disclosures: Mr Reber and Dr Brock received support from the NIH/NCI under award number P30CA016672, the Helen Black Image Guided Fund, resources from the Image Guided Cancer Therapy Research Program at the University of Texas MD Anderson Cancer Center, a generous gift from the Apache Corporation, and support from the Tumor Measurement Initiative through the MD Anderson Strategic Initiative Development Program. Dr Van Dijk has received support from the Dutch Cancer Society (KWF-13529), Rubicon (NWO-452182317), and VENI (NWO-09150162010173). Dr Anderson received an Allied Scientist grant from the Society of Interventional Radiology. Abdallah Mohamed received support from the NIH through a NIH National Institute of Dental and Craniofacial Research (NIDCR) Academic Industrial Partnership Grant (R01DE028290), NIH/National Science Foundation NCI Smart Connected Health Program (R01CA257814), and an NIDCR Establish Outcome Measures for Clinical Studies of Oral and Craniofacial Diseases and Conditions award 1 (R01DE025248). Dr Fuller received an NCI Institutional Research Training Grant (T32CA261856) and National Institute of Biomedical Imaging and Bioengineering Grant for Research Education Programs for Residents and Clinical Fellows. Dr Lai has received support from the NIDCR (R01 DE025248). Funding Information: We thank the University of Texas MD Anderson Cancer Center Research Medical Library for manuscript editing. 
Publisher Copyright: {\textcopyright} 2023 The Authors",

year = "2023",

month = jul,

day = "1",

doi = "10.1016/j.adro.2022.101163",

language = "English (US)",

volume = "8",

journal = "Advances in Radiation Oncology",

issn = "2452-1094",

publisher = "Elsevier Inc.",

number = "4",

}

TY - JOUR

T1 - Comparison of Machine-Learning and Deep-Learning Methods for the Prediction of Osteoradionecrosis Resulting From Head and Neck Cancer Radiation Therapy

AU - Reber, Brandon

AU - Van Dijk, Lisanne

AU - Anderson, Brian

AU - Mohamed, Abdallah Sherif Radwan

AU - Fuller, Clifton

AU - Lai, Stephen

AU - Brock, Kristy

N1 - Funding Information: Sources of support: Research reported in this publication was supported by the National Institutes of Health (NIH)/National Cancer Institute (NCI) under award number P30CA016672, the Helen Black Image Guided Fund, resources from the Image Guided Cancer Therapy Research Program at the University of Texas MD Anderson Cancer Center, a generous gift from the Apache Corporation, and support from the Tumor Measurement Initiative through the MD Anderson Strategic Initiative Development Program. Funding Information: Disclosures: Mr Reber and Dr Brock received support from the NIH/NCI under award number P30CA016672, the Helen Black Image Guided Fund, resources from the Image Guided Cancer Therapy Research Program at the University of Texas MD Anderson Cancer Center, a generous gift from the Apache Corporation, and support from the Tumor Measurement Initiative through the MD Anderson Strategic Initiative Development Program. Dr Van Dijk has received support from the Dutch Cancer Society (KWF-13529), Rubicon (NWO-452182317), and VENI (NWO-09150162010173). Dr Anderson received an Allied Scientist grant from the Society of Interventional Radiology. Abdallah Mohamed received support from the NIH through a NIH National Institute of Dental and Craniofacial Research (NIDCR) Academic Industrial Partnership Grant (R01DE028290), NIH/National Science Foundation NCI Smart Connected Health Program (R01CA257814), and an NIDCR Establish Outcome Measures for Clinical Studies of Oral and Craniofacial Diseases and Conditions award 1 (R01DE025248). Dr Fuller received an NCI Institutional Research Training Grant (T32CA261856) and National Institute of Biomedical Imaging and Bioengineering Grant for Research Education Programs for Residents and Clinical Fellows. Dr Lai has received support from the NIDCR (R01 DE025248). Funding Information: We thank the University of Texas MD Anderson Cancer Center Research Medical Library for manuscript editing. 
Publisher Copyright: © 2023 The Authors

PY - 2023/7/1

Y1 - 2023/7/1

N2 - Purpose: Deep-learning (DL) techniques have been successful in disease-prediction tasks and could improve the prediction of mandible osteoradionecrosis (ORN) resulting from head and neck cancer (HNC) radiation therapy. In this study, we retrospectively compared the performance of DL algorithms and traditional machine-learning (ML) techniques to predict mandible ORN binary outcome in an extensive cohort of patients with HNC. Methods and Materials: Patients who received HNC radiation therapy at the University of Texas MD Anderson Cancer Center from 2005 to 2015 were identified for the ML (n = 1259) and DL (n = 1236) studies. The subjects were followed for ORN development for at least 12 months, with 173 developing ORN and 1086 having no evidence of ORN. The ML models used dose-volume histogram parameters to predict ORN development. These models included logistic regression, random forest, support vector machine, and a random classifier reference. The DL models were based on ResNet, DenseNet, and autoencoder-based architectures. The DL models used each participant's dose cropped to the mandible. The effect of increasing the amount of available training data on the DL models’ prediction performance was evaluated by training the DL models using increasing ratios of the original training data. Results: The F1 score for the logistic regression model, the best-performing ML model, was 0.3. The best-performing ResNet, DenseNet, and autoencoder-based models had F1 scores of 0.07, 0.14, and 0.23, respectively, whereas the random classifier's F1 score was 0.17. No performance increase was apparent when we increased the amount of training data available for DL model training. Conclusions: The ML models had superior performance to their DL counterparts. 
The lack of improvement in DL performance with increased training data suggests that either more data are needed for appropriate DL model construction or that the image features used in DL models are not suitable for this task.

UR - http://www.scopus.com/inward/record.url?scp=85147366664&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85147366664&partnerID=8YFLogxK

U2 - 10.1016/j.adro.2022.101163

DO - 10.1016/j.adro.2022.101163

M3 - Article

C2 - 36798732

AN - SCOPUS:85147366664

SN - 2452-1094

VL - 8

JO - Advances in Radiation Oncology

JF - Advances in Radiation Oncology

IS - 4

M1 - 101163

ER -
