Strengthening deep-learning models for intracranial hemorrhage detection: strongly annotated computed tomography images and model ensembles

Dong Wan Kang; Gi Hun Park; Wi Sun Ryu; Dawid Schellingerhout; Museong Kim; Yong Soo Kim; Chan Young Park; Keon Joo Lee; Moon Ku Han; Han Gil Jeong; Dong Eog Kim

doi:10.3389/fneur.2023.1321964

Neuroradiology

Research output: Contribution to journal › Article › peer-review

Abstract

Background and purpose: Multiple attempts at intracranial hemorrhage (ICH) detection using deep-learning techniques have been plagued by clinical failures. We aimed to compare the performance of a deep-learning algorithm for ICH detection trained on strongly and weakly annotated datasets, and to assess whether a weighted ensemble model that integrates separate models trained using datasets with different ICH improves performance.

Methods: We used brain CT scans from the Radiological Society of North America (27,861 CT scans, 3,528 ICHs) and AI-Hub (53,045 CT scans, 7,013 ICHs) for training. DenseNet121, InceptionResNetV2, MobileNetV2, and VGG19 were trained on strongly and weakly annotated datasets and compared using independent external test datasets. We then developed a weighted ensemble model combining separate models trained on all ICH, subdural hemorrhage (SDH), subarachnoid hemorrhage (SAH), and small-lesion ICH cases. The final weighted ensemble model was compared to four well-known deep-learning models. After external testing, six neurologists reviewed 91 ICH cases difficult for AI and humans.

Results: InceptionResNetV2, MobileNetV2, and VGG19 models outperformed when trained on strongly annotated datasets. A weighted ensemble model combining models trained on SDH, SAH, and small-lesion ICH had a higher AUC, compared with a model trained on all ICH cases only. This model outperformed four deep-learning models (AUC [95% C.I.]: Ensemble model, 0.953 [0.938–0.965]; InceptionResNetV2, 0.852 [0.828–0.873]; DenseNet121, 0.875 [0.852–0.895]; VGG19, 0.796 [0.770–0.821]; MobileNetV2, 0.650 [0.620–0.680]; p < 0.0001). In addition, the case review showed that a better understanding and management of difficult cases may facilitate clinical use of ICH detection algorithms.

Conclusion: We propose a weighted ensemble model for ICH detection, trained on large-scale, strongly annotated CT scans, as no model can capture all aspects of complex tasks.
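The abstract does not state how the ensemble weights were chosen or how the sub-model outputs were combined, so the following is only a minimal sketch of the general idea: per-scan probabilities from several sub-models are merged by a weighted average, and the result is scored by AUC. The function names, the model keys (`all_ich`, `sdh`, `sah`, `small_lesion`), the weights, and every probability below are hypothetical illustration values, not data or methods from the paper.

```python
from typing import Dict, List, Sequence


def weighted_ensemble(
    probs: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Return one weighted-average ICH probability per scan."""
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_scans = len(next(iter(probs.values())))
    if any(len(p) != n_scans for p in probs.values()):
        raise ValueError("every model must score the same number of scans")
    return [
        sum(weights[name] * probs[name][i] for name in probs) / total
        for i in range(n_scans)
    ]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ICH present, 0 = no ICH.
labels = [1, 1, 1, 0, 0, 0]
probs = {
    "all_ich": [0.9, 0.4, 0.3, 0.2, 0.5, 0.1],
    "sdh": [0.8, 0.7, 0.2, 0.1, 0.3, 0.2],
    "sah": [0.6, 0.3, 0.8, 0.2, 0.2, 0.1],
    "small_lesion": [0.7, 0.5, 0.6, 0.3, 0.4, 0.2],
}
# Hypothetical weights; the paper's actual weighting is not given here.
weights = {"all_ich": 0.4, "sdh": 0.2, "sah": 0.2, "small_lesion": 0.2}

ensemble = weighted_ensemble(probs, weights)
print("AUC, all-ICH model alone:", auc(labels, probs["all_ich"]))
print("AUC, weighted ensemble:  ", auc(labels, ensemble))
```

In this toy data the subtype models pick up positives that the all-ICH model scores low, which is the kind of complementarity a weighted ensemble is meant to exploit. The numbers say nothing about the magnitude of the effect reported in the study.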

Original language	English (US)
Article number	1321964
Journal	Frontiers in Neurology
Volume	14
DOIs	https://doi.org/10.3389/fneur.2023.1321964
State	Published - 2023

Keywords

deep-learning algorithm
intracranial hemorrhage (ICH)
neuroimaging
strongly annotated dataset
weighted ensemble model

ASJC Scopus subject areas

Neurology
Clinical Neurology

Access to Document

10.3389/fneur.2023.1321964

Cite this

Kang, D. W., Park, G. H., Ryu, W. S., Schellingerhout, D., Kim, M., Kim, Y. S., Park, C. Y., Lee, K. J., Han, M. K., Jeong, H. G., & Kim, D. E. (2023). Strengthening deep-learning models for intracranial hemorrhage detection: strongly annotated computed tomography images and model ensembles. Frontiers in Neurology, 14, Article 1321964. https://doi.org/10.3389/fneur.2023.1321964

@article{62a82fef419a4e6ab971aba5e6b798b2,

title = "Strengthening deep-learning models for intracranial hemorrhage detection: strongly annotated computed tomography images and model ensembles",

abstract = "Background and purpose: Multiple attempts at intracranial hemorrhage (ICH) detection using deep-learning techniques have been plagued by clinical failures. We aimed to compare the performance of a deep-learning algorithm for ICH detection trained on strongly and weakly annotated datasets, and to assess whether a weighted ensemble model that integrates separate models trained using datasets with different ICH improves performance. Methods: We used brain CT scans from the Radiological Society of North America (27,861 CT scans, 3,528 ICHs) and AI-Hub (53,045 CT scans, 7,013 ICHs) for training. DenseNet121, InceptionResNetV2, MobileNetV2, and VGG19 were trained on strongly and weakly annotated datasets and compared using independent external test datasets. We then developed a weighted ensemble model combining separate models trained on all ICH, subdural hemorrhage (SDH), subarachnoid hemorrhage (SAH), and small-lesion ICH cases. The final weighted ensemble model was compared to four well-known deep-learning models. After external testing, six neurologists reviewed 91 ICH cases difficult for AI and humans. Results: InceptionResNetV2, MobileNetV2, and VGG19 models outperformed when trained on strongly annotated datasets. A weighted ensemble model combining models trained on SDH, SAH, and small-lesion ICH had a higher AUC, compared with a model trained on all ICH cases only. This model outperformed four deep-learning models (AUC [95% C.I.]: Ensemble model, 0.953[0.938–0.965]; InceptionResNetV2, 0.852[0.828–0.873]; DenseNet121, 0.875[0.852–0.895]; VGG19, 0.796[0.770–0.821]; MobileNetV2, 0.650[0.620–0.680]; p < 0.0001). In addition, the case review showed that a better understanding and management of difficult cases may facilitate clinical use of ICH detection algorithms. Conclusion: We propose a weighted ensemble model for ICH detection, trained on large-scale, strongly annotated CT scans, as no model can capture all aspects of complex tasks.",

keywords = "deep-learning algorithm, intracranial hemorrhage (ICH), neuroimaging, strongly annotated dataset, weighted ensemble model",

author = "Kang, {Dong Wan} and Park, {Gi Hun} and Ryu, {Wi Sun} and Dawid Schellingerhout and Museong Kim and Kim, {Yong Soo} and Park, {Chan Young} and Lee, {Keon Joo} and Han, {Moon Ku} and Jeong, {Han Gil} and Kim, {Dong Eog}",

note = "Publisher Copyright: Copyright {\textcopyright} 2023 Kang, Park, Ryu, Schellingerhout, Kim, Kim, Park, Lee, Han, Jeong and Kim.",

year = "2023",

doi = "10.3389/fneur.2023.1321964",

language = "English (US)",

volume = "14",

journal = "Frontiers in Neurology",

issn = "1664-2295",

publisher = "Frontiers Research Foundation",

}

TY - JOUR

T1 - Strengthening deep-learning models for intracranial hemorrhage detection

T2 - strongly annotated computed tomography images and model ensembles

AU - Kang, Dong Wan

AU - Park, Gi Hun

AU - Ryu, Wi Sun

AU - Schellingerhout, Dawid

AU - Kim, Museong

AU - Kim, Yong Soo

AU - Park, Chan Young

AU - Lee, Keon Joo

AU - Han, Moon Ku

AU - Jeong, Han Gil

AU - Kim, Dong Eog

PY - 2023

Y1 - 2023

AB - Background and purpose: Multiple attempts at intracranial hemorrhage (ICH) detection using deep-learning techniques have been plagued by clinical failures. We aimed to compare the performance of a deep-learning algorithm for ICH detection trained on strongly and weakly annotated datasets, and to assess whether a weighted ensemble model that integrates separate models trained using datasets with different ICH improves performance. Methods: We used brain CT scans from the Radiological Society of North America (27,861 CT scans, 3,528 ICHs) and AI-Hub (53,045 CT scans, 7,013 ICHs) for training. DenseNet121, InceptionResNetV2, MobileNetV2, and VGG19 were trained on strongly and weakly annotated datasets and compared using independent external test datasets. We then developed a weighted ensemble model combining separate models trained on all ICH, subdural hemorrhage (SDH), subarachnoid hemorrhage (SAH), and small-lesion ICH cases. The final weighted ensemble model was compared to four well-known deep-learning models. After external testing, six neurologists reviewed 91 ICH cases difficult for AI and humans. Results: InceptionResNetV2, MobileNetV2, and VGG19 models outperformed when trained on strongly annotated datasets. A weighted ensemble model combining models trained on SDH, SAH, and small-lesion ICH had a higher AUC, compared with a model trained on all ICH cases only. This model outperformed four deep-learning models (AUC [95% C.I.]: Ensemble model, 0.953[0.938–0.965]; InceptionResNetV2, 0.852[0.828–0.873]; DenseNet121, 0.875[0.852–0.895]; VGG19, 0.796[0.770–0.821]; MobileNetV2, 0.650[0.620–0.680]; p < 0.0001). In addition, the case review showed that a better understanding and management of difficult cases may facilitate clinical use of ICH detection algorithms. Conclusion: We propose a weighted ensemble model for ICH detection, trained on large-scale, strongly annotated CT scans, as no model can capture all aspects of complex tasks.

KW - deep-learning algorithm

KW - intracranial hemorrhage (ICH)

KW - neuroimaging

KW - strongly annotated dataset

KW - weighted ensemble model

UR - http://www.scopus.com/inward/record.url?scp=85182192485&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85182192485&partnerID=8YFLogxK

U2 - 10.3389/fneur.2023.1321964

DO - 10.3389/fneur.2023.1321964

M3 - Article

C2 - 38221995

AN - SCOPUS:85182192485

SN - 1664-2295

VL - 14

JO - Frontiers in Neurology

JF - Frontiers in Neurology

M1 - 1321964

ER -
