Validation of a deep learning model for traumatic brain injury detection and NIRIS grading on non-contrast CT: a multi-reader study with promising results and opportunities for improvement

Bin Jiang; Burak Berksu Ozkara; Sean Creeden; Guangming Zhu; Victoria Y. Ding; Hui Chen; Bryan Lanzman; Dylan Wolman; Sara Shams; Austin Trinh; Ying Li; Alexander Khalaf; Jonathon J. Parker; Casey H. Halpern; Max Wintermark

doi:10.1007/s00234-023-03170-5

Neuroradiology

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: This study aimed to assess and externally validate the performance of a deep learning (DL) model for the interpretation of non-contrast computed tomography (NCCT) scans of patients with suspected traumatic brain injury (TBI).

Methods: This retrospective, multi-reader study included patients with suspected TBI who were transported to the emergency department and underwent NCCT scans. Eight reviewers with varying levels of training and experience (two neuroradiology attendings, two neuroradiology fellows, two neuroradiology residents, one neurosurgery attending, and one neurosurgery resident) independently evaluated the NCCT head scans. The same scans were evaluated using version 5.0 of the DL model icobrain tbi. The ground truth was established by consensus amongst the study reviewers after a thorough assessment of all accessible clinical and laboratory data, as well as follow-up imaging studies, including NCCT and magnetic resonance imaging. The outcomes of interest included neuroimaging radiological interpretation system (NIRIS) scores, the presence of midline shift, mass effect, hemorrhagic lesions, hydrocephalus, and severe hydrocephalus, as well as measurements of midline shift and volumes of hemorrhagic lesions. Agreement was compared using the weighted Cohen’s kappa coefficient. The McNemar test was used to compare diagnostic performance. Bland–Altman plots were used to compare measurements.

Results: One hundred patients were included, with the DL model successfully categorizing 77 scans. The median age for the total group was 48 years, with the omitted group having a median age of 44.5 years and the included group having a median age of 48 years. The DL model demonstrated moderate agreement with the ground truth, trainees, and attendings. With the DL model’s assistance, trainees’ agreement with the ground truth improved. The DL model showed high specificity (0.88) and positive predictive value (0.96) in classifying NIRIS scores as 0–2 or 3–4. Trainees and attendings had the highest accuracy (0.95). The DL model’s performance in classifying various TBI CT imaging common data elements was comparable to that of trainees and attendings. The average difference for the DL model in quantifying the volume of hemorrhagic lesions was 6.0 mL with a wide 95% confidence interval (CI) of −68.32 to 80.22 mL, and for midline shift, the average difference was 1.4 mm with a 95% CI of −3.4 to 6.2 mm.

Conclusion: While the DL model outperformed trainees in some aspects, attendings’ assessments remained superior in most instances. Using the DL model as an assistive tool benefited trainees, improving their NIRIS score agreement with the ground truth. Although the DL model showed high potential in classifying some TBI CT imaging common data elements, further refinement and optimization are necessary to enhance its clinical utility.
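The abstract names the statistics used (weighted Cohen’s kappa, specificity and positive predictive value, Bland–Altman comparison) but does not say how they were computed. The following is a minimal sketch of the textbook definitions in plain Python, not the authors' code. The function names, the linear weighting default, and the use of mean difference ± 1.96 standard deviations as the agreement interval are assumptions made for illustration; whether the intervals reported above were derived this way is not stated in this record. The McNemar test is not covered.

```python
from statistics import mean, stdev


def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Weighted Cohen's kappa for ordinal scores coded 0..n_categories-1.

    power=1 gives linear weights, power=2 quadratic weights (an assumption;
    the record does not say which weighting the study used).
    """
    n = len(rater_a)
    # Observed proportion of cases in each (score_a, score_b) cell.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories))
           for j in range(n_categories)]
    cells = [(i, j) for i in range(n_categories) for j in range(n_categories)]
    # Disagreement weight grows with the distance between the two scores.
    observed_dis = sum(abs(i - j) ** power * observed[i][j] for i, j in cells)
    expected_dis = sum(abs(i - j) ** power * row[i] * col[j] for i, j in cells)
    if expected_dis == 0:
        raise ValueError("kappa is undefined when all scores share one category")
    return 1 - observed_dis / expected_dis


def specificity_ppv(truth, predicted):
    """Specificity and positive predictive value for binary labels (0 or 1).

    Raises ZeroDivisionError if there are no true negatives plus false
    positives, or no positive predictions.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tn / (tn + fp), tp / (tp + fp)


def bland_altman(measure_a, measure_b):
    """Mean difference (bias) and the interval bias +/- 1.96 * SD of differences."""
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Made-up toy values for illustration only; these are not study data.
    print(weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], n_categories=2))
    print(specificity_ppv([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
    print(bland_altman([10, 12, 14], [9, 12, 12]))
```

For a binarized outcome such as NIRIS 0–2 versus 3–4, the ordinal scores would first be mapped to 0 or 1 before calling `specificity_ppv`; which group counts as the positive class is a choice the caller has to make.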

Original language	English (US)
Pages (from-to)	1605-1617
Number of pages	13
Journal	Neuroradiology
Volume	65
Issue number	11
DOIs	https://doi.org/10.1007/s00234-023-03170-5
State	Published - Nov 2023

Keywords

Computed tomography
Deep learning
Neuroimaging radiological interpretation system
Traumatic brain injury
Validation

ASJC Scopus subject areas

Radiology, Nuclear Medicine and Imaging
Clinical Neurology
Cardiology and Cardiovascular Medicine

Access to Document

10.1007/s00234-023-03170-5

Cite this

Jiang, B., Ozkara, B. B., Creeden, S., Zhu, G., Ding, V. Y., Chen, H., Lanzman, B., Wolman, D., Shams, S., Trinh, A., Li, Y., Khalaf, A., Parker, J. J., Halpern, C. H., & Wintermark, M. (2023). Validation of a deep learning model for traumatic brain injury detection and NIRIS grading on non-contrast CT: a multi-reader study with promising results and opportunities for improvement. Neuroradiology, 65(11), 1605-1617. https://doi.org/10.1007/s00234-023-03170-5

@article{bd4b6dea2587451c9656165b55027cc2,

title = "Validation of a deep learning model for traumatic brain injury detection and NIRIS grading on non-contrast CT: a multi-reader study with promising results and opportunities for improvement",

keywords = "Computed tomography, Deep learning, Neuroimaging radiological interpretation system, Traumatic brain injury, Validation",

author = "Bin Jiang and Ozkara, {Burak Berksu} and Sean Creeden and Guangming Zhu and Ding, {Victoria Y.} and Hui Chen and Bryan Lanzman and Dylan Wolman and Sara Shams and Austin Trinh and Ying Li and Alexander Khalaf and Parker, {Jonathon J.} and Halpern, {Casey H.} and Max Wintermark",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.",

year = "2023",

month = nov,

doi = "10.1007/s00234-023-03170-5",

language = "English (US)",

volume = "65",

pages = "1605--1617",

journal = "Neuroradiology",

issn = "0028-3940",

publisher = "Springer Verlag",

number = "11",

}

TY - JOUR

T1 - Validation of a deep learning model for traumatic brain injury detection and NIRIS grading on non-contrast CT

T2 - a multi-reader study with promising results and opportunities for improvement

AU - Jiang, Bin

AU - Ozkara, Burak Berksu

AU - Creeden, Sean

AU - Zhu, Guangming

AU - Ding, Victoria Y.

AU - Chen, Hui

AU - Lanzman, Bryan

AU - Wolman, Dylan

AU - Shams, Sara

AU - Trinh, Austin

AU - Li, Ying

AU - Khalaf, Alexander

AU - Parker, Jonathon J.

AU - Halpern, Casey H.

AU - Wintermark, Max

PY - 2023/11

Y1 - 2023/11

KW - Computed tomography

KW - Deep learning

KW - Neuroimaging radiological interpretation system

KW - Traumatic brain injury

KW - Validation

UR - http://www.scopus.com/inward/record.url?scp=85160838540&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85160838540&partnerID=8YFLogxK

U2 - 10.1007/s00234-023-03170-5

DO - 10.1007/s00234-023-03170-5

M3 - Article

C2 - 37269414

AN - SCOPUS:85160838540

SN - 0028-3940

VL - 65

SP - 1605

EP - 1617

JO - Neuroradiology

JF - Neuroradiology

IS - 11

ER -
