TY - JOUR
T1 - Validation of a deep learning model for traumatic brain injury detection and NIRIS grading on non-contrast CT
T2 - a multi-reader study with promising results and opportunities for improvement
AU - Jiang, Bin
AU - Ozkara, Burak Berksu
AU - Creeden, Sean
AU - Zhu, Guangming
AU - Ding, Victoria Y.
AU - Chen, Hui
AU - Lanzman, Bryan
AU - Wolman, Dylan
AU - Shams, Sara
AU - Trinh, Austin
AU - Li, Ying
AU - Khalaf, Alexander
AU - Parker, Jonathon J.
AU - Halpern, Casey H.
AU - Wintermark, Max
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/11
Y1 - 2023/11
N2 - Purpose: This study aimed to assess and externally validate the performance of a deep learning (DL) model for the interpretation of non-contrast computed tomography (NCCT) scans of patients with suspicion of traumatic brain injury (TBI). Methods: This retrospective and multi-reader study included patients with TBI suspicion who were transported to the emergency department and underwent NCCT scans. Eight reviewers, with varying levels of training and experience (two neuroradiology attendings, two neuroradiology fellows, two neuroradiology residents, one neurosurgery attending, and one neurosurgery resident), independently evaluated NCCT head scans. The same scans were evaluated using version 5.0 of the DL model icobrain tbi. The establishment of the ground truth involved a thorough assessment of all accessible clinical and laboratory data, as well as follow-up imaging studies, including NCCT and magnetic resonance imaging, as a consensus amongst the study reviewers. The outcomes of interest included neuroimaging radiological interpretation system (NIRIS) scores, the presence of midline shift, mass effect, hemorrhagic lesions, hydrocephalus, and severe hydrocephalus, as well as measurements of midline shift and volumes of hemorrhagic lesions. Comparisons using weighted Cohen’s kappa coefficient were made. The McNemar test was used to compare the diagnostic performance. Bland–Altman plots were used to compare measurements. Results: One hundred patients were included, with the DL model successfully categorizing 77 scans. The median age for the total group was 48, with the omitted group having a median age of 44.5 and the included group having a median age of 48. The DL model demonstrated moderate agreement with the ground truth, trainees, and attendings. With the DL model’s assistance, trainees’ agreement with the ground truth improved. The DL model showed high specificity (0.88) and positive predictive value (0.96) in classifying NIRIS scores as 0–2 or 3–4. Trainees and attendings had the highest accuracy (0.95). The DL model’s performance in classifying various TBI CT imaging common data elements was comparable to that of trainees and attendings. The average difference for the DL model in quantifying the volume of hemorrhagic lesions was 6.0 mL with a wide 95% confidence interval (CI) of −68.32 to 80.22, and for midline shift, the average difference was 1.4 mm with a 95% CI of −3.4 to 6.2. Conclusion: While the DL model outperformed trainees in some aspects, attendings’ assessments remained superior in most instances. Using the DL model as an assistive tool benefited trainees, improving their NIRIS score agreement with the ground truth. Although the DL model showed high potential in classifying some TBI CT imaging common data elements, further refinement and optimization are necessary to enhance its clinical utility.
KW - Computed tomography
KW - Deep learning
KW - Neuroimaging radiological interpretation system
KW - Traumatic brain injury
KW - Validation
UR - http://www.scopus.com/inward/record.url?scp=85160838540&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160838540&partnerID=8YFLogxK
U2 - 10.1007/s00234-023-03170-5
DO - 10.1007/s00234-023-03170-5
M3 - Article
C2 - 37269414
AN - SCOPUS:85160838540
SN - 0028-3940
VL - 65
SP - 1605
EP - 1617
JO - Neuroradiology
JF - Neuroradiology
IS - 11
ER -