TY - JOUR
T1 - Monitoring Variations in the Use of Automated Contouring Software
AU - Nealon, Kelly A.
AU - Han, Eun Young
AU - Kry, Stephen F.
AU - Nguyen, Callistus
AU - Pham, Mary
AU - Reed, Valerie K.
AU - Rosenthal, David
AU - Simiele, Samantha
AU - Court, Laurence E.
N1 - Publisher Copyright: © 2023 The Authors
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Purpose: Our purpose was to identify variations in the clinical use of automatically generated contours that could be attributed to software error, off-label use, or automation bias. Methods and Materials: For 500 head and neck patients who were contoured by an in-house automated contouring system, Dice similarity coefficient and added path length were calculated between the contours generated by the automated system and the final contours after editing for clinical use. Statistical process control was used and control charts were generated with control limits at 3 standard deviations. Contours that exceeded the thresholds were investigated to determine the cause. Moving mean control plots were then generated to identify dosimetrists who were editing less over time, which could be indicative of automation bias. Results: Major contouring edits were flagged for: 1.0% brain, 3.1% brain stem, 3.5% left cochlea, 2.9% right cochlea, 4.8% esophagus, 4.1% left eye, 4.0% right eye, 2.2% left lens, 4.9% right lens, 2.5% mandible, 11% left optic nerve, 6.1% right optic nerve, 3.8% left parotid, 5.9% right parotid, and 3.0% of spinal cord contours. Identified causes of editing included unexpected patient positioning, deviation from standard clinical practice, and disagreement between dosimetrist preference and automated contouring style. A statistically significant (P < .05) difference was identified between the contour editing practice of dosimetrists, with 1 dosimetrist editing more across all organs at risk. Eighteen percent (27/150) of moving mean control plots created for 5 dosimetrists indicated the amount of contour editing was decreasing over time, possibly corresponding to automation bias. Conclusions: The developed system was used to detect statistically significant edits caused by software error, unexpected clinical use, and automation bias. The increased ability to detect systematic errors that occur when editing automatically generated contours will improve the safety of the automatic treatment planning workflow.
UR - http://www.scopus.com/inward/record.url?scp=85175234983&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175234983&partnerID=8YFLogxK
U2 - 10.1016/j.prro.2023.09.004
DO - 10.1016/j.prro.2023.09.004
M3 - Article
C2 - 37797883
AN - SCOPUS:85175234983
SN - 1879-8500
VL - 14
SP - e75
EP - e85
JO - Practical Radiation Oncology
JF - Practical Radiation Oncology
IS - 1
ER -