Computer-aided segmentation on MRI for prostate radiotherapy, part II: Comparing human and computer observer populations and the influence of annotator variability on algorithm variability

Jeremiah W. Sanders, Henry Mok, Alexander N. Hanania, Aradhana M. Venkatesan, Chad Tang, Teresa L. Bruno, Howard D. Thames, Rajat J. Kudchadker, Steven J. Frank

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Background and purpose: Comparing deep learning (DL) algorithms to human interobserver variability, one of the largest sources of noise in human-performed annotations, is necessary to inform the clinical application, use, and quality assurance of DL for prostate radiotherapy.

Materials and methods: One hundred fourteen DL algorithms were developed on 295 prostate MRIs to segment the prostate, external urinary sphincter (EUS), seminal vesicles (SV), rectum, and bladder. Fifty prostate MRIs of 25 patients undergoing MRI-based low-dose-rate prostate brachytherapy were acquired as an independent test set. Groups of DL algorithms were created based on the loss functions used to train them, and the spatial entropy (SE) of their predictions on the 50 test MRIs was computed. Five human observers contoured the 50 test MRIs, and SE maps of their contours were compared with those of the groups of DL algorithms. Additionally, similarity metrics were computed between DL algorithm predictions and consensus annotations of the 5 human observers’ contours of the 50 test MRIs.

Results: A DL algorithm yielded statistically significantly higher similarity metrics for the prostate than did the human observers (H) (prostate Matthews correlation coefficient, DL vs. H: planning, 0.931 vs. 0.903, p < 0.001; postimplant, 0.925 vs. 0.892, p < 0.001); the same was true for the 4 organs at risk. The SE maps revealed that the DL algorithms and human annotators were most variable in similar anatomical regions: the prostate-EUS, prostate-SV, prostate-rectum, and prostate-bladder junctions.

Conclusions: Annotation quality is an important consideration when developing, evaluating, and using DL algorithms clinically.
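The two quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not define spatial entropy, so the code below assumes it is the per-voxel Shannon entropy of the foreground/background votes across a population of observers (human or DL). The consensus here is a simple majority vote, which is a stand-in and not necessarily the consensus method used in the study. All function names are hypothetical.

```python
import numpy as np


def spatial_entropy_map(masks):
    """Per-voxel Shannon entropy (in bits) across a stack of binary masks.

    ASSUMPTION: "spatial entropy" is taken to mean the entropy of the
    foreground/background vote at each voxel; the abstract does not define it.
    masks: array of shape (n_observers, *spatial_dims) with values 0 or 1.
    """
    masks = np.asarray(masks, dtype=float)
    p = masks.mean(axis=0)  # fraction of observers voting foreground
    entropy = np.zeros_like(p)
    for q in (p, 1.0 - p):
        nz = q > 0  # skip zero-probability terms (0 * log 0 is taken as 0)
        entropy[nz] -= q[nz] * np.log2(q[nz])
    return entropy


def majority_vote(masks):
    """Hypothetical consensus: foreground where more than half the observers agree."""
    masks = np.asarray(masks)
    return masks.sum(axis=0) * 2 > masks.shape[0]


def matthews_corrcoef(pred, ref):
    """Matthews correlation coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool).ravel()
    ref = np.asarray(ref, dtype=bool).ravel()
    tp = float(np.sum(pred & ref))
    tn = float(np.sum(~pred & ~ref))
    fp = float(np.sum(pred & ~ref))
    fn = float(np.sum(~pred & ref))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when the coefficient is undefined (empty row or column).
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy data: 5 observers contouring a 4x4 slice (random, for illustration only).
    rng = np.random.default_rng(0)
    observers = rng.integers(0, 2, size=(5, 4, 4))
    se = spatial_entropy_map(observers)
    consensus = majority_vote(observers)
    print("max entropy (bits):", se.max())
    print("MCC, observer 0 vs. consensus:", matthews_corrcoef(observers[0], consensus))
```

Under this assumed definition, voxels where every observer agrees have zero entropy, and entropy peaks where the votes are split evenly, which is the behavior one would expect at ambiguous organ junctions.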

Original language: English (US)
Pages (from-to): 132-139
Number of pages: 8
Journal: Radiotherapy and Oncology
Volume: 169
State: Published - Apr 2022

Keywords

  • Annotation quality
  • Brachytherapy
  • Deep learning
  • MRI
  • Prostate
  • Radiation therapy
  • Segmentation

ASJC Scopus subject areas

  • Hematology
  • Oncology
  • Radiology, Nuclear Medicine and Imaging
