TY - JOUR
T1 - Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211
T2 - Requirements and implementation
AU - Berthon, Beatrice
AU - Spezi, Emiliano
AU - Galavis, Paulina
AU - Shepherd, Tony
AU - Apte, Aditya
AU - Hatt, Mathieu
AU - Fayad, Hadi
AU - De Bernardi, Elisabetta
AU - Soffientini, Chiara D.
AU - Ross Schmidtlein, C.
AU - El Naqa, Issam
AU - Jeraj, Robert
AU - Lu, Wei
AU - Das, Shiva
AU - Zaidi, Habib
AU - Mawlawi, Osama R.
AU - Visvikis, Dimitris
AU - Lee, John A.
AU - Kirov, Assen S.
N1 - Publisher Copyright:
© 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
PY - 2017/8
Y1 - 2017/8
N2 - Purpose: The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM). Methods: The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches and the description of available metrics. The benchmark was designed in a way that it could be extendable by inclusion of bespoke segmentation methods, while maintaining its main purpose of being a standard testing platform for newly developed PET-AS methods. An example of implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset for the purpose of testing and demonstrating the capabilities of the software as a benchmark platform. Results: A selection of clinical, physical, and simulated phantom data, including "best estimates" reference contours from macroscopic specimens, simulation template, and CT scans was built into the PETASset application database. Specific metrics such as Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S), were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool to generate structured reports on the evaluation of the performance of PET-AS algorithms against the reference contours was built. The variation of the metric agreement values with the reference contours across the PET-AS methods evaluated for demonstration were between 0.51 and 0.83, 0.44 and 0.86, and 0.61 and 1.00 for DSC, PPV, and the S metric, respectively. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state-of-the art. Conclusions: PETASset provides a platform that allows standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users willing to evaluate their PET-AS methods and contribute with more evaluation datasets.
AB - Purpose: The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM). Methods: The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches and the description of available metrics. The benchmark was designed in a way that it could be extendable by inclusion of bespoke segmentation methods, while maintaining its main purpose of being a standard testing platform for newly developed PET-AS methods. An example of implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset for the purpose of testing and demonstrating the capabilities of the software as a benchmark platform. Results: A selection of clinical, physical, and simulated phantom data, including "best estimates" reference contours from macroscopic specimens, simulation template, and CT scans was built into the PETASset application database. Specific metrics such as Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S), were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool to generate structured reports on the evaluation of the performance of PET-AS algorithms against the reference contours was built. The variation of the metric agreement values with the reference contours across the PET-AS methods evaluated for demonstration were between 0.51 and 0.83, 0.44 and 0.86, and 0.61 and 1.00 for DSC, PPV, and the S metric, respectively. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state-of-the art. Conclusions: PETASset provides a platform that allows standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users willing to evaluate their PET-AS methods and contribute with more evaluation datasets.
KW - PET segmentation
KW - PET/CT
KW - conformity index
KW - outlining assessment
UR - http://www.scopus.com/inward/record.url?scp=85021765648&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021765648&partnerID=8YFLogxK
U2 - 10.1002/mp.12312
DO - 10.1002/mp.12312
M3 - Article
C2 - 28474819
AN - SCOPUS:85021765648
SN - 0094-2405
VL - 44
SP - 4098
EP - 4111
JO - Medical physics
JF - Medical physics
IS - 8
ER -