Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211: Requirements and implementation

Beatrice Berthon; Emiliano Spezi; Paulina Galavis; Tony Shepherd; Aditya Apte; Mathieu Hatt; Hadi Fayad; Elisabetta De Bernardi; Chiara D. Soffientini; C. Ross Schmidtlein; Issam El Naqa; Robert Jeraj; Wei Lu; Shiva Das; Habib Zaidi; Osama R. Mawlawi; Dimitris Visvikis; John A. Lee; Assen S. Kirov

doi:10.1002/mp.12312

Imaging Physics

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

Purpose: The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for the evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM).

Methods: The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches, and the description of available metrics. The benchmark was designed to be extensible through the inclusion of bespoke segmentation methods, while maintaining its main purpose as a standard testing platform for newly developed PET-AS methods. An example implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset to test and demonstrate the capabilities of the software as a benchmark platform.

Results: A selection of clinical, physical, and simulated phantom data, including "best estimate" reference contours from macroscopic specimens, simulation template, and CT scans, was built into the PETASset application database. Specific metrics such as the Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S) were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool was built to generate structured reports on the performance of PET-AS algorithms against the reference contours. Across the PET-AS methods evaluated for demonstration, the metric values for agreement with the reference contours ranged from 0.51 to 0.83 for DSC, from 0.44 to 0.86 for PPV, and from 0.61 to 1.00 for S. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state of the art.

Conclusions: PETASset provides a platform for standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users willing to evaluate their PET-AS methods and to contribute additional evaluation datasets.
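
The three agreement metrics named in the abstract have standard overlap-based definitions, which the following minimal sketch illustrates. It is not taken from PETASset: the function name `overlap_metrics`, the representation of a contour as a set of voxel indices, and the toy data are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index of a voxel inside a contour


def overlap_metrics(segmented: Iterable[Voxel],
                    reference: Iterable[Voxel]) -> Dict[str, float]:
    """Return DSC, PPV and S for a segmented contour against a reference.

    Hypothetical helper: both contours are given as collections of voxel
    indices; this is an illustrative choice, not the PETASset data model.
    """
    seg = set(segmented)
    ref = set(reference)
    if not seg or not ref:
        raise ValueError("both contours must contain at least one voxel")

    true_positives = len(seg & ref)  # voxels present in both contours
    return {
        # Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)
        "DSC": 2 * true_positives / (len(seg) + len(ref)),
        # Positive Predictive Value: fraction of segmented voxels that are correct
        "PPV": true_positives / len(seg),
        # Sensitivity: fraction of reference voxels that were recovered
        "S": true_positives / len(ref),
    }


# Toy data: a 4x4 reference slice and a smaller 3x4 segmentation inside it.
reference = {(x, y, 0) for x in range(4) for y in range(4)}     # 16 voxels
segmented = {(x, y, 0) for x in range(1, 4) for y in range(4)}  # 12 voxels
metrics = overlap_metrics(segmented, reference)
print(metrics)
```

In this toy case the segmentation lies entirely inside the reference, so PPV is 1.0 while S is 0.75. Reporting the three metrics together distinguishes under-segmentation from over-segmentation, which a single DSC value cannot do.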

Original language	English (US)
Pages (from-to)	4098-4111
Number of pages	14
Journal	Medical physics
Volume	44
Issue number	8
DOIs	https://doi.org/10.1002/mp.12312
State	Published - Aug 2017

Keywords

PET segmentation
PET/CT
conformity index
outlining assessment

ASJC Scopus subject areas

Biophysics
Radiology, Nuclear Medicine and Imaging

Cite this

Berthon, B., Spezi, E., Galavis, P., Shepherd, T., Apte, A., Hatt, M., Fayad, H., De Bernardi, E., Soffientini, C. D., Ross Schmidtlein, C., El Naqa, I., Jeraj, R., Lu, W., Das, S., Zaidi, H., Mawlawi, O. R., Visvikis, D., Lee, J. A., & Kirov, A. S. (2017). Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211: Requirements and implementation. Medical physics, 44(8), 4098-4111. https://doi.org/10.1002/mp.12312

Berthon, B, Spezi, E, Galavis, P, Shepherd, T, Apte, A, Hatt, M, Fayad, H, De Bernardi, E, Soffientini, CD, Ross Schmidtlein, C, El Naqa, I, Jeraj, R, Lu, W, Das, S, Zaidi, H, Mawlawi, OR, Visvikis, D, Lee, JA & Kirov, AS 2017, 'Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211: Requirements and implementation', Medical physics, vol. 44, no. 8, pp. 4098-4111. https://doi.org/10.1002/mp.12312

@article{e533c253e67142f38c614d1937ba96f5,

title = "Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211: Requirements and implementation",

keywords = "PET segmentation, PET/CT, conformity index, outlining assessment",

author = "Beatrice Berthon and Emiliano Spezi and Paulina Galavis and Tony Shepherd and Aditya Apte and Mathieu Hatt and Hadi Fayad and {De Bernardi}, Elisabetta and Soffientini, {Chiara D.} and {Ross Schmidtlein}, C. and {El Naqa}, Issam and Robert Jeraj and Wei Lu and Shiva Das and Habib Zaidi and Mawlawi, {Osama R.} and Dimitris Visvikis and Lee, {John A.} and Kirov, {Assen S.}",

note = "Publisher Copyright: {\textcopyright} 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.",

year = "2017",

month = aug,

doi = "10.1002/mp.12312",

language = "English (US)",

volume = "44",

pages = "4098--4111",

journal = "Medical physics",

issn = "0094-2405",

publisher = "AAPM - American Association of Physicists in Medicine",

number = "8",

}

TY - JOUR

T1 - Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211

T2 - Requirements and implementation

AU - Berthon, Beatrice

AU - Spezi, Emiliano

AU - Galavis, Paulina

AU - Shepherd, Tony

AU - Apte, Aditya

AU - Hatt, Mathieu

AU - Fayad, Hadi

AU - De Bernardi, Elisabetta

AU - Soffientini, Chiara D.

AU - Ross Schmidtlein, C.

AU - El Naqa, Issam

AU - Jeraj, Robert

AU - Lu, Wei

AU - Das, Shiva

AU - Zaidi, Habib

AU - Mawlawi, Osama R.

AU - Visvikis, Dimitris

AU - Lee, John A.

AU - Kirov, Assen S.

PY - 2017/8

Y1 - 2017/8

KW - PET segmentation

KW - PET/CT

KW - conformity index

KW - outlining assessment

UR - http://www.scopus.com/inward/record.url?scp=85021765648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021765648&partnerID=8YFLogxK

U2 - 10.1002/mp.12312

DO - 10.1002/mp.12312

M3 - Article

C2 - 28474819

AN - SCOPUS:85021765648

SN - 0094-2405

VL - 44

SP - 4098

EP - 4111

JO - Medical physics

JF - Medical physics

IS - 8

ER -
