Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images

A. Hekler, Jochen S. Utikal, Alexander H. Enk, Wiebke Solass, Max Schmitt, Joachim Klode, Dirk Schadendorf, Wiebke Sondermann, C. Franklin, F. Bestvater, Michael J. Flaig, Dieter Krahl, Christof von Kalle, Stefan Fröhling, Titus J. Brinker

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Background: The diagnosis of most cancers is made by a board-certified pathologist who examines a tissue biopsy under the microscope. Recent research reveals high discordance between individual pathologists. For melanoma, the literature reports 25–26% discordance in classifying a benign nevus versus a malignant melanoma. A recent study indicated the potential of deep learning to lower these discordances. However, the performance of deep learning in classifying histopathologic melanoma images had never been compared directly with that of human experts. The aim of this study was to perform the first such direct comparison.

Methods: A total of 695 lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi/345 melanomas). Only the haematoxylin & eosin (H&E) slides of these lesions were digitised with a slide scanner and then randomly cropped. A total of 595 of the resulting images were used to train a convolutional neural network (CNN). The remaining 100 H&E image sections were used to test the CNN against 11 histopathologists. Three combined McNemar tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and accuracy were predefined to test for significance (p < 0.05).

Findings: The CNN achieved a mean sensitivity/specificity/accuracy of 76%/60%/68% over 11 test runs. In comparison, the 11 pathologists achieved a mean sensitivity/specificity/accuracy of 51.8%/66.5%/59.2%. Thus, the CNN was significantly (p = 0.016) superior in classifying the cropped images.

Interpretation: With limited image information available, a CNN was able to outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows promise for assisting human melanoma diagnosis.
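The abstract reports sensitivity, specificity and accuracy and relies on McNemar tests over paired classifications of the same test images. The Python sketch below is an illustration only, not the authors' code: it computes the three metrics with melanoma as the positive class and runs a generic two-sided exact McNemar test for a single pair of classifiers. The helper names `rates` and `mcnemar_exact` are hypothetical, and the test data are invented: a balanced 100-image set of 50 melanomas and 50 nevi (the abstract does not state the composition of the test set), with made-up predictions for two classifiers. The study's "three combined McNemar tests" over 11 test runs are not reproduced here.

```python
import math

# Hypothetical helpers for illustration; not code from the study.
# Labels: 1 = melanoma (positive class), 0 = nevus.

def rates(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on the same images."""
    # Discordant pairs: b = only A correct, c = only B correct.
    b = sum(1 for t, a, x in zip(y_true, pred_a, pred_b) if a == t and x != t)
    c = sum(1 for t, a, x in zip(y_true, pred_a, pred_b) if a != t and x == t)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 each discordant pair is equally likely to favour A or B,
    # so min(b, c) ~ Binomial(n, 0.5); double the lower tail.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented balanced test set: 50 melanomas followed by 50 nevi.
y_true = [1] * 50 + [0] * 50
# Classifier A (invented): 38/50 melanomas and 30/50 nevi correct.
pred_a = [1] * 38 + [0] * 12 + [0] * 30 + [1] * 20
# Classifier B (invented): 26/50 melanomas and 33/50 nevi correct.
pred_b = [1] * 26 + [0] * 24 + [0] * 33 + [1] * 17

print(rates(y_true, pred_a))                  # (0.76, 0.6, 0.68)
print(rates(y_true, pred_b))                  # (0.52, 0.66, 0.59)
print(mcnemar_exact(y_true, pred_a, pred_b))  # 0.03515625
```

On a balanced test set, accuracy is simply the mean of sensitivity and specificity, e.g. (0.76 + 0.60) / 2 = 0.68 for classifier A. The reported figures for the CNN (76%/60%/68%) and the pathologists (51.8%/66.5%/59.2%) both satisfy this relation, which is consistent with, but not confirmed by the abstract as, a balanced test set. The exact binomial form is used here instead of the chi-squared approximation because the number of discordant pairs in a 100-image test is small.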

Original language: English (US)
Pages (from-to): 91-96
Number of pages: 6
Journal: European Journal of Cancer
Volume: 118
DOIs: https://doi.org/10.1016/j.ejca.2019.06.012
State: Published - Sep 2019

Fingerprint

Melanoma
Learning
Sensitivity and Specificity
Nevi and Melanomas
Nevus
Hematoxylin
Eosine Yellowish-(YS)
Pathologists
Guidelines
Biopsy
Research
Neoplasms

Keywords

  • Artificial intelligence
  • Deep learning
  • Histopathology
  • Melanoma
  • Pathology

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

Cite this

Hekler, A., Utikal, J. S., Enk, A. H., Solass, W., Schmitt, M., Klode, J., ... Brinker, T. J. (2019). Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images. European Journal of Cancer, 118, 91-96. https://doi.org/10.1016/j.ejca.2019.06.012

Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images. / Hekler, A.; Utikal, Jochen S.; Enk, Alexander H.; Solass, Wiebke; Schmitt, Max; Klode, Joachim; Schadendorf, Dirk; Sondermann, Wiebke; Franklin, C.; Bestvater, F.; Flaig, Michael J.; Krahl, Dieter; von Kalle, Christof; Fröhling, Stefan; Brinker, Titus J.

In: European Journal of Cancer, Vol. 118, 09.2019, p. 91-96.

Hekler, A, Utikal, JS, Enk, AH, Solass, W, Schmitt, M, Klode, J, Schadendorf, D, Sondermann, W, Franklin, C, Bestvater, F, Flaig, MJ, Krahl, D, von Kalle, C, Fröhling, S & Brinker, TJ 2019, 'Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images', European Journal of Cancer, vol. 118, pp. 91-96. https://doi.org/10.1016/j.ejca.2019.06.012

Hekler, A. ; Utikal, Jochen S. ; Enk, Alexander H. ; Solass, Wiebke ; Schmitt, Max ; Klode, Joachim ; Schadendorf, Dirk ; Sondermann, Wiebke ; Franklin, C. ; Bestvater, F. ; Flaig, Michael J. ; Krahl, Dieter ; von Kalle, Christof ; Fröhling, Stefan ; Brinker, Titus J. / Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images. In: European Journal of Cancer. 2019 ; Vol. 118. pp. 91-96.

@article{96308b02bc4a47b985df025871368c30,
title = "Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images",
keywords = "Artificial intelligence, Deep learning, Histopathology, Melanoma, Pathology",
author = "A. Hekler and Utikal, {Jochen S.} and Enk, {Alexander H.} and Wiebke Solass and Max Schmitt and Joachim Klode and Dirk Schadendorf and Wiebke Sondermann and C. Franklin and F. Bestvater and Flaig, {Michael J.} and Dieter Krahl and {von Kalle}, Christof and Stefan Fr{\"o}hling and Brinker, {Titus J.}",
year = "2019",
month = "9",
doi = "10.1016/j.ejca.2019.06.012",
language = "English (US)",
volume = "118",
pages = "91--96",
journal = "European Journal of Cancer",
issn = "0959-8049",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images

AU - Hekler, A.

AU - Utikal, Jochen S.

AU - Enk, Alexander H.

AU - Solass, Wiebke

AU - Schmitt, Max

AU - Klode, Joachim

AU - Schadendorf, Dirk

AU - Sondermann, Wiebke

AU - Franklin, C.

AU - Bestvater, F.

AU - Flaig, Michael J.

AU - Krahl, Dieter

AU - von Kalle, Christof

AU - Fröhling, Stefan

AU - Brinker, Titus J.

PY - 2019/9

Y1 - 2019/9

KW - Artificial intelligence

KW - Deep learning

KW - Histopathology

KW - Melanoma

KW - Pathology

UR - http://www.scopus.com/inward/record.url?scp=85069049832&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069049832&partnerID=8YFLogxK

U2 - 10.1016/j.ejca.2019.06.012

DO - 10.1016/j.ejca.2019.06.012

M3 - Article

C2 - 31325876

AN - SCOPUS:85069049832

VL - 118

SP - 91

EP - 96

JO - European Journal of Cancer

JF - European Journal of Cancer

SN - 0959-8049

ER -