Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task

Collaborators

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Background: Recent studies have successfully demonstrated the use of deep-learning algorithms for dermatologist-level classification of suspicious lesions by the use of excessive proprietary image databases and limited numbers of dermatologists. For the first time, the performance of a deep-learning algorithm trained by open-source images exclusively is compared to a large number of dermatologists covering all levels within the clinical hierarchy.

Methods: We used methods from enhanced deep learning to train a convolutional neural network (CNN) with 12,378 open-source dermoscopic images. We used 100 images to compare the performance of the CNN to that of the 157 dermatologists from 12 university hospitals in Germany. Outperformance of dermatologists by the deep neural network was measured in terms of sensitivity, specificity and receiver operating characteristics.

Findings: The mean sensitivity and specificity achieved by the dermatologists with dermoscopic images was 74.1% (range 40.0%–100%) and 60% (range 21.3%–91.3%), respectively. At a mean sensitivity of 74.1%, the CNN exhibited a mean specificity of 86.5% (range 70.8%–91.3%). At a mean specificity of 60%, a mean sensitivity of 87.5% (range 80%–95%) was achieved by our algorithm. Among the dermatologists, the chief physicians showed the highest mean specificity of 69.2% at a mean sensitivity of 73.3%. With the same high specificity of 69.2%, the CNN had a mean sensitivity of 84.5%.

Interpretation: A CNN trained by open-source images exclusively outperformed 136 of the 157 dermatologists and all the different levels of experience (from junior to chief physicians) in terms of average specificity and sensitivity.

Original language: English (US)
Pages (from-to): 47-54
Number of pages: 8
Journal: European Journal of Cancer
Volume: 113
DOI: 10.1016/j.ejca.2019.04.001
State: Published - May 2019

Keywords

  • Artificial intelligence
  • Melanoma
  • Skin cancer

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

Cite this

Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. / Collaborators.

In: European Journal of Cancer, Vol. 113, 05.2019, p. 47-54.

Research output: Contribution to journal › Article

@article{8cd1c619d7b94c45b78e638506676141,
title = "Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task",
abstract = "Background: Recent studies have successfully demonstrated the use of deep-learning algorithms for dermatologist-level classification of suspicious lesions by the use of excessive proprietary image databases and limited numbers of dermatologists. For the first time, the performance of a deep-learning algorithm trained by open-source images exclusively is compared to a large number of dermatologists covering all levels within the clinical hierarchy. Methods: We used methods from enhanced deep learning to train a convolutional neural network (CNN) with 12,378 open-source dermoscopic images. We used 100 images to compare the performance of the CNN to that of the 157 dermatologists from 12 university hospitals in Germany. Outperformance of dermatologists by the deep neural network was measured in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists with dermoscopic images was 74.1{\%} (range 40.0{\%}–100{\%}) and 60{\%} (range 21.3{\%}–91.3{\%}), respectively. At a mean sensitivity of 74.1{\%}, the CNN exhibited a mean specificity of 86.5{\%} (range 70.8{\%}–91.3{\%}). At a mean specificity of 60{\%}, a mean sensitivity of 87.5{\%} (range 80{\%}–95{\%}) was achieved by our algorithm. Among the dermatologists, the chief physicians showed the highest mean specificity of 69.2{\%} at a mean sensitivity of 73.3{\%}. With the same high specificity of 69.2{\%}, the CNN had a mean sensitivity of 84.5{\%}. Interpretation: A CNN trained by open-source images exclusively outperformed 136 of the 157 dermatologists and all the different levels of experience (from junior to chief physicians) in terms of average specificity and sensitivity.",
keywords = "Artificial intelligence, Melanoma, Skin cancer",
author = "Collaborators and Brinker, {Titus J.} and Achim Hekler and Enk, {Alexander H.} and Joachim Klode and Axel Hauschild and Carola Berking and Bastian Schilling and Sebastian Haferkamp and Dirk Schadendorf and Tim Holland-Letz and Utikal, {Jochen S.} and {von Kalle}, Christof and Wiebke Ludwig-Peitsch and Judith Sirokay and Lucie Heinzerling and Magarete Albrecht and Katharina Baratella and Lena Bischof and Eleftheria Chorti and Anna Dith and Christina Drusio and Nina Giese and Emmanouil Gratsias and Klaus Griewank and Sandra Hallasch and Zdenka Hanhart and Saskia Herz and Katja Hohaus and Philipp Jansen and Finja Jockenh{\"o}fer and Theodora Kanaki and Sarah Knispel and Katja Leonhard and Anna Martaki and Liliana Matei and Johanna Matull and Alexandra Olischewski and Maximilian Petri and Placke, {Jan Malte} and Simon Raub and Katrin Salva and Swantje Schlott and Elsa Sody and Nadine Steingrube and Ingo Stoffels and Selma Ugurel and Anne Zaremba and Christoffer Gebhardt and Nina Booken and Maria Christolouka",
year = "2019",
month = may,
doi = "10.1016/j.ejca.2019.04.001",
language = "English (US)",
volume = "113",
pages = "47--54",
journal = "European Journal of Cancer",
issn = "0959-8049",
publisher = "Elsevier Limited",

}

TY - JOUR
T1 - Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task
AU - Collaborators
AU - Brinker, Titus J.
AU - Hekler, Achim
AU - Enk, Alexander H.
AU - Klode, Joachim
AU - Hauschild, Axel
AU - Berking, Carola
AU - Schilling, Bastian
AU - Haferkamp, Sebastian
AU - Schadendorf, Dirk
AU - Holland-Letz, Tim
AU - Utikal, Jochen S.
AU - von Kalle, Christof
AU - Ludwig-Peitsch, Wiebke
AU - Sirokay, Judith
AU - Heinzerling, Lucie
AU - Albrecht, Magarete
AU - Baratella, Katharina
AU - Bischof, Lena
AU - Chorti, Eleftheria
AU - Dith, Anna
AU - Drusio, Christina
AU - Giese, Nina
AU - Gratsias, Emmanouil
AU - Griewank, Klaus
AU - Hallasch, Sandra
AU - Hanhart, Zdenka
AU - Herz, Saskia
AU - Hohaus, Katja
AU - Jansen, Philipp
AU - Jockenhöfer, Finja
AU - Kanaki, Theodora
AU - Knispel, Sarah
AU - Leonhard, Katja
AU - Martaki, Anna
AU - Matei, Liliana
AU - Matull, Johanna
AU - Olischewski, Alexandra
AU - Petri, Maximilian
AU - Placke, Jan Malte
AU - Raub, Simon
AU - Salva, Katrin
AU - Schlott, Swantje
AU - Sody, Elsa
AU - Steingrube, Nadine
AU - Stoffels, Ingo
AU - Ugurel, Selma
AU - Zaremba, Anne
AU - Gebhardt, Christoffer
AU - Booken, Nina
AU - Christolouka, Maria
PY - 2019/5
Y1 - 2019/5
N2 - Background: Recent studies have successfully demonstrated the use of deep-learning algorithms for dermatologist-level classification of suspicious lesions by the use of excessive proprietary image databases and limited numbers of dermatologists. For the first time, the performance of a deep-learning algorithm trained by open-source images exclusively is compared to a large number of dermatologists covering all levels within the clinical hierarchy. Methods: We used methods from enhanced deep learning to train a convolutional neural network (CNN) with 12,378 open-source dermoscopic images. We used 100 images to compare the performance of the CNN to that of the 157 dermatologists from 12 university hospitals in Germany. Outperformance of dermatologists by the deep neural network was measured in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists with dermoscopic images was 74.1% (range 40.0%–100%) and 60% (range 21.3%–91.3%), respectively. At a mean sensitivity of 74.1%, the CNN exhibited a mean specificity of 86.5% (range 70.8%–91.3%). At a mean specificity of 60%, a mean sensitivity of 87.5% (range 80%–95%) was achieved by our algorithm. Among the dermatologists, the chief physicians showed the highest mean specificity of 69.2% at a mean sensitivity of 73.3%. With the same high specificity of 69.2%, the CNN had a mean sensitivity of 84.5%. Interpretation: A CNN trained by open-source images exclusively outperformed 136 of the 157 dermatologists and all the different levels of experience (from junior to chief physicians) in terms of average specificity and sensitivity.
AB - Background: Recent studies have successfully demonstrated the use of deep-learning algorithms for dermatologist-level classification of suspicious lesions by the use of excessive proprietary image databases and limited numbers of dermatologists. For the first time, the performance of a deep-learning algorithm trained by open-source images exclusively is compared to a large number of dermatologists covering all levels within the clinical hierarchy. Methods: We used methods from enhanced deep learning to train a convolutional neural network (CNN) with 12,378 open-source dermoscopic images. We used 100 images to compare the performance of the CNN to that of the 157 dermatologists from 12 university hospitals in Germany. Outperformance of dermatologists by the deep neural network was measured in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists with dermoscopic images was 74.1% (range 40.0%–100%) and 60% (range 21.3%–91.3%), respectively. At a mean sensitivity of 74.1%, the CNN exhibited a mean specificity of 86.5% (range 70.8%–91.3%). At a mean specificity of 60%, a mean sensitivity of 87.5% (range 80%–95%) was achieved by our algorithm. Among the dermatologists, the chief physicians showed the highest mean specificity of 69.2% at a mean sensitivity of 73.3%. With the same high specificity of 69.2%, the CNN had a mean sensitivity of 84.5%. Interpretation: A CNN trained by open-source images exclusively outperformed 136 of the 157 dermatologists and all the different levels of experience (from junior to chief physicians) in terms of average specificity and sensitivity.
KW - Artificial intelligence
KW - Melanoma
KW - Skin cancer
UR - http://www.scopus.com/inward/record.url?scp=85064094312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064094312&partnerID=8YFLogxK
U2 - 10.1016/j.ejca.2019.04.001
DO - 10.1016/j.ejca.2019.04.001
M3 - Article
C2 - 30981091
AN - SCOPUS:85064094312
VL - 113
SP - 47
EP - 54
JO - European Journal of Cancer
JF - European Journal of Cancer
SN - 0959-8049
ER -