A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task

Collaborators

Research output: Contribution to journal › Article

Abstract

Background: Recent studies have demonstrated that convolutional neural networks (CNNs) can classify images of melanoma with accuracies comparable to those achieved by board-certified dermatologists. However, the performance of a CNN trained exclusively with dermoscopic images on a clinical image classification task, in direct competition with a large number of dermatologists, has not been measured to date. This study compares a convolutional neural network trained exclusively with dermoscopic images against dermatologists manually grading the same clinical photographs for melanoma. Methods: We compared automatic digital melanoma classification with the performance of 145 dermatologists from 12 German university hospitals. We trained a CNN with 12,378 open-source dermoscopic images using enhanced deep-learning methods and then used 100 clinical images to compare the performance of the CNN with that of the dermatologists. Dermatologists were compared with the deep neural network in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists on the clinical images were 89.4% (range: 55.0%–100%) and 64.4% (range: 22.5%–92.5%), respectively. At the same sensitivity, the CNN exhibited a mean specificity of 68.2% (range: 47.5%–86.25%). Among the dermatologists, the attendings showed the highest mean sensitivity, 92.8%, at a mean specificity of 57.7%. At the same high sensitivity of 92.8%, the CNN had a mean specificity of 61.1%. Interpretation: For the first time, dermatologist-level image classification was achieved on a clinical image classification task without training on clinical images. The CNN showed a smaller variance of results, indicating that computer vision is more robust than human assessment in dermatologic image classification tasks.
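
The comparison described in the Findings hinges on reading the CNN's specificity off its receiver operating characteristic (ROC) curve at a fixed sensitivity, namely the dermatologists' mean sensitivity. The sketch below illustrates that operating-point comparison; it is a minimal illustration using hypothetical labels and CNN melanoma scores (y_true, y_score), not the authors' code or data.

# Minimal sketch of the ROC operating-point comparison described in the abstract.
# Hypothetical labels and CNN scores; not the authors' implementation or data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical 100-image clinical test set: 20 melanomas (1) and 80 benign lesions (0).
y_true = np.concatenate([np.ones(20), np.zeros(80)])
y_score = np.concatenate([rng.normal(0.7, 0.2, 20),   # CNN melanoma scores for melanomas
                          rng.normal(0.3, 0.2, 80)])  # CNN melanoma scores for benign lesions

# ROC curve: false-positive rate (1 - specificity) versus sensitivity (true-positive rate).
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("ROC AUC:", roc_auc_score(y_true, y_score))

# Fix the operating point at the dermatologists' mean sensitivity (89.4% in the study)
# and report the CNN's specificity at the first threshold that reaches it.
target_sensitivity = 0.894
idx = int(np.argmax(tpr >= target_sensitivity))
print(f"CNN specificity at {tpr[idx]:.1%} sensitivity: {1 - fpr[idx]:.1%}")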

Original language: English (US)
Pages (from-to): 148-154
Number of pages: 7
Journal: European Journal of Cancer
Volume: 111
DOI: 10.1016/j.ejca.2019.02.005
State: Published - Apr 1 2019

Fingerprint

Melanoma
Dermatologists
Sensitivity and Specificity
ROC Curve
Learning

Keywords

  • Artificial intelligence
  • Diagnostics
  • Melanoma
  • Skin cancer

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

Cite this

@article{d19b46730bb34429a78cc23b1e652053,
title = "A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task",
abstract = "Background: Recent studies have demonstrated the use of convolutional neural networks (CNNs) to classify images of melanoma with accuracies comparable to those achieved by board-certified dermatologists. However, the performance of a CNN exclusively trained with dermoscopic images in a clinical image classification task in direct competition with a large number of dermatologists has not been measured to date. This study compares the performance of a convolutional neural network trained with dermoscopic images exclusively for identifying melanoma in clinical photographs with the manual grading of the same images by dermatologists. Methods: We compared automatic digital melanoma classification with the performance of 145 dermatologists of 12 German university hospitals. We used methods from enhanced deep learning to train a CNN with 12,378 open-source dermoscopic images. We used 100 clinical images to compare the performance of the CNN to that of the dermatologists. Dermatologists were compared with the deep neural network in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists with clinical images was 89.4{\%} (range: 55.0{\%}–100{\%}) and 64.4{\%} (range: 22.5{\%}–92.5{\%}). At the same sensitivity, the CNN exhibited a mean specificity of 68.2{\%} (range 47.5{\%}–86.25{\%}). Among the dermatologists, the attendings showed the highest mean sensitivity of 92.8{\%} at a mean specificity of 57.7{\%}. With the same high sensitivity of 92.8{\%}, the CNN had a mean specificity of 61.1{\%}. Interpretation: For the first time, dermatologist-level image classification was achieved on a clinical image classification task without training on clinical images. The CNN had a smaller variance of results indicating a higher robustness of computer vision compared with human assessment for dermatologic image classification tasks.",
keywords = "Artificial intelligence, Diagnostics, Melanoma, Skin cancer",
author = "Collaborators and Brinker, {Titus J.} and Achim Hekler and Enk, {Alexander H.} and Joachim Klode and Axel Hauschild and Carola Berking and Bastian Schilling and Sebastian Haferkamp and Dirk Schadendorf and Stefan Fr{\"o}hling and Utikal, {Jochen S.} and {von Kalle}, Christof and Wiebke Ludwig-Peitsch and Judith Sirokay and Magarete Albrecht and Katharina Baratella and Lena Bischof and Eleftheria Chorti and Anna Dith and Christina Drusio and Nina Giese and Emmanouil Gratsias and Klaus Griewank and Sandra Hallasch and Zdenka Hanhart and Saskia Herz and Katja Hohaus and Philipp Jansen and Finja Jockenh{\"o}fer and Theodora Kanaki and Sarah Knispel and Katja Leonhard and Anna Martaki and Liliana Matei and Johanna Matull and Alexandra Olischewski and Maximilian Petri and Placke, {Jan Malte} and Simon Raub and Katrin Salva and Swantje Schlott and Elsa Sody and Nadine Steingrube and Ingo Stoffels and Selma Ugurel and Wiebke Sondermann and Anne Zaremba and Christoffer Gebhardt and Nina Booken",
year = "2019",
month = "4",
day = "1",
doi = "10.1016/j.ejca.2019.02.005",
language = "English (US)",
volume = "111",
pages = "148--154",
journal = "European Journal of Cancer",
issn = "0959-8049",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task

AU - Collaborators

AU - Brinker, Titus J.

AU - Hekler, Achim

AU - Enk, Alexander H.

AU - Klode, Joachim

AU - Hauschild, Axel

AU - Berking, Carola

AU - Schilling, Bastian

AU - Haferkamp, Sebastian

AU - Schadendorf, Dirk

AU - Fröhling, Stefan

AU - Utikal, Jochen S.

AU - von Kalle, Christof

AU - Ludwig-Peitsch, Wiebke

AU - Sirokay, Judith

AU - Albrecht, Magarete

AU - Baratella, Katharina

AU - Bischof, Lena

AU - Chorti, Eleftheria

AU - Dith, Anna

AU - Drusio, Christina

AU - Giese, Nina

AU - Gratsias, Emmanouil

AU - Griewank, Klaus

AU - Hallasch, Sandra

AU - Hanhart, Zdenka

AU - Herz, Saskia

AU - Hohaus, Katja

AU - Jansen, Philipp

AU - Jockenhöfer, Finja

AU - Kanaki, Theodora

AU - Knispel, Sarah

AU - Leonhard, Katja

AU - Martaki, Anna

AU - Matei, Liliana

AU - Matull, Johanna

AU - Olischewski, Alexandra

AU - Petri, Maximilian

AU - Placke, Jan Malte

AU - Raub, Simon

AU - Salva, Katrin

AU - Schlott, Swantje

AU - Sody, Elsa

AU - Steingrube, Nadine

AU - Stoffels, Ingo

AU - Ugurel, Selma

AU - Sondermann, Wiebke

AU - Zaremba, Anne

AU - Gebhardt, Christoffer

AU - Booken, Nina

PY - 2019/4/1

Y1 - 2019/4/1

N2 - Background: Recent studies have demonstrated the use of convolutional neural networks (CNNs) to classify images of melanoma with accuracies comparable to those achieved by board-certified dermatologists. However, the performance of a CNN exclusively trained with dermoscopic images in a clinical image classification task in direct competition with a large number of dermatologists has not been measured to date. This study compares the performance of a convolutional neural network trained with dermoscopic images exclusively for identifying melanoma in clinical photographs with the manual grading of the same images by dermatologists. Methods: We compared automatic digital melanoma classification with the performance of 145 dermatologists of 12 German university hospitals. We used methods from enhanced deep learning to train a CNN with 12,378 open-source dermoscopic images. We used 100 clinical images to compare the performance of the CNN to that of the dermatologists. Dermatologists were compared with the deep neural network in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists with clinical images was 89.4% (range: 55.0%–100%) and 64.4% (range: 22.5%–92.5%). At the same sensitivity, the CNN exhibited a mean specificity of 68.2% (range 47.5%–86.25%). Among the dermatologists, the attendings showed the highest mean sensitivity of 92.8% at a mean specificity of 57.7%. With the same high sensitivity of 92.8%, the CNN had a mean specificity of 61.1%. Interpretation: For the first time, dermatologist-level image classification was achieved on a clinical image classification task without training on clinical images. The CNN had a smaller variance of results indicating a higher robustness of computer vision compared with human assessment for dermatologic image classification tasks.

AB - Background: Recent studies have demonstrated the use of convolutional neural networks (CNNs) to classify images of melanoma with accuracies comparable to those achieved by board-certified dermatologists. However, the performance of a CNN exclusively trained with dermoscopic images in a clinical image classification task in direct competition with a large number of dermatologists has not been measured to date. This study compares the performance of a convolutional neural network trained with dermoscopic images exclusively for identifying melanoma in clinical photographs with the manual grading of the same images by dermatologists. Methods: We compared automatic digital melanoma classification with the performance of 145 dermatologists of 12 German university hospitals. We used methods from enhanced deep learning to train a CNN with 12,378 open-source dermoscopic images. We used 100 clinical images to compare the performance of the CNN to that of the dermatologists. Dermatologists were compared with the deep neural network in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists with clinical images was 89.4% (range: 55.0%–100%) and 64.4% (range: 22.5%–92.5%). At the same sensitivity, the CNN exhibited a mean specificity of 68.2% (range 47.5%–86.25%). Among the dermatologists, the attendings showed the highest mean sensitivity of 92.8% at a mean specificity of 57.7%. With the same high sensitivity of 92.8%, the CNN had a mean specificity of 61.1%. Interpretation: For the first time, dermatologist-level image classification was achieved on a clinical image classification task without training on clinical images. The CNN had a smaller variance of results indicating a higher robustness of computer vision compared with human assessment for dermatologic image classification tasks.

KW - Artificial intelligence

KW - Diagnostics

KW - Melanoma

KW - Skin cancer

UR - http://www.scopus.com/inward/record.url?scp=85062449115&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062449115&partnerID=8YFLogxK

U2 - 10.1016/j.ejca.2019.02.005

DO - 10.1016/j.ejca.2019.02.005

M3 - Article

VL - 111

SP - 148

EP - 154

JO - European Journal of Cancer

JF - European Journal of Cancer

SN - 0959-8049

ER -