TY - JOUR
T1 - Biological data annotation via a human-augmenting AI-based labeling system
AU - van der Wal, Douwe
AU - Jhun, Iny
AU - Laklouk, Israa
AU - Nirschl, Jeff
AU - Richer, Lara
AU - Rojansky, Rebecca
AU - Theparee, Talent
AU - Wheeler, Joshua
AU - Sander, Jörg
AU - Feng, Felix
AU - Mohamad, Osama
AU - Savarese, Silvio
AU - Socher, Richard
AU - Esteva, Andre
N1 - Publisher Copyright: © 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
AB - Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data labeling AI, which begins uninitialized and learns annotations from a human, in real-time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator, while increasing the quality of their annotations. Using a highly repetitive use-case—annotating cell types—and running experiments with seven pathologists—experts at the microscopic analysis of biological specimens—we demonstrate a manual work reduction of 90.60%, and an average data-quality boost of 4.34%, measured across four use-cases and two tissue stain types.
UR - http://www.scopus.com/inward/record.url?scp=85116515930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116515930&partnerID=8YFLogxK
U2 - 10.1038/s41746-021-00520-6
DO - 10.1038/s41746-021-00520-6
M3 - Article
C2 - 34620993
AN - SCOPUS:85116515930
SN - 2398-6352
VL - 4
JO - npj Digital Medicine
JF - npj Digital Medicine
IS - 1
M1 - 145
ER -