TY - JOUR
T1 - Identification of histological correlates of overall survival in lower grade gliomas using a bag-of-words paradigm
T2 - A preliminary analysis based on hematoxylin & eosin stained slides from the lower grade glioma cohort of The Cancer Genome Atlas
AU - Powell, Reid Trenton
AU - Olar, Adriana
AU - Narang, Shivali
AU - Rao, Ganesh
AU - Sulman, Erik
AU - Fuller, Gregory N.
AU - Rao, Arvind
N1 - Publisher Copyright: © 2017 Journal of Pathology Informatics | Published by Wolters Kluwer - Medknow.
PY - 2017
Y1 - 2017
N2 - Background: Glioma, the most common primary brain neoplasm, describes a heterogeneous tumor of multiple histologic subtypes and cellular origins. At clinical presentation, gliomas are graded according to the World Health Organization (WHO) guidelines, which reflect the malignant characteristics of the tumor based on histopathological and molecular features. Lower grade diffuse gliomas (LGGs) (WHO Grade II-III) have fewer malignant characteristics than high-grade gliomas (WHO Grade IV) and a better clinical prognosis; however, accurate discrimination of overall survival (OS) remains a challenge. In this study, we aimed to identify tissue-derived image features using a machine learning approach to predict OS in a mixed histology and grade cohort of lower grade glioma patients. To achieve this aim, we used H and E stained slides from the public LGG cohort of The Cancer Genome Atlas (TCGA) to create a machine-learned dictionary of 'image-derived visual words' associated with OS. We then evaluated the combined efficacy of these visual words in predicting short versus long OS by training a generalized machine learning model. Finally, we mapped these predictive visual words back to molecular signaling cascades to infer potential drivers of the machine-learned survival-associated phenotypes. Methods: We analyzed digitized histological sections downloaded from the LGG cohort of TCGA using a bag-of-words approach. This method identified a diverse set of histological patterns that were further correlated with OS, histology, and molecular signaling activity using Cox regression, analysis of variance, and Spearman correlation, respectively. A support vector machine (SVM) model was constructed to discriminate patients into short and long OS groups dichotomized at 24 months. Results: This method identified disease-relevant phenotypes associated with OS, some of which are correlated with disease-associated molecular pathways. From these image-derived phenotypes, a generalized SVM model that could discriminate 24-month OS (area under the curve, 0.76) was obtained. Conclusion: Here, we demonstrated one potential strategy to incorporate image features derived from H and E stained slides into predictive models of OS. In addition, we showed how these image-derived phenotypic characteristics correlate with molecular signaling activity underlying the etiology or behavior of LGG.
KW - Bag-of-words
KW - low-grade glioma
KW - machine learning
KW - machine vision
UR - http://www.scopus.com/inward/record.url?scp=85018562763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018562763&partnerID=8YFLogxK
U2 - 10.4103/jpi.jpi_43_16
DO - 10.4103/jpi.jpi_43_16
M3 - Article
C2 - 28382223
AN - SCOPUS:85018562763
SN - 2229-5089
VL - 8
JO - Journal of Pathology Informatics
JF - Journal of Pathology Informatics
IS - 1
M1 - 19
ER -