Automated hypothesis generation based on mining scientific literature

Scott Spangler, Angela D. Wilkins, Benjamin J. Bachman, Meena Nagarajan, Tajhal Dayaram, Peter Haas, Sam Regenbogen, Curtis R. Pickering, Austin Comer, Jeffrey N. Myers, Ioana Stanoi, Linda Kato, Ana Lelescu, Jacques J. Labrie, Neha Parikh, Andreas Martin Lisewski, Lawrence Donehower, Ying Chen, Olivier Lichtarge

Research output: Chapter in Book/Report/Conference proceedingConference contribution

70 Scopus citations

Abstract

Keeping up with the ever-expanding flow of data and publications is untenable and poses a fundamental bottleneck to scientific progress. Current search technologies typically find many relevant documents, but they do not extract and organize the information content of these documents or suggest new scientific hypotheses based on this organized content. We present an initial case study on KnIT, a prototype system that mines the information contained in the scientific literature, represents it explicitly in a queriable network, and then further reasons upon these data to generate novel and experimentally testable hypotheses. KnIT combines entity detection with neighbor-text feature analysis and with graph-based diffusion of information to identify potential new properties of entities that are strongly implied by existing relationships. We discuss a successful application of our approach that mines the published literature to identify new protein kinases that phosphorylate the protein tumor suppressor p53. Retrospective analysis demonstrates the accuracy of this approach and ongoing laboratory experiments suggest that kinases identified by our system may indeed phosphorylate p53. These results establish proof of principle for automated hypothesis generation and discovery based on text mining of the scientific literature.

Original languageEnglish (US)
Title of host publicationKDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1877-1886
Number of pages10
ISBN (Print)9781450329569
DOIs
StatePublished - 2014
Event20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 - New York, NY, United States
Duration: Aug 24 2014Aug 27 2014

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
Country/TerritoryUnited States
CityNew York, NY
Period8/24/148/27/14

Keywords

  • hypothesis generation
  • scientific discovery
  • text mining

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Automated hypothesis generation based on mining scientific literature'. Together they form a unique fingerprint.

Cite this