TY - GEN
T1 - Predicting future scientific discoveries based on a networked analysis of the past literature
AU - Nagarajan, Meena
AU - Wilkins, Angela D.
AU - Bachman, Benjamin J.
AU - Novikov, Ilya B.
AU - Bao, Shenghua
AU - Haas, Peter J.
AU - Terrón-Díaz, María E.
AU - Bhatia, Sumit
AU - Adikesavan, Anbu K.
AU - Labrie, Jacques J.
AU - Regenbogen, Sam
AU - Buchovecky, Christie M.
AU - Pickering, Curtis R.
AU - Kato, Linda
AU - Lisewski, Andreas M.
AU - Lelescu, Ana
AU - Zhang, Houyin
AU - Boyer, Stephen
AU - Weber, Griff
AU - Chen, Ying
AU - Donehower, Lawrence
AU - Spangler, Scott
AU - Lichtarge, Olivier
PY - 2015/8/10
Y1 - 2015/8/10
AB - We present KnIT, the Knowledge Integration Toolkit, a system for accelerating scientific discovery and predicting previously unknown protein-protein interactions. Such predictions enrich biological research and are pertinent to drug discovery and the understanding of disease. Unlike in a prior study, KnIT is now fully automated and demonstrably scalable. It extracts information from the scientific literature, automatically identifying direct and indirect references to protein interactions, knowledge that can be represented in network form. It then reasons over this network with techniques such as matrix factorization and graph diffusion to predict new, previously unknown interactions. The accuracy and scope of KnIT's knowledge extractions are validated by comparison with structured, manually curated data sources and by retrospective studies that use only literature available before a given date to predict subsequent literature discoveries. The KnIT methodology is a step towards automated hypothesis generation from text, with potential application to other scientific domains.
KW - Hypothesis generation
KW - Scientific discovery
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=84954121316&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954121316&partnerID=8YFLogxK
U2 - 10.1145/2783258.2788609
DO - 10.1145/2783258.2788609
M3 - Conference contribution
AN - SCOPUS:84954121316
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 2019
EP - 2028
BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Y2 - 10 August 2015 through 13 August 2015
ER -