Abstract
In 2020, the White House released the “Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset,” which asked artificial intelligence experts to collect data and develop text mining techniques that can help the scientific community answer high-priority questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which automatically inspect published papers to discover novel implicit connections. We present two automated general-purpose hypothesis generation systems, AGATHA-C and AGATHA-GP, for COVID-19 research. The systems are based on graph mining and the transformer model. They are extensively validated using retrospective information rediscovery and proactive analysis involving human-in-the-loop expert review. Both systems achieve high-quality predictions across domains (up to 0.97 ROC AUC in some domains) with fast computation times and are released to the broad scientific community to accelerate biomedical research. In addition, through a study curated by domain experts, we show that the systems are able to discover ongoing research findings such as the relationship between COVID-19 and the hormone oxytocin.
Reproducibility: All code, details, and pre-trained models are available at https://github.com/IlyaTyagin/AGATHA-C-GP
CCS Concepts: • Applied computing → Bioinformatics; Document management and text processing; • Computing methodologies → Learning latent representations; Neural networks; Information extraction; Semantic networks.
Publisher
Cold Spring Harbor Laboratory