Dug: A Semantic Search Engine Leveraging Peer-Reviewed Knowledge to Span Biomedical Data Repositories

Published: 2021-07-09

Authors:

Waldrop Alexander M., Cheadle John B., Bradford Kira, Preiss Alexander, Chew Robert, Holt Jonathan R., Braswell Nathan, Watson Matt, Crerar Andrew, Ball Chris M., Kebede Yaphet, Schreep Carl, Linebaugh PJ, Hiles Hannah, Boyles Rebecca, Bizon Chris, Krishnamurthy Ashok, Cox Steve

Abstract

Motivation: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned.

Results: Developed through the National Heart, Lung, and Blood Institute’s (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15,911 study variables from public datasets. On a manually curated search dataset, Dug’s total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch’s total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results.

Availability and Implementation: Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/.

Contact: awaldrop@rti.org or scox@renci.org

Publisher

Cold Spring Harbor Laboratory
