Prediction and curation of missing biomedical identifier mappings with Biomappings

Author:

Hoyt Charles Tapley1ORCID,Hoyt Amelia L2ORCID,Gyori Benjamin M1ORCID

Affiliation:

1. Laboratory of Systems Pharmacology, Harvard Medical School , Boston, MA 02115, United States

2. Beth Israel Deaconess Medical Center , Boston, MA 02215, United States

Abstract

AbstractMotivationBiomedical identifier resources (such as ontologies, taxonomies, and controlled vocabularies) commonly overlap in scope and contain equivalent entries under different identifiers. Maintaining mappings between these entries is crucial for interoperability and the integration of data and knowledge. However, there are substantial gaps in available mappings motivating their semi-automated curation.ResultsBiomappings implements a curation workflow for missing mappings which combines automated prediction with human-in-the-loop curation. It supports multiple prediction approaches and provides a web-based user interface for reviewing predicted mappings for correctness, combined with automated consistency checking. Predicted and curated mappings are made available in public, version-controlled resource files on GitHub. Biomappings currently makes available 9274 curated mappings and 40 691 predicted ones, providing previously missing mappings between widely used identifier resources covering small molecules, cell lines, diseases, and other concepts. We demonstrate the value of Biomappings on case studies involving predicting and curating missing mappings among cancer cell lines as well as small molecules tested in clinical trials. We also present how previously missing mappings curated using Biomappings were contributed back to multiple widely used community ontologies.Availability and implementationThe data and code are available under the CC0 and MIT licenses at https://github.com/biopragmatics/biomappings.

Funder

Defense Advanced Research Projects Agency

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Reference51 articles.

1. The ontologies community of practice: a CGIAR initiative for big data in agrifood systems;Arnaud;Patterns (N Y),2020

2. Automated assembly of molecular mechanisms at scale from text mining and curated databases;Bachman;Mol Syst Biol,2023

3. The Cellosaurus, a cell-line knowledge resource;Bairoch;J Biomol Tech,2018

4. Ubergraph: integrating OBO ontologies into a unified semantic graph;Balhoff;ICBO 2022,2022

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3