Bioschemas & Schema.org: a Lightweight Semantic Layer for Life Sciences Websites

Author:

Michel Franck, The Bioschemas Community

Abstract

Web portals are commonly used to expose and share scientific data. They enable end users to find, organize and obtain data relevant to their interests. With the continuous growth of data across all science domains, researchers commonly find themselves overwhelmed, as finding, retrieving and making sense of data becomes increasingly difficult. Search engines can help find relevant websites, but the short summaries they provide in result lists often say little about how relevant a website is to a given research interest. To yield better results, Google, Yahoo, Yandex and Bing have adopted a strategy of consuming structured content that they extract from websites. To this end, the schema.org collaborative community defines vocabularies covering common entities and relationships (e.g., events, organizations, creative works) (Guha et al. 2016). Websites can leverage these vocabularies to embed semantic annotations within web pages, in the form of markup using standard formats. Search engines, in turn, exploit semantic markup to improve the ranking of the most relevant resources while providing more informative and accurate summaries. Additionally, adding such rich metadata is a step towards making data FAIR, i.e. Findable, Accessible, Interoperable and Reusable.

Although schema.org encompasses terms related to data repositories, datasets, citations, events, etc., it lacks specialized terms for modeling research entities. The Bioschemas community (Garcia et al. 2017) aims to extend schema.org to support markup for Life Sciences websites. A core principle is to reuse types from schema.org as well as well-adopted domain ontologies, proposing only a limited set of new types. The goal is to enable semantic cross-linking between knowledge graphs extracted from marked-up websites. An overview of the main types is presented in Fig. 1. Bioschemas also provides profiles that specify how to describe an entity of a given type. For instance, the protein profile requires a unique identifier, recommends listing transcribed genes and associated diseases, and points to recommended terms from the Protein Ontology and the Semantic Science Integrated Ontology.

The success of schema.org lies in its simplicity and in its support by major search engines. By extending schema.org, Bioschemas enables life sciences research communities to benefit from a lightweight semantic layer on websites, and thus facilitates discoverability and interoperability across them. From an initial pilot including just a few bio-types such as proteins and samples, the Bioschemas community has grown and is now opening up to other disciplines. The biodiversity domain is a promising candidate for such further extensions, with additional profiles accounting for biodiversity-related information. For instance, since taxonomic registers are the backbone of many web portals and databases, new profiles could describe taxa and scientific names while reusing well-adopted vocabularies such as Darwin Core terms (Baskauf et al. 2016) or TDWG ontologies (TDWG Vocabulary Management Task Group 2013). Fostering the use of such markup by web portals reporting traits, observations or museum collections could not only improve information discovery through search engines, but could also be key to enabling large-scale biodiversity data integration scenarios.
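To make the idea of embedded markup concrete, the following is a minimal sketch of how a portal might generate a JSON-LD annotation for a protein page. It is an illustration under stated assumptions, not the Bioschemas reference implementation: the helper names (`protein_jsonld`, `as_script_tag`), the example identifier, name and URL are all hypothetical, and the property names `isEncodedByBioChemEntity` and `associatedDisease` are assumed here to stand for the "genes" and "associated diseases" recommendations; they should be checked against the current protein profile before use.

```python
import json


def protein_jsonld(identifier, name, url, genes=(), diseases=()):
    """Build a schema.org-style description of a protein as a plain dict.

    The profile described above requires a unique identifier, so an empty
    identifier is rejected. Property names are assumptions (see lead-in).
    """
    if not identifier:
        raise ValueError("the protein profile requires a unique identifier")
    doc = {
        "@context": "https://schema.org",
        "@type": "Protein",
        "identifier": identifier,
        "name": name,
        "url": url,
    }
    # Recommended (optional) properties are only emitted when provided.
    if genes:
        doc["isEncodedByBioChemEntity"] = [
            {"@type": "Gene", "name": gene} for gene in genes
        ]
    if diseases:
        doc["associatedDisease"] = [
            {"@type": "MedicalCondition", "name": disease} for disease in diseases
        ]
    return doc


def as_script_tag(doc):
    """Serialize the description as a JSON-LD script element for an HTML page."""
    body = json.dumps(doc, indent=2)
    # Escape "</" so a value can never close the script element early;
    # "<\/" is still valid JSON and decodes back to "</".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Placeholder values only; none of these refer to a real record.
    example = protein_jsonld(
        identifier="example:P00001",
        name="Example protein",
        url="https://example.org/protein/P00001",
        genes=["EXAMPLE1"],
        diseases=["Example disease"],
    )
    print(as_script_tag(example))
```

The resulting script element can be placed in the HTML of the protein's page, where crawlers that understand schema.org markup can extract it. Keeping the description as a plain dictionary until the last step makes it easy to validate required properties before anything is published.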

Publisher

Pensoft Publishers

References (4 articles)

1. Baskauf et al. Lessons Learned from Adapting the Darwin Core Vocabulary Standard for Use in RDF. Semantic Web, 2016.

2. Garcia et al. Bioschemas: schema.org for the Life Sciences. Proceedings of SWAT4LS, 2017.

3. Guha et al. Schema.org: Evolution of Structured Data on the Web. Communications of the ACM, 2016.

4. TDWG Vocabulary Management Task Group. Report of the TDWG Vocabulary Management Task Group (VoMaG). 2013.

