Semantic representation of scientific literature: bringing claims, contributions and named entities onto the Linked Open Data cloud

Authors:

Sateli, Bahar¹; Witte, René¹

Affiliation:

1. Semantic Software Lab, Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada

Abstract

Motivation. Finding relevant scientific literature is one of the essential tasks researchers face on a daily basis. Digital libraries and web information retrieval techniques provide rapid access to a vast amount of scientific literature. However, no further automated support is available that would enable fine-grained access to the knowledge ‘stored’ in these documents. The emerging domain of Semantic Publishing aims at making scientific knowledge accessible to both humans and machines by adding semantic annotations to content, such as a publication’s contributions, methods, or application domains. However, despite the promise of better knowledge access, manually annotating existing research literature is prohibitively expensive for widespread adoption. We argue that a novel combination of three distinct methods can significantly advance this vision in a fully automated way: (i) Natural Language Processing (NLP) for Rhetorical Entity (RE) detection; (ii) Named Entity (NE) recognition based on the Linked Open Data (LOD) cloud; and (iii) automatic knowledge base construction for both NEs and REs using semantic web ontologies that interconnect entities in documents with the machine-readable LOD cloud.

Results. We present a complete workflow to transform scientific literature into a semantic knowledge base, based on the W3C standards RDF and RDFS. A text mining pipeline, implemented on the GATE framework, automatically extracts rhetorical entities of type Claims and Contributions from full-text scientific literature. These REs are further enriched with named entities, represented as URIs to the Linked Open Data cloud, by integrating the DBpedia Spotlight tool into our workflow. Text mining results are stored in a knowledge base through a flexible export process that provides a dynamic mapping of semantic annotations to LOD vocabularies through rules stored in the knowledge base.
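The enrichment and export steps described above can be illustrated with a small sketch. It makes several assumptions:

* RE and NE annotations are available as plain Python dicts with character offsets.
* NEs have already been resolved to DBpedia URIs, for example by a tool such as DBpedia Spotlight.
* The mapping rules are a simple type-to-class dictionary. In the described workflow, the rules are stored in the knowledge base itself.

The example.org vocabulary URIs, the containsNE property, and the function names are hypothetical placeholders. They are not the authors' vocabulary or code.

```python
# Hypothetical sketch: link NEs to the REs that contain them, then export
# both to N-Triples via type-to-class mapping rules. All example.org URIs
# and the dict-based annotation format are illustrative assumptions.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CONTAINS_NE = "http://example.org/vocab#containsNE"  # hypothetical property

# Hypothetical mapping rules: annotation type -> RDF class URI.
MAPPING_RULES = {
    "Claim": "http://example.org/vocab#Claim",
    "Contribution": "http://example.org/vocab#Contribution",
}


def attach_entities(rhetorical_entities, named_entities):
    """Map each RE id to the URIs of NEs whose span lies inside the RE span."""
    links = {}
    for rhet in rhetorical_entities:
        links[rhet["id"]] = [
            ne["uri"]
            for ne in named_entities
            if rhet["start"] <= ne["start"] and ne["end"] <= rhet["end"]
        ]
    return links


def to_ntriples(rhetorical_entities, links, base="http://example.org/doc1#"):
    """Serialize REs (and their NE links) as N-Triples lines."""
    lines = []
    for rhet in rhetorical_entities:
        rdf_class = MAPPING_RULES.get(rhet["type"])
        if rdf_class is None:
            continue  # no mapping rule for this type, so it is not exported
        subject = f"<{base}{rhet['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{rdf_class}> .")
        for ne_uri in links.get(rhet["id"], []):
            lines.append(f"{subject} <{CONTAINS_NE}> <{ne_uri}> .")
    return lines


# Illustrative annotations (offsets are made up for this example).
sample_res = [{"id": "re1", "type": "Contribution", "start": 0, "end": 80}]
sample_nes = [
    {"uri": "http://dbpedia.org/resource/Semantic_Web", "start": 30, "end": 42},
    {"uri": "http://dbpedia.org/resource/Ontology", "start": 95, "end": 103},
]

for triple in to_ntriples(sample_res, attach_entities(sample_res, sample_nes)):
    print(triple)
```

Only the first NE falls inside the RE span, so the RE is exported with its type and a single NE link. Keeping the mapping in data rather than in code reflects the abstract's point that the export is rule-driven. A different LOD vocabulary can be targeted by editing the rules rather than the exporter.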
We created a gold standard corpus from computer science conference proceedings and journal articles, in which Claim and Contribution sentences are manually annotated with their respective types using LOD URIs. The performance of the RE detection phase is evaluated against this corpus, where it achieves an average F-measure of 0.73. We further demonstrate a number of semantic queries that show how the generated knowledge base can support numerous use cases in managing scientific literature.

Availability. All software presented in this paper is available under open source licenses at http://www.semanticsoftware.info/semantic-scientific-literature-peerj-2015-supplements. Development releases of individual components are also available on our GitHub page at https://github.com/SemanticSoftwareLab.
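The F-measure reported above is assumed here to be the balanced F-measure (F1), the harmonic mean of precision P and recall R: F = 2PR / (P + R). The sketch below computes it from true positive, false positive, and false negative counts, then averages it over documents.

The abstract does not state how the average is taken, so the macro average used here is an assumption. The counts in the usage lines are invented for illustration and are not taken from the evaluation corpus.

```python
# Sketch: precision, recall and balanced F-measure (F1) from annotation
# counts, plus a macro average over documents (the averaging scheme is
# an assumption; the abstract does not specify it).


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average_f1(per_document_counts):
    """Average the per-document F1 scores; each item is a (tp, fp, fn) tuple."""
    scores = [precision_recall_f1(*counts)[2] for counts in per_document_counts]
    return sum(scores) / len(scores) if scores else 0.0


# Invented counts for two documents, for illustration only.
example_counts = [(9, 1, 3), (5, 5, 5)]
print(round(macro_average_f1(example_counts), 3))  # prints 0.659
```

A micro average would instead sum the counts over all documents before computing a single F1. The two schemes can differ noticeably when documents contain very different numbers of annotations.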

Funder

NSERC Discovery Grant

Publisher

PeerJ

Subject

General Computer Science

Cited by 17 articles.

1. Conceptual spaces and scientific data models;Innovative Data Integration and Conceptual Space Modeling for COVID, Cancer, and Cardiac Care;2022

2. Discovering Research Hypotheses in Social Science Using Knowledge Graph Embeddings;The Semantic Web;2021

3. Generate FAIR Literature Surveys with Scholarly Knowledge Graphs;Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020;2020-08

4. From Publications to Knowledge Graphs;Communications in Computer and Information Science;2020

5. What's in this Collection Dataset? Semantic Annotation with GATE;Biodiversity Information Science and Standards;2019-06-18