Data-driven classification of the certainty of scholarly assertions

Author:

Prieto Mario1, Deus Helena2, de Waard Anita3, Schultes Erik4, García-Jiménez Beatriz5, Wilkinson Mark D.1

Affiliation:

1. Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM)- Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Pozuelo de Alarcon, Madrid, Spain

2. Elsevier Inc., Cambridge, MA, United States of America

3. Elsevier Research Collaborations Unit, Jericho, VT, United States of America

4. GO FAIR International Support and Coordination Office, Leiden, The Netherlands

5. Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM)- Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Pozuelo de Alarcon, Madrid, Spain

Abstract

The grammatical structures scholars use to express their assertions are intended to convey various degrees of certainty or speculation. Prior studies have suggested a variety of categorization systems for scholarly certainty; however, these have not been objectively tested for their validity, particularly with respect to how well they represent the reader's interpretation rather than the author's intention. In this study, we use a series of questionnaires to determine how researchers classify various scholarly assertions, using three distinct certainty classification systems. We find that there are three distinct categories of certainty along a spectrum from high to low. We show that these categories can be detected in an automated manner, using a machine learning model, with a cross-validation accuracy of 89.2% relative to an author-annotated corpus, and 82.2% accuracy against a publicly annotated corpus. This finding provides an opportunity for contextual metadata related to certainty to be captured as a part of text-mining pipelines, which currently miss these subtle linguistic cues. We provide an exemplar machine-accessible representation, a Nanopublication, in which the certainty category is embedded as metadata in a formal, ontology-based manner within text-mined scholarly assertions.

Funder

Isaac Peral/Marie Curie cofund with the Universidad Politécnica de Madrid

Spanish Ministerio de Economía y Competitividad

Severo Ochoa Program for Centres of Excellence in R&D

Agencia Estatal de Investigación of Spain

Consejo Social de la Universidad Politécnica de Madrid

Publisher

PeerJ

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience

Cited by 2 articles.

1. 2nd Workshop on Digital Infrastructures for Scholarly Content Objects (DISCO'22);Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries;2022-06-20

2. Digital Infrastructures for Scholarly Content Objects;2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL);2021-09
