Exploring named‐entity recognition techniques for academic books

Author:

Calleja Ibañez Pablo (1), Giménez‐Toledo Elea (2)

Affiliation:

1. Artificial Intelligence Department, Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain

2. Interdisciplinary Thematic Platform ES CIENCIA, Institute of Language, Literature & Anthropology (ILLA), Spanish National Research Council (CSIC), Madrid, Spain

Abstract

Recent advances in natural language processing (NLP) have achieved impressive results across a range of tasks. However, NLP techniques are underrepresented in the analysis of Humanities and Social Science texts and in languages other than English. In particular, academic books are a highly valuable source of information that these techniques have left unexploited. Recognizing named entities (person names, organizations, or locations) in books and annotating them semantically could make that information more visible and discoverable to users. This is an opportunity for academia and the academic publishing industry, where semantic search is a central task: books can then be queried by the named entities of interest that appear in their content. This work proposes a methodology for applying named‐entity recognition to academic books and publishing the results in an ontological semantic‐web format. The work was carried out on a corpus of academic books provided by UNE (Unión de Editoriales Universitarias Españolas, Union of Spanish University Presses). The results show that the information extracted from the books is enriched and that the books can be queried both individually and across the whole set, increasing the chances that books are discovered or retrieved beyond their metadata.
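
The two steps named in the abstract (recognize entities in a book, then publish them in a semantic‐web format) can be illustrated with a minimal sketch. It is not the authors' implementation: a small hand‐written gazetteer stands in for a trained named‐entity recognition model, and the `http://example.org/` namespace, the `mentions` property, and the class names `Person`, `Organization`, and `Location` are hypothetical placeholders rather than the ontology used in the paper. The output is a list of N‐Triples lines, one common RDF serialization.

```python
import re
from urllib.parse import quote

# Hypothetical gazetteer: a stand-in for a trained NER model (assumption).
GAZETTEER = {
    "Miguel de Cervantes": "Person",
    "Universidad Politécnica de Madrid": "Organization",
    "Madrid": "Location",
}

BASE = "http://example.org/"  # hypothetical namespace, not from the paper
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def recognize_entities(text):
    """Return (name, label, start offset) tuples in reading order.

    Longer names are matched first so that "Madrid" inside
    "Universidad Politécnica de Madrid" is not reported separately.
    """
    taken = [False] * len(text)
    found = []
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            start, end = match.span()
            if not any(taken[start:end]):
                found.append((name, GAZETTEER[name], start))
                taken[start:end] = [True] * (end - start)
    return sorted(found, key=lambda entity: entity[2])


def to_ntriples(book_id, entities):
    """Serialize the entities mentioned in one book as N-Triples lines."""
    book = f"<{BASE}book/{quote(book_id)}>"
    lines = []
    for name, label, _ in entities:
        entity = f"<{BASE}entity/{quote(name)}>"
        # Escape backslashes and double quotes inside the literal.
        literal = name.replace("\\", "\\\\").replace('"', '\\"')
        for triple in (
            f"{book} <{BASE}mentions> {entity} .",
            f"{entity} <{RDF_TYPE}> <{BASE}{label}> .",
            f'{entity} <{RDFS_LABEL}> "{literal}" .',
        ):
            if triple not in lines:  # skip duplicates from repeated mentions
                lines.append(triple)
    return lines


if __name__ == "__main__":
    sample = ("Miguel de Cervantes studied in Madrid, near the "
              "Universidad Politécnica de Madrid.")
    for line in to_ntriples("b1", recognize_entities(sample)):
        print(line)
```

Once triples of this kind are loaded into a triple store, a query for every book that mentions a given entity amounts to matching on the hypothetical `mentions` property, which is the sort of entity‐based retrieval beyond metadata that the abstract describes.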

Publisher

Wiley
