Leveraging Semantic Text Analysis to Improve the Performance of Transformer-Based Relation Extraction

Author:

Evans Marie-Therese Charlotte (1), Latifi Majid (2), Ahsan Mominul (2), Haider Julfikar (3)

Affiliation:

1. Solution Consultant, IDHL Group, Central House, Otley Road, Harrogate HG3 1UF, UK

2. Department of Computer Science, University of York, Deramore Lane, York YO10 5GH, UK

3. Department of Engineering, Manchester Metropolitan University, John Dalton Building, Chester Street, Manchester M1 5GD, UK

Abstract

Keyword extraction from Knowledge Bases underpins the definition of relevancy in Digital Library search systems. However, it is the task of Joint Relation Extraction that populates the Knowledge Bases from which results are retrieved. Recent work focuses on fine-tuned Pre-trained Transformers. Yet F1 scores for scientific literature reach just 53.2, versus 69 in the general domain. The research demonstrates the failure of existing work to evidence the rationale for optimisations to fine-tuned classifiers. In contrast, emerging research subjectively adopts the common belief that Natural Language Processing techniques fail to derive context and shared knowledge. In fact, global context and shared knowledge account for just 10.4% and 11.2% of total relation misclassifications, respectively. In this work, the novel employment of semantic text analysis presents objective challenges for the Transformer-based classification of Joint Relation Extraction. This is the first known work to quantify that pipelined error propagation accounts for 45.3% of total relation misclassifications, the most pressing challenge in this domain. More specifically, Part-of-Speech tagging highlights the misclassification of complex noun phrases, accounting for 25.47% of relation misclassifications. Furthermore, this study identifies two limitations in the purported bidirectionality of the Bidirectional Encoder Representations from Transformers (BERT) Pre-trained Language Model. Firstly, there is a notable imbalance in the misclassification of right-to-left relations, which occurs at a rate double that of left-to-right relations. Additionally, a failure to recognise local context through determiners and prepositions contributes to 16.04% of misclassifications. Finally, the annotation scheme of the only dataset used in existing research, Scientific Entities, Relations and Coreferences (SciERC), is shown to be marred by ambiguity. Notably, two asymmetric relations within this dataset achieve recall rates of only 10% and 29%.

Publisher

MDPI AG

