Leveraging Semantic Text Analysis to Improve the Performance of Transformer-Based Relation Extraction
Published: 2024-02-06
Volume: 15
Issue: 2
Page: 91
ISSN: 2078-2489
Container-title: Information
Short-container-title: Information
Language: en
Authors:
Marie-Therese Charlotte Evans 1, Majid Latifi 2, Mominul Ahsan 2, Julfikar Haider 3
Affiliations:
1. Solution Consultant, IDHL Group, Central House, Otley Road, Harrogate HG3 1UF, UK
2. Department of Computer Science, University of York, Deramore Lane, York YO10 5GH, UK
3. Department of Engineering, Manchester Metropolitan University, John Dalton Building, Chester Street, Manchester M1 5GD, UK
Abstract
Keyword extraction from Knowledge Bases underpins the definition of relevancy in Digital Library search systems; however, it is the related task of Joint Relation Extraction that populates the Knowledge Bases from which results are retrieved. Recent work focuses on fine-tuned, pre-trained Transformers, yet F1 scores for scientific literature reach just 53.2, versus 69 in the general domain. This research demonstrates that existing work fails to evidence the rationale for optimisations to fine-tuned classifiers. Instead, emerging research uncritically adopts the common belief that Natural Language Processing techniques fail to derive context and shared knowledge; in fact, global context and shared knowledge account for just 10.4% and 11.2% of total relation misclassifications, respectively. In this work, the novel application of semantic text analysis surfaces objective challenges for the Transformer-based classification of Joint Relation Extraction. This is the first known work to quantify that pipelined error propagation accounts for 45.3% of total relation misclassifications, the most significant challenge in this domain. More specifically, Part-of-Speech tagging reveals the misclassification of complex noun phrases, which accounts for 25.47% of relation misclassifications. Furthermore, this study identifies two limitations in the purported bidirectionality of the Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model. First, there is a notable imbalance in the misclassification of right-to-left relations, which occur at double the rate of left-to-right relations. Second, a failure to recognise local context through determiners and prepositions contributes to 16.04% of misclassifications. Finally, the annotation scheme of the single dataset used in existing research, Scientific Entities, Relations and Coreferences (SciERC), is shown to be marred by ambiguity: two asymmetric relations within this dataset achieve recall rates of only 10% and 29%.
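As a concrete illustration of the Part-of-Speech analysis the abstract describes, the following minimal Python sketch flags multi-token ("complex") noun phrases together with their POS tags. This is an illustrative assumption, not the authors' actual pipeline: spaCy, the en_core_web_sm model, the complex_noun_phrases helper, and the min_tokens threshold are all choices made here purely for demonstration.

```python
# Minimal sketch (illustrative assumption, not the paper's pipeline):
# use spaCy POS tags and noun chunks to flag complex noun phrases,
# the construction the abstract links to 25.47% of relation errors.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def complex_noun_phrases(text: str, min_tokens: int = 3):
    """Return noun chunks with at least `min_tokens` tokens, plus their POS tags."""
    doc = nlp(text)
    return [
        (chunk.text, [tok.pos_ for tok in chunk])
        for chunk in doc.noun_chunks
        if len(chunk) >= min_tokens  # longer chunks carry modifiers/compounds
    ]

if __name__ == "__main__":
    sentence = ("We apply a graph-based neural model to joint entity and "
                "relation extraction from scientific abstracts.")
    for phrase, tags in complex_noun_phrases(sentence):
        print(f"{phrase!r} -> {tags}")
```

Phrases flagged this way could then be cross-referenced against a classifier's misclassified entity spans to estimate how much of the error budget complex noun phrases consume.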
Cited by: 1 article.