PNER: Applying the Pipeline Method to Resolve Nested Issues in Named Entity Recognition

Published: 2024-02-20 Issue: 5 Volume: 14 Page: 1717
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Yang Hongjian¹, Zhang Qinghao¹, Kwon Hyuk-Chul¹

Affiliation:

1. Center for Artificial Intelligence Research, Pusan National University, Busan 46241, Republic of Korea

Abstract

Named entity recognition (NER) in natural language processing covers three primary types: flat, nested, and discontinuous. While flat NER receives most of the attention from researchers, nested NER remains a significant challenge. Current approaches to nested NER include sequence labeling with merged label layers, cascaded models, and methods based on reading comprehension. Among these, sequence labeling with merged label layers stands out for its simplicity and ease of implementation, yet known issues persist with this method, and we aim to improve its effectiveness. In this study, we augment the sequence labeling approach with a pipeline model split into a sequence labeling task and a text classification task. Instead of annotating specific entity categories, we merged the types into main and sub-categories for unified treatment. These categories were then embedded as identifiers in the text to be recognized for the text classification task. We used BERT+BiLSTM+CRF for sequence labeling and the BERT model for text classification. Experiments were conducted on three nested NER datasets: GENIA, CMeEE, and GermEval 2014, whose annotations range from four to two levels. Before model training, we conducted separate statistical analyses of nested entities in the medical dataset CMeEE and the everyday-life dataset GermEval 2014. The analyses revealed that a particular entity category consistently dominates within nested entities in both datasets. This observation suggests that labeling primary and subsidiary entities may be useful for effective category recognition. Model performance was evaluated using F1 scores, with a prediction counted as correct only when both the complete entity name and its category were identified. The results showed a substantial performance improvement over the original method after our proposed modifications. Additionally, our improved model was strongly competitive with existing models. F1 scores on the GENIA, CMeEE, and GermEval 2014 datasets reached 79.21, 66.71, and 87.81, respectively. Our research shows that, while preserving the original method's simplicity and ease of implementation, our enhanced model achieves higher performance and is competitive with other methods.
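The evaluation criterion in the abstract (an entity counts as correct only when both its full span and its category match) can be illustrated with a minimal sketch. This is not the authors' evaluation script: the `strict_f1` function, the `(start, end, category)` tuple layout, and the example labels are assumptions made for illustration only.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (start_token, end_token, category).
Entity = Tuple[int, int, str]


def strict_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-category matching."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)

    # A prediction is a true positive only if span AND category both match.
    true_pos = len(gold_set & pred_set)

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data: (1, 2) is nested inside (0, 2); the labels are made up.
gold = [(0, 2, "PER"), (1, 2, "LOC"), (5, 6, "ORG")]
pred = [(0, 2, "PER"), (1, 2, "ORG"), (5, 6, "ORG")]  # nested span found, wrong category

precision, recall, f1 = strict_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this example the nested span `(1, 2)` is located but assigned the wrong category, so it contributes no credit. That is the behavior the strict criterion implies.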

Funder

Institute of Information & communications Technology Planning & Evaluation

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/5/1717/pdf
