Chinese Named Entity Recognition Based on Boundary Enhancement with Multi-Class Information
Published: 2023-12-02
Issue: 23
Volume: 13
Page: 12925
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Li Shuiyan (1), Qi Rongzhi (2, 3), Zhang Shengnan (2)
Affiliation:
1. School of Mathematics, Hohai University, Nanjing 211100, China
2. School of Computer and Information, Hohai University, Nanjing 211100, China
3. Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China
Abstract
Compared with English named entity recognition (NER), Chinese NER is more challenging because Chinese word formation is flexible and non-standard and word boundaries are not explicitly marked. This causes considerable boundary ambiguity and lowers the accuracy of entity identification. To address this issue, we propose a model based on boundary enhancement with multi-class information (BEMCI). The model integrates multiple types of information into the text embedding and then enhances the syntactic-structure information used in subsequent stages. A syntactic information analysis module analyzes sentence structure and highlights important syntactic information from three sources: part-of-speech tags, syntactic constituents, and dependency relations. In addition, we propose an improved contextual attention mechanism that combines contextual and syntactic information through a gate that controls their fusion weights, further improving the model's boundary determination. Multiple sets of experiments on six general datasets show that BEMCI outperforms the baselines overall, achieving the best results on four of the six datasets.
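The abstract does not state how the gate in the contextual attention mechanism is computed. The sketch below assumes a common sigmoid-gate formulation: for each token, a gate vector g = σ(W_g[h_c; h_s] + b_g) is computed from the concatenated contextual vector h_c and syntactic vector h_s, and the fused representation is g ⊙ h_c + (1 − g) ⊙ h_s. This is an illustrative assumption, not necessarily the exact fusion used in BEMCI. The function names (`gated_fusion`, `sigmoid`, `matvec`) and parameters (`w_gate`, `b_gate`) are hypothetical, and plain Python lists stand in for a tensor library so the example is self-contained.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector v (length cols)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def gated_fusion(h_ctx: Vector, h_syn: Vector, w_gate: Matrix, b_gate: Vector) -> Vector:
    """Hypothetical gate fusion of one token's contextual and syntactic vectors.

    Assumed formulation (not taken from the paper):
        g = sigmoid(W_g [h_ctx; h_syn] + b_g)   # element-wise gate in (0, 1)
        h = g * h_ctx + (1 - g) * h_syn
    w_gate has d rows and 2d columns; b_gate has length d.
    """
    if len(h_ctx) != len(h_syn):
        raise ValueError("contextual and syntactic vectors must have the same size")
    concat = h_ctx + h_syn  # list concatenation gives [h_ctx; h_syn]
    pre = matvec(w_gate, concat)
    gate = [sigmoid(p + b) for p, b in zip(pre, b_gate)]
    # Each dimension interpolates between contextual and syntactic information.
    return [g * c + (1.0 - g) * s for g, c, s in zip(gate, h_ctx, h_syn)]
```

With all gate weights and biases at zero, every gate value is 0.5 and the output is the element-wise average of the two inputs; for example, `gated_fusion([1.0, 2.0], [3.0, 0.0], [[0.0] * 4 for _ in range(2)], [0.0, 0.0])` returns `[2.0, 1.0]`. Large positive or negative gate inputs push a dimension toward 1 or 0, so that dimension relies almost entirely on contextual or on syntactic information, which is the kind of learned weighting the gate is meant to provide.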
Funder
Key Research and Development Program of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
References: 34 articles