Research on Chinese Named Entity Recognition Based on Lexical Information and Spatial Features

Author:

Zhang Zhipeng 1, Liu Shengquan 1, Jian Zhaorui 1, Yin Huixin 1

Affiliation:

1. School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China

Abstract

In Chinese named entity recognition (NER), combining lexical features with character-based methods has attracted renewed interest. Although this lexicon enhancement approach offers a new perspective, it faces two main challenges. First, character-by-character matching easily produces conflicts during lexicon matching; existing solutions try to alleviate this by obtaining semantic information about words, but they still capture too little sequential or global information. Second, because dictionaries are limited, a sentence may contain words that match no dictionary entry, in which case existing lexicon enhancement methods are ineffective. To address these issues, this paper proposes a method based on lexical information and spatial features. First, the method considers the neighborhood and overlap relationships of characters within words and builds global bidirectional semantic and sequential information, reducing the impact that fusing conflicting words with characters has on entity segmentation. Second, a point-by-point convolutional network extracts an attention score matrix that captures the local spatial relationship between characters without fused lexical information and characters with fused lexical information, aiming to compensate for information loss and strengthen spatial connections. Compared with the baseline model, the SISF method proposed in this paper improves the F1 score by 0.72%, 3.12%, 1.07%, and 0.37% on the Resume, Weibo, OntoNotes 4.0, and MSRA datasets, respectively.
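The two challenges named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's SISF method: the helper names `match_lexicon` and `words_per_char`, the toy lexicons, and the example sentence are all assumptions chosen for illustration. The sketch shows how character-by-character matching attaches several competing words to one character, and how a limited dictionary leaves some characters with no lexical information at all.

```python
from collections import defaultdict


def match_lexicon(sentence, lexicon, max_len=4):
    """Return every (start, end, word) span of `sentence` found in `lexicon`.

    Hypothetical helper: brute-force matching of all substrings of
    length 2..max_len, one start character at a time.
    """
    spans = []
    for start in range(len(sentence)):
        last_end = min(start + max_len, len(sentence))
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def words_per_char(sentence, spans):
    """Map each character index to the list of matched words covering it."""
    covered = defaultdict(list)
    for start, end, word in spans:
        for i in range(start, end):
            covered[i].append(word)
    return {i: covered[i] for i in range(len(sentence))}


# Toy data (assumed, not taken from the paper).
sentence = "南京市长江大桥"  # "Nanjing Yangtze River Bridge"
full_lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
small_lexicon = {"南京", "大桥"}

# Challenge 1: the character 长 (index 3) is covered by competing words,
# including 市长 ("mayor"), which is wrong in this sentence.
conflicts = words_per_char(sentence, match_lexicon(sentence, full_lexicon))
print(conflicts[3])

# Challenge 2: with a limited dictionary, some characters match nothing.
sparse = words_per_char(sentence, match_lexicon(sentence, small_lexicon))
unmatched = [sentence[i] for i, words in sparse.items() if not words]
print(unmatched)
```

In this toy case a lexicon-enhanced model has to decide which of the words attached to 长 should inform its representation, and it receives no lexical signal for the unmatched characters; these are the two gaps the abstract says the proposed method targets.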

Funder

National Key R&D Program of China

Major Science and Technology Projects in Xinjiang Uygur Autonomous Region

National Natural Science Foundation of China

Publisher

MDPI AG
