Chinese Named Entity Recognition Method in History and Culture Field Based on BERT

Author:

Liu Shuang, Yang Hui, Li Jiayi, Kolmanič Simon

Abstract

With the rapid development of the Internet, the way people obtain information has changed dramatically. In recent years, knowledge graphs have become a popular tool for the public to acquire knowledge. For knowledge graphs of Chinese history and culture, most researchers have adopted traditional named entity recognition methods to extract entity information from unstructured historical text. However, traditional named entity recognition methods have certain defects and tend to ignore the associations between entities. To extract entities from large volumes of historical and cultural information more accurately and efficiently, this paper proposes a named entity recognition model that combines Bidirectional Encoder Representations from Transformers (BERT) with a Bidirectional Long Short-Term Memory network and a Conditional Random Field (BERT-BiLSTM-CRF). First, a BERT pre-trained language model encodes each character to obtain its vector representation. Then a Bidirectional Long Short-Term Memory (BiLSTM) layer semantically encodes the input text. Finally, a Conditional Random Field (CRF) layer outputs the label with the highest probability to obtain each character's category. The model uses the BERT pre-trained language model in place of static word vectors trained in the traditional way. In comparison, BERT dynamically generates semantic vectors according to the context of words, which improves the representational ability of the word vectors. The experimental results show that the proposed model achieves excellent results on named entity recognition in the field of history and culture. Compared with existing named entity recognition methods, precision, recall, and $F_1$ score are significantly improved.
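
The final step described above, where the CRF layer outputs the highest-probability label sequence, is commonly carried out with Viterbi decoding. The following is a minimal sketch of that step only, not the authors' implementation: the tag set, the emission scores (which in the described model would come from the BERT and BiLSTM layers), the transition scores, and the function name `viterbi_decode` are all hypothetical values chosen for illustration.

```python
# Illustrative sketch only: labels, scores, and names are hypothetical
# and are not taken from the paper.

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]   : score of label j for character t
    transitions[i][j] : score of moving from label i to label j
    """
    num_labels = len(transitions)
    # Best score of any path ending in each label at the first character.
    scores = list(emissions[0])
    backpointers = []
    for step in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_labels):
            # Choose the best previous label for current label j.
            best_prev = max(range(num_labels),
                            key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + step[j])
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(num_labels), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


LABELS = ["O", "B-PER", "I-PER"]  # hypothetical tag set
# Hypothetical transition scores: a large penalty forbids O -> I-PER.
TRANSITIONS = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, 0.0, 0.0],     # from B-PER
    [0.0, 0.0, 0.0],     # from I-PER
]
# Made-up per-character emission scores for a three-character input.
EMISSIONS = [
    [2.0, 1.5, 0.0],
    [0.5, 0.0, 2.0],
    [2.0, 0.0, 0.5],
]

decoded = [LABELS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
# Per-character argmax, which ignores transitions between labels.
greedy = [LABELS[max(range(len(LABELS)), key=lambda j: row[j])] for row in EMISSIONS]
print(decoded)
print(greedy)
```

With these made-up scores, picking the best label for each character independently gives O, I-PER, O, which starts an entity with an inside tag. Decoding over the whole sequence gives B-PER, I-PER, O, because the transition scores make the first sequence very costly. This illustrates the kind of label dependency a CRF layer can capture on top of per-character scores.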

Funder

Economic and social development research project of Liaoning province in 2021

Research Innovation Team Grant Project

Graduate Research and Innovation Projects of Jiangsu Province

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics, General Computer Science

Cited by 10 articles.

1. Perspective of Digital Humanities On Person Names in Chinese Pre-Qin Classic;Journal on Computing and Cultural Heritage;2024-04-17

2. Multi-feature word embedding based named entity recognition in classical Chinese texts;International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023);2024-03-27

3. A Quantum-Like Tensor Compression Sentence Representation Based on Constraint Functions for Semantics Analysis;International Journal of Computational Intelligence Systems;2024-01-03

4. Research on Named Entity Recognition in Traditional Chinese Medicine Herbal Texts;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05

5. Multi-Meta Information Embedding Enhanced BERT for Chinese Mechanics Entity Recognition;Applied Sciences;2023-10-15
