Named Entity Recognition in Government Audit Texts Based on ChineseBERT and Character-Word Fusion

Published: 2024-02-09 Issue: 4 Volume: 14 Page: 1425
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Huang Baohua¹, Lin Yunjie¹, Pang Si¹, Fu Long¹

Affiliation:

1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China

Abstract

Named entity recognition in government audit texts is a key task in intelligent auditing. To address the scarcity of corpora in the government auditing domain, the failure of traditional character vectors to exploit word-level information, and the inadequate capture of audit entity features, this study builds a dataset for the auditing domain and proposes CW-CBGC, a model for recognizing named entities in government audit texts based on ChineseBERT and character-word fusion. First, the ChineseBERT pre-trained model extracts character vectors that integrate glyph and pinyin features, and these are combined with word vectors dynamically constructed by the BERT pre-trained model. The resulting sequence of character-word fusion vectors is then fed into a bidirectional gated recurrent unit (BiGRU) network to learn textual features. Finally, a conditional random field (CRF) generates the globally optimal label sequence, and the GHM classification loss function is used during training to address the problem of error evaluation under noisy entities and an imbalanced number of entities. The model achieves an F1 score of 97.23% on the audit dataset, 3.64% higher than the baseline model; on the public Resume dataset it achieves an F1 score of 96.26%, 0.73–2.78% higher than mainstream models. The experimental results show that the proposed model effectively recognizes entities in government audit texts and has a degree of generalization ability.
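The abstract says the CRF layer produces the globally optimal label sequence. A standard way to find such a sequence in a linear-chain CRF is Viterbi decoding, sketched below in plain Python. This is an illustrative sketch, not the authors' implementation: the tag set, the emission and transition scores, and the function name `viterbi_decode` are all hypothetical, and in the described model the emission scores would presumably come from the BiGRU output rather than being written by hand.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag path under a linear-chain score.

    emissions[t][tag] is the per-position score for `tag` at position t.
    transitions[(prev, cur)] is the score for moving from `prev` to `cur`;
    pairs that are missing from the dict score 0.0.
    """
    if not emissions:
        return []

    # best[t][tag]: best total score of any path that ends in `tag` at t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag]: best predecessor of tag

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical three-character example with BIO tags for one entity type.
tags = ["O", "B-ORG", "I-ORG"]
transitions = {("O", "I-ORG"): -1e4}  # strongly discourage O -> I-ORG
emissions = [
    {"O": 1.0, "B-ORG": 0.8, "I-ORG": 0.1},
    {"O": 0.2, "B-ORG": 0.3, "I-ORG": 1.5},
    {"O": 2.0, "B-ORG": 0.0, "I-ORG": 0.1},
]
print(viterbi_decode(emissions, transitions, tags))
```

In this toy input, picking the best tag at each position independently would give `O, I-ORG, O`, which is not a valid BIO sequence. Decoding over whole paths with the transition penalty selects `B-ORG, I-ORG, O` instead, which is the kind of sequence-level consistency a CRF output layer is meant to provide.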

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/4/1425/pdf

Cited by 1 article.

1. A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning;Applied Sciences;2024-08-08