Affiliation:
1. Department of Computer Science, Lüliang University, Lüliang 033000, China
2. Center for Information and Modern Education Technology, Lüliang University, Lüliang 033000, China
Abstract
Named entity recognition (NER) in the field of public interest litigation can assist prosecutors in handling cases and supply the specific entities they need when drafting legal documents. Previous approaches use context-free deep learning models for semantic understanding, in which static word vectors are obtained without considering context. Moreover, such methods rely on word segmentation and cannot prevent the error propagation caused by inaccurate segmentation, which poses a major challenge for Chinese NER. To address these issues, an entity recognition method based on pretraining is proposed. First, building on the basic entities, three legal ontologies, NERP, NERCGP, and NERFPP, are developed to expand the named entity recognition corpus in the judicial field. Second, BERT-WWM-EXT (whole word masking, extra), a variant of the pretrained model BERT (Bidirectional Encoder Representations from Transformers), is introduced to capture hierarchical character-level vector features and bidirectional contextual features of the text, which effectively addresses the problem of delimiting named entity boundaries. Then, to further improve recognition, the general knowledge learned by the pretrained model is passed to a downstream BiLSTM (bidirectional long short-term memory) network, and a CRF (conditional random field) layer is added at the end of the architecture to constrain the relationships between labels. Finally, experimental results show that the proposed method is more effective than existing methods, reaching F1 scores of 96% and 90% on NER and NERP entities, respectively.
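The role of the final CRF layer, constraining which labels may follow one another, can be illustrated with a minimal sketch. This is not the authors' implementation: the tag set (`PER`, `ORG`), the per-character scores, and the function names are hypothetical, and the learned CRF transition scores are simplified here to hard BIO constraints (a transition is either allowed or forbidden). The per-character scores stand in for what a BiLSTM over BERT features might emit.

```python
import math

# Hypothetical BIO tag set; the paper's actual entity types are not listed in the abstract.
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]


def allowed(prev, curr):
    """BIO constraint: I-X may only follow B-X or I-X of the same type."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def greedy(emissions):
    """Pick the best label per character independently (no label constraints)."""
    return [max(LABELS, key=lambda l: em[l]) for em in emissions]


def viterbi(emissions):
    """Return the highest-scoring label sequence that respects the BIO constraints."""
    neg_inf = -math.inf
    # A sequence cannot start with an I- tag.
    score = {l: (neg_inf if l.startswith("I-") else emissions[0][l]) for l in LABELS}
    back = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for curr in LABELS:
            best_prev, best = None, neg_inf
            for prev in LABELS:
                if allowed(prev, curr) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[curr] = best + em[curr] if best_prev is not None else neg_inf
            ptr[curr] = best_prev
        score = new_score
        back.append(ptr)
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda l: score[l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up scores for a three-character span (illustrative only).
emissions = [
    {"O": 2.0, "B-PER": 1.5, "I-PER": 0.0, "B-ORG": 0.0, "I-ORG": 0.0},
    {"O": 0.5, "B-PER": 0.2, "I-PER": 2.0, "B-ORG": 0.0, "I-ORG": 0.0},
    {"O": 0.1, "B-PER": 0.0, "I-PER": 2.0, "B-ORG": 0.0, "I-ORG": 0.0},
]

print(greedy(emissions))   # per-character argmax, may violate BIO
print(viterbi(emissions))  # constrained decoding
```

With these scores, independent per-character decoding yields an `I-PER` tag directly after `O`, which is not a valid BIO sequence, whereas the constrained decoder starts the entity with `B-PER`. A trained CRF would replace the hard allow/forbid rule with learned transition scores.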
Funder
Doctoral Natural Science Foundation Project of Lüliang College
Subject
Computer Science Applications, Software
Cited by
1 article.
1. LeArNER;Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law;2023-06-19