Improving Low-Resource Chinese Named Entity Recognition Using Bidirectional Encoder Representation from Transformers and Lexicon Adapter

Author:

Dang Xiaochao (1), Wang Li (2), Dong Xiaohui (2), Li Fenfang (2), Deng Han (2)

Affiliation:

1. College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China

2. Gansu Province Internet of Things Engineering Research Centre, Northwest Normal University, Lanzhou 730070, China

Abstract

Because lexicon information and pre-trained models such as BERT each offer distinct advantages, combining them has been widely adopted in Chinese sequence labeling tasks. However, these models demand large amounts of training data, so efforts have been made to improve their performance in low-resource scenarios. Certain specialized domains, such as agriculture, the industrial sector, and the metallurgical industry, currently suffer from a scarcity of data. Consequently, effective models for entity relationship recognition are lacking when data are limited. Motivated by this, we construct a small, balanced dataset and propose a domain-based NER model. First, we build a domain-specific dictionary from mine hoist equipment and fault texts and generate a dictionary tree to obtain word vector information. Second, we use a Lexicon Adapter to obtain the vectors of the domain-specific dictionary words matched to each character, calculate the weights among those word vectors, and integrate position encoding to enhance their positional information. Finally, we incorporate the word vector information into the feature extraction layer to strengthen the boundary information of domain entities and to mitigate the semantic loss caused by using only character-level feature representations. Experimental results on a manually annotated dataset of mine hoist fault texts show that this method outperforms BiLSTM, BiLSTM-CRF, BERT, BERT-BiLSTM-CRF, and LEBERT, effectively improving the accuracy of named entity recognition (NER) for mine hoist faults.
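The dictionary-tree matching step described in the abstract can be illustrated with a minimal sketch. It assumes a plain character trie and a toy four-word lexicon (提升机 "hoist", 提升 "lift", 制动器 "brake", 制动 "braking"); the class and function names are hypothetical, and the paper's actual dictionary, word vectors, and Lexicon Adapter weighting are not reproduced here.

```python
class TrieNode:
    """One node of the dictionary tree; children are keyed by character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class LexiconTrie:
    """Dictionary tree built from the words of a domain lexicon."""

    def __init__(self, words):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def words_starting_at(self, text, start):
        """Return every lexicon word that begins at text[start]."""
        node = self.root
        found = []
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break
            if node.is_word:
                found.append(text[start:end + 1])
        return found


def match_words_per_character(text, trie):
    """Map each character position to the lexicon words that cover it."""
    matches = [[] for _ in text]
    for start in range(len(text)):
        for word in trie.words_starting_at(text, start):
            for pos in range(start, start + len(word)):
                matches[pos].append(word)
    return matches


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = ["提升机", "提升", "制动器", "制动"]
trie = LexiconTrie(toy_lexicon)
sentence = "提升机制动器"
char_words = match_words_per_character(sentence, trie)

for ch, words in zip(sentence, char_words):
    print(ch, words)
```

In a full model, each character's matched words would then be looked up as word vectors and fused with the character representation; that fusion step is outside the scope of this sketch.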

Funder

National Natural Science Foundation of China

Industrial Support Foundations of Gansu

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Reference: 42 articles.


Cited by 1 article.
