Grammar Correction for Multiple Errors in Chinese Based on Prompt Templates
Published: 2023-07-31
Issue: 15
Volume: 13
Page: 8858
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Wang Zhici 1,2, Yu Qiancheng 1,2, Wang Jinyun 3, Hu Zhiyong 1,2, Wang Aoqiang 1,2
Affiliation:
1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
3. School of Business, North Minzu University, Yinchuan 750021, China
Abstract
Grammatical error correction (GEC) is a crucial task in Natural Language Processing (NLP). Its objective is to automatically detect and correct grammatical mistakes in sentences, and it has considerable value for both applications and research. Mainstream grammar-correction methods currently rely on sequence labeling and text generation, both of which are end-to-end approaches. These methods perform well on text with low error density but often give unsatisfactory results in high-error-density situations, where a single sentence contains multiple errors. In such cases they tend to overcorrect words that are already correct, producing a high false-positive rate. To address this issue, we studied the specific characteristics of the Chinese grammatical error correction (CGEC) task in high-error-density situations and propose a grammar-correction method based on prompt templates. First, we propose a strategy for constructing prompt templates suitable for CGEC, which transforms the CGEC task into a masked fill-in-the-blank task compatible with the masked language model BERT. Second, we propose a method for dynamically updating templates, which incorporates errors that have already been corrected back into the template to improve its quality. In addition, we use the phonetic and graphical similarity knowledge in a confusion set as guiding information. Combining this knowledge with BERT's predictions lets the model select the correct characters more accurately, significantly improving the accuracy of its corrections. We validated our method through experiments on a public grammar-correction dataset. The results indicate that it achieves higher correction performance and a lower false-correction rate in high-error-density scenarios.
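The three ideas in the abstract (masking a suspected position, refreshing the template after each correction, and letting a confusion set guide the choice among predicted characters) can be illustrated with a minimal sketch. This is not the authors' implementation. The template here is simply the sentence with one position replaced by `[MASK]`, the confusion set is a tiny hand-written dictionary, the `boost` weight is an arbitrary value, and `toy_scorer` is a hypothetical lookup table standing in for BERT's masked-token probabilities. The error positions are assumed to be given by a separate detection step.

```python
from typing import Callable, Dict, List, Set

MASK = "[MASK]"

# Hypothetical confusion set: characters that sound or look alike.
CONFUSION_SET: Dict[str, Set[str]] = {
    "以": {"已"},
    "已": {"以"},
    "再": {"在"},
    "在": {"再"},
}

# A scorer maps (template tokens, masked index) to candidate probabilities.
Scorer = Callable[[List[str], int], Dict[str, float]]


def select_char(original: str, probs: Dict[str, float], boost: float = 1.5) -> str:
    """Pick a character, favoring candidates from the original's confusion set."""
    similar = CONFUSION_SET.get(original, set())
    best, best_score = original, probs.get(original, 0.0)
    for cand, p in probs.items():
        # Candidates that resemble the original character get a higher weight.
        score = p * boost if cand in similar else p
        if score > best_score:
            best, best_score = cand, score
    return best


def correct(sentence: str, error_positions: List[int], scorer: Scorer) -> str:
    """Correct flagged positions one by one, reusing earlier corrections."""
    chars = list(sentence)
    for pos in error_positions:
        # Dynamic update: the template is built from the current state of
        # `chars`, so it already contains every correction made so far.
        template = chars[:pos] + [MASK] + chars[pos + 1:]
        probs = scorer(template, pos)
        chars[pos] = select_char(chars[pos], probs)
    return "".join(chars)


# Stand-in for a masked language model (assumption: made-up probabilities).
_TOY_TABLE: Dict[str, Dict[str, float]] = {
    "我[MASK]经再家了": {"曾": 0.45, "已": 0.40, "以": 0.05},
    "我已经[MASK]家了": {"在": 0.70, "回": 0.20, "再": 0.05},
}


def toy_scorer(template: List[str], pos: int) -> Dict[str, float]:
    """Return canned probabilities for known templates, else no candidates."""
    return _TOY_TABLE.get("".join(template), {})
```

Calling `correct("我以经再家了", [1, 3], toy_scorer)` processes the two flagged positions in order. At position 1 the raw probabilities favor "曾", but the confusion-set weight moves the choice to "已"; the second template then already contains "已", which is the effect of the dynamic update. In a real system the scorer would query a masked language model instead of a table.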
Funder
The 2022 University Research Platform “Digital Agriculture Empowering Ningxia Rural Revitalization Innovation Team” of North Minzu University
The major key project of school-enterprise joint innovation in Yinchuan 2022
2022 Ningxia Autonomous Region Key Research and Development Plan (Talent Introduction Special) Project
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science