Abstract
Chinese Spelling Error Correction is an active research topic in natural language processing. Researchers have proposed many effective solutions, from early rule-based approaches to current deep learning methods. At present, SpellGCN, proposed by Alibaba's team, achieves the best results, with a character-level precision of 98.4% on SIGHAN2013. However, when we apply this algorithm to practical error correction tasks, it produces many false corrections, that is, it changes characters that were already correct. We believe this is because the corpus used to train the model contains significantly more errors than the text the model is asked to correct in practice. To address this problem, we propose adding a post-processing step to the error correction task. We treat each character output by the initial model as a candidate, extract various features of the character itself and its context, and then use a classification model to filter out the initial model's false corrections. The post-processing idea introduced in this paper can be applied to most Chinese Spelling Error Correction models to improve their performance on practical error correction tasks.
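The post-processing idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Correction` record, the two features, the confusion-set lookup, and the threshold-based `toy_classifier` are all hypothetical stand-ins, since the abstract does not specify which features or which classification model are used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Features = Dict[str, float]


@dataclass
class Correction:
    """One character-level change proposed by the initial model (hypothetical format)."""
    position: int      # index of the character the initial model wants to change
    candidate: str     # replacement character proposed by the initial model
    confidence: float  # initial model's probability for the candidate (assumed available)


def extract_features(text: str, c: Correction,
                     confusion: Dict[str, Set[str]]) -> Features:
    """Build illustrative features for one candidate; a real system would use many more."""
    original = text[c.position]
    return {
        "confidence": c.confidence,
        # 1.0 if the candidate is a known look-alike or sound-alike of the original
        "in_confusion_set": 1.0 if c.candidate in confusion.get(original, set()) else 0.0,
    }


def toy_classifier(f: Features) -> bool:
    """Hypothetical stand-in for a trained classifier: accept only confident, plausible edits."""
    return f["confidence"] >= 0.9 and f["in_confusion_set"] == 1.0


def filter_corrections(text: str, corrections: List[Correction],
                       accept: Callable[[Features], bool],
                       confusion: Dict[str, Set[str]]) -> str:
    """Apply only the corrections that the classifier accepts; reject the rest."""
    chars = list(text)
    for c in corrections:
        if accept(extract_features(text, c, confusion)):
            chars[c.position] = c.candidate
    return "".join(chars)


if __name__ == "__main__":
    text = "我今天很高心"
    confusion = {"心": {"兴"}}  # toy confusion set, assumed for illustration
    proposed = [
        Correction(position=5, candidate="兴", confidence=0.97),  # plausible fix
        Correction(position=1, candidate="金", confidence=0.55),  # likely false correction
    ]
    print(filter_corrections(text, proposed, toy_classifier, confusion))
```

In this sketch the classifier sees only the candidate's features, so the same filter can sit behind any initial model that emits positions and candidate characters, which matches the abstract's claim that the idea applies to most correction models.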
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Reference
38 articles.
1. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding;Devlin;arXiv,2018
2. Spelling Error Correction with Soft-Masked BERT;Zhang;arXiv,2020
3. SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check;Cheng;arXiv,2020
Cited by
6 articles.