MLSL-Spell: Chinese Spelling Check Based on Multi-Label Annotation

Published: 2024-03-18, Volume: 14, Issue: 6, Page: 2541
ISSN: 2076-3417
Journal: Applied Sciences
Language: en

Author:

Jiang Liming¹, Shen Xingfa¹, Zhao Qingbiao¹, Yao Jian¹

Affiliation:

1. School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract

Chinese spelling errors are commonplace in daily life and may be caused by input methods, optical character recognition, or speech recognition. Because Chinese characters are often phonetically and visually similar, Chinese spelling check (CSC) is a very challenging task. Existing CSC solutions do not achieve good spelling check performance because they often fail to fully extract contextual information and Pinyin information. In this paper, we propose a novel CSC framework based on multi-label annotation (MLSL-Spell), consisting of two basic phases: spelling detection and spelling correction. In the spelling detection phase, MLSL-Spell fuses character-based pre-trained context vectors with Pinyin vectors and adopts a sequence labeling method to explicitly label the error type of each misspelled character. In the spelling correction phase, MLSL-Spell uses a Masked Language Model (MLM) to generate candidate characters, then screens the candidates according to the labeled error type, and finally selects the correct character with an XGBoost classifier. Experiments show that the MLSL-Spell model outperforms the benchmark models. On the SIGHAN 2013 dataset, the spelling detection F1 score of MLSL-Spell is 18.3% higher than that of the pointer network (PN) model, and the spelling correction F1 score is 10.9% higher. On the SIGHAN 2015 dataset, the spelling detection F1 score of MLSL-Spell is 11% higher than that of BERT and 15.7% higher than that of the PN model, and the spelling correction F1 score of MLSL-Spell is 6.8% higher than that of the PN model.
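
The correction phase described above (generate candidates, screen them by the labeled error type, then pick one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the label names, the toy Pinyin and glyph-similarity tables, and the `toy_propose` and `toy_rank` functions are all hypothetical stand-ins, the latter two for the MLM candidate generator and the XGBoost classifier respectively.

```python
# Hypothetical label set: the abstract only states that error types are labeled explicitly.
CORRECT, PHONETIC, VISUAL = "O", "P", "V"

# Toy lookup tables standing in for real Pinyin and glyph-similarity resources.
TOY_PINYIN = {"在": "zai", "再": "zai", "载": "zai", "见": "jian", "贝": "bei"}
TOY_SIMILAR_SHAPE = {"贝": {"见"}, "见": {"贝"}}


def screen_candidates(original, label, candidates):
    """Keep only the candidates that are consistent with the labeled error type."""
    if label == PHONETIC:
        target = TOY_PINYIN.get(original)
        if target is None:
            return []
        # A phonetic error should be fixed by a different character with the same Pinyin.
        return [c for c in candidates if c != original and TOY_PINYIN.get(c) == target]
    if label == VISUAL:
        # A visual error should be fixed by a character with a similar shape.
        similar = TOY_SIMILAR_SHAPE.get(original, set())
        return [c for c in candidates if c in similar]
    return []


def correct(sentence, labels, propose, rank):
    """Replace each character labeled as an error with the top-ranked screened candidate."""
    out = list(sentence)
    for i, (ch, label) in enumerate(zip(sentence, labels)):
        if label == CORRECT:
            continue
        kept = screen_candidates(ch, label, propose(sentence, i))
        if kept:
            out[i] = rank(sentence, i, kept)
    return "".join(out)


def toy_propose(sentence, position):
    """Stand-in for an MLM: returns a fixed candidate list regardless of context."""
    return ["再", "才", "载"]


def toy_rank(sentence, position, candidates):
    """Stand-in for the final classifier: simply takes the first surviving candidate."""
    return candidates[0]


# The detection labels are supplied by hand here; in the paper they come from sequence labeling.
print(correct("在见", [PHONETIC, CORRECT], toy_propose, toy_rank))  # → 再见
```

Screening by error type narrows the candidate set before ranking: in the example, 才 is discarded because it does not share the Pinyin of the character labeled as a phonetic error.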

Funder

the “Pioneer” and “Leading Goose” R&D Program of Zhejiang

the National Natural Science Foundation of China

the Zhejiang Provincial Natural Science Foundation

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/6/2541/pdf

Cited by 1 article.

1. BERT-Inspired Progressive Stacking to Enhance Spelling Correction in Bengali Text;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-08-08