Affiliation:
1. Shanghai Jiao Tong University, Shanghai, China
2. Huawei Technologies Co. Ltd., Shanghai, China
3. Baosteel Research Institute, Shanghai, China
Abstract
Spelling check is more challenging for Chinese than for other languages. This article presents a hybrid model for Chinese spelling check. The hybrid model consists of three components: one graph-based model for generic errors and two independently trained models for specific errors. In the graph model, a directed acyclic graph is generated for each sentence, and the single-source shortest-path algorithm is run on the graph to detect and correct general spelling errors at the same time. Prior to that, two types of errors over function words (characters) are first handled by conditional random fields: the confusion of “在” (at; pinyin: zai) and “再” (again, more, then; pinyin: zai), and the confusion of “的” (of; pinyin: de), “地” (-ly, adverb-forming particle; pinyin: de), and “得” (so that, have to; pinyin: de). Finally, a rule-based model is used to resolve pronoun usage confusion between “她” (she; pinyin: ta) and “他” (he; pinyin: ta), along with some other common collocation errors. The proposed model is evaluated on the standard datasets released by the SIGHAN Bake-off shared tasks, giving state-of-the-art results.
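The graph component described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a character-level lattice, a hypothetical confusion set (`CONFUSION`), toy bigram counts (`BIGRAM_COUNTS`), and an assumed fixed cost for changing a character (`SUBSTITUTION_PENALTY`). Each sentence position becomes a layer of candidate characters, edges run only from one layer to the next (so the graph is acyclic), and the cheapest path from the sentence start gives detection and correction in one pass.

```python
import math

# Hypothetical confusion sets: each character maps to similar-sounding alternatives.
CONFUSION = {"在": ["再"], "再": ["在"]}

# Hypothetical toy bigram counts; a real system would estimate these from a corpus.
BIGRAM_COUNTS = {("现", "在"): 50, ("现", "再"): 1, ("再", "见"): 40, ("在", "见"): 1}

SUBSTITUTION_PENALTY = 1.0  # assumed extra cost for replacing an original character


def edge_cost(prev, cur):
    """Negative log of an add-one-smoothed toy bigram score (lower is better)."""
    return -math.log((BIGRAM_COUNTS.get((prev, cur), 0) + 1) / 100.0)


def correct(sentence):
    """Return the cheapest path through the candidate lattice (a DAG)."""
    if not sentence:
        return sentence
    # Layer i holds the original character plus its confusable alternatives.
    layers = [[ch] + CONFUSION.get(ch, []) for ch in sentence]
    # best[i][c] = (cheapest cost from the source to candidate c, predecessor)
    best = [{} for _ in sentence]
    for cand in layers[0]:
        penalty = 0.0 if cand == sentence[0] else SUBSTITUTION_PENALTY
        best[0][cand] = (edge_cost("<s>", cand) + penalty, None)
    # Left-to-right order is a topological order, so one relaxation pass suffices.
    for i in range(1, len(sentence)):
        for cand in layers[i]:
            penalty = 0.0 if cand == sentence[i] else SUBSTITUTION_PENALTY
            best[i][cand] = min(
                ((best[i - 1][p][0] + edge_cost(p, cand) + penalty, p)
                 for p in layers[i - 1]),
                key=lambda pair: pair[0],
            )
    # Backtrack from the cheapest candidate in the final layer.
    last = min(best[-1], key=lambda c: best[-1][c][0])
    path = [last]
    for i in range(len(sentence) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return "".join(reversed(path))
```

With these toy counts, `correct("现再")` selects “在” in the second position because the bigram gain outweighs the substitution penalty, while a sentence with no confusable characters is returned unchanged.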
Funder
Major Basic Research Program of Shanghai Science and Technology Committee
National Basic Research Program of China
National Natural Science Foundation of China
Cai Yuanpei Program
Art and Science Interdisciplinary Funds of Shanghai Jiao Tong University
Key Project of the National Society Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Cited by
15 articles.
1. Automatic Spelling Corrector for Yorùbá Language Using Edit Distance and N-Gram Language Models;2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG);2024-04-02
2. Multi-stage Legal Instrument Grammatical Error Correction via Seq2Edit and Data Augmentation;Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing;2023-12-27
3. How to Progressively Build Thai Spelling Correction Systems?;IEEE Access;2023
4. Research on Chinese Proofreading Technology with Feed-Back Mechanism;Computer Science and Application;2023
5. MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction;Proceedings of the 31st ACM International Conference on Information & Knowledge Management;2022-10-17