Affiliation:
1. School of Computer Science and Technology, Guangdong University of Foreign Studies, Guangzhou, Guangdong, China
Abstract
Grammatical Error Correction (GEC) is a challenging task in Natural Language Processing research. Although many researchers have focused on GEC for widely used languages such as English and Chinese, few studies address Indonesian, which is a low-resource language. In this article, we propose a GEC framework that has the potential to serve as a baseline method for Indonesian GEC tasks. The framework treats GEC as a multi-class classification task. It integrates different language embedding models and deep learning models to correct 10 types of Part of Speech (POS) errors in Indonesian text. In addition, we constructed an Indonesian corpus that can be used as an evaluation dataset for Indonesian GEC research, and we evaluated our framework on this dataset. Results show that the Long Short-Term Memory model based on word embeddings achieved the best performance: its overall macro-average F0.5 in correcting the 10 POS error types reached 0.551. Results also show that the framework can be trained on a low-resource dataset.
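The abstract reports a macro-average F0.5 but does not say how it was computed. As a minimal sketch, assuming the standard definition (F-beta with beta = 0.5, which weights precision more heavily than recall, computed per class and then averaged with equal weight), the metric could be calculated as follows. The function names and the POS labels in the usage note are illustrative assumptions, not taken from the article.

```python
from typing import Hashable, Sequence


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from raw counts; returns 0.0 when the score is undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def macro_f_beta(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    beta: float = 0.5,
) -> float:
    """Unweighted mean of per-class F-beta over all classes seen."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        scores.append(f_beta(tp, fp, fn, beta))
    return sum(scores) / len(scores)
```

For example, with the hypothetical gold labels ["NOUN", "NOUN", "VERB", "VERB"] and predictions ["NOUN", "VERB", "VERB", "VERB"], the per-class scores are 5/6 and 5/7, so macro_f_beta returns 65/84, roughly 0.774. Because every class counts equally, a rare error type that is corrected poorly lowers the macro average as much as a frequent one.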
Funder
National Natural Science Foundation of China
Major Projects of Guangdong Education Department for Foundation Research and Applied Research
Special Funds for the Cultivation of Guangdong College Student's Scientific and Technological Innovation
Publisher
Association for Computing Machinery (ACM)
Cited by
14 articles.