New Language Models for Spelling Correction
Published: 2022
Issue: 6
Volume: 19
Page:
ISSN: 2309-4524
Container-title: The International Arab Journal of Information Technology
Language: en
Short-container-title: IAJIT
Author:
Laaroussi Saida, Aouragh Si Lhoussain, Yousfi Abdellah, Nejja Mohamed, Geddah Hicham, Ouatik El Alaoui Said
Abstract
Context-based correction of spelling errors is a significant problem in Natural Language Processing (NLP) applications. Most work that introduces context into spelling correction relies on n-gram language models. However, in several cases these models fail to assign adequate probabilities to the candidate corrections of a misspelled word in a given context. To resolve this issue, we propose two new language models, inspired by stochastic language models and combined with edit distance. The first phase finds the lexicon words that are orthographically close to the erroneous word; the second phase ranks and limits these suggestions. We applied the new approach to the Arabic language, taking into account its characteristic strong contextual connections between distant words in a sentence. To evaluate our approach, we developed textual data processing applications, namely for extracting distant transition dictionaries. The correction accuracy obtained exceeds 98% for the first 10 suggestions. Compared with n-gram language models, our approach simplifies the parameters to be estimated while achieving higher correction accuracy, which motivates its use.
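The two-phase scheme described in the abstract (select lexicon words close to the erroneous word, then rank and limit them using context) can be illustrated with a minimal sketch. This is not the authors' models: the function names, the `pair_counts` table of word co-occurrence counts, and the scoring rule (sum of counts over all context words, regardless of distance) are assumptions made purely for illustration.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def candidates(word, lexicon, max_dist=2):
    """Phase 1: keep lexicon words within max_dist edits of the erroneous word."""
    return [w for w in lexicon if edit_distance(word, w) <= max_dist]

def rank(cands, context, pair_counts, top_k=10):
    """Phase 2: rank candidates by co-occurrence with context words, keep top_k.

    pair_counts is a hypothetical Counter mapping (context_word, candidate)
    to a corpus count; missing pairs count as zero.
    """
    def score(c):
        return sum(pair_counts[(ctx, c)] for ctx in context)
    # Higher score first; ties broken alphabetically for determinism.
    return sorted(cands, key=lambda c: (-score(c), c))[:top_k]

# Toy data (invented for illustration only).
lexicon = ["cat", "car", "cart", "dog"]
pair_counts = Counter({("drive", "car"): 5, ("the", "cat"): 2})
suggestions = rank(candidates("caf", lexicon, max_dist=1),
                   ["drive", "the"], pair_counts)
```

Summing counts over every context word, rather than only the immediately preceding one, is one simple way to let distant words influence the ranking; the paper's actual models may weight or normalize these transitions differently.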
Publisher
Zarqa University
Subject
General Computer Science
Cited by
1 article.
1. Bi-directional GRU-Based Approach for Multi-Class Text Error Identification System;2024 IEEE 9th International Conference for Convergence in Technology (I2CT);2024-04-05