DIACRITIC-AWARE YORÙBÁ SPELL CHECKER
-
Published:2023-03-06
Issue:1
Volume:24
Page:
-
ISSN:2300-7036
-
Container-title:Computer Science
-
language:
-
Short-container-title:csci
Author:
Asahiah Franklin Oladiipo,Onifade Mary Taiwo,Adegunlehin Abayomi Emmanuel,Asahiah Adekemisola Olufunmilayo,Amoo Adekemi Olawunmi
Abstract
Spell checking and correction is still in its infancy for the Yorùbá language, and existing tools cannot be applied directly because Yorùbá makes extensive use of diacritics to mark phonemes and tone. We addressed this problem by collecting data from online sources and from optical character recognition of printed books. The features relevant to spell checking and correction in this language, which marks tone and uses underdots, were identified through a review of existing spell-checking solutions, analysis of the collected data, and consultation of relevant Yorùbá linguistics textbooks. A conceptual model was formulated as a parallel combination of a unigram language model and a language diacritic model, forming a dictionary sub-model that is used by the Error Detection and Candidate Generation modules. The candidate generation module was implemented as an inverse Levenshtein edit-distance algorithm.
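As a rough illustration of the architecture described above, the following Python sketch pairs a unigram model (word counts) with a diacritic model (a map from each undiacritized base form to its attested diacritized forms) and generates candidates by enumerating strings within one Levenshtein edit of the input. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the one-edit limit, the choice to apply edits to base letters only, and the ranking rule are all hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

# Yorùbá base letters once tone marks and underdots are stripped (assumption:
# edits are applied to base letters only, and diacritics are restored afterwards).
BASE_LETTERS = "abdefghijklmnoprstuwy"


def strip_diacritics(word: str) -> str:
    """Remove combining marks (tone marks and underdots) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_dictionary(tokens):
    """Return (unigram counts, map from base form to diacritized forms)."""
    unigrams = Counter(unicodedata.normalize("NFC", t.lower()) for t in tokens)
    diacritic_model = defaultdict(set)
    for word in unigrams:
        diacritic_model[strip_diacritics(word)].add(word)
    return unigrams, diacritic_model


def is_known(word, unigrams) -> bool:
    """Error detection: a word is accepted only if its exact diacritized form is attested."""
    return unicodedata.normalize("NFC", word.lower()) in unigrams


def edits1(base: str) -> set:
    """All strings one Levenshtein edit (delete, replace, insert) away from base."""
    splits = [(base[:i], base[i:]) for i in range(len(base) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    replaces = {left + c + right[1:] for left, right in splits if right for c in BASE_LETTERS}
    inserts = {left + c + right for left, right in splits for c in BASE_LETTERS}
    return deletes | replaces | inserts


def suggest(word, unigrams, diacritic_model, limit=5):
    """Generate candidates and rank them (hypothetical rule: same base first, then frequency)."""
    base = strip_diacritics(word.lower())
    candidates = set()
    for candidate_base in {base} | edits1(base):
        candidates |= diacritic_model.get(candidate_base, set())
    ranked = sorted(
        candidates,
        key=lambda c: (strip_diacritics(c) != base, -unigrams[c], c),
    )
    return ranked[:limit]
```

With a toy corpus containing "ilé" twice and "ilẹ̀" once, the undiacritized input "ile" is flagged as unknown and both diacritized forms are proposed, with the more frequent one first. Generating edits outward from the misspelling and filtering against the dictionary avoids computing an edit distance to every dictionary entry, which is one plausible reading of "inverse" here.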
The system was evaluated using Detection Accuracy (calculated from Precision and Recall) and Suggestion Accuracy (SA) as metrics. Our experimental setups compared the performance of the component subsystems when used alone with that of their combination into a unified model. The detection accuracies of the three models were 93.23%, 94.10% and 95.01% respectively, while their suggestion accuracies were 26.94%, 72.10% and 65.89% respectively. In relation to the size of the training corpus, the unified model achieved an increase of 1.83% in detection accuracy and 5.27% in suggestion accuracy for a 70% increase in corpus size. The results indicated that each of the sub-models in the dictionary played a different role, while an increase in training data does not give a linearly proportional increase in the performance of the spell checker. The study also showed that spell checking of Yorùbá text is better when attention is paid to the diacritical aspect of the language.
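The abstract does not give the formulas behind these metrics, so the sketch below assumes the conventional definitions: precision and recall over the set of flagged tokens, the F1 score as one common way to combine them (an assumption, not necessarily the paper's Detection Accuracy), and suggestion accuracy as the share of misspellings whose intended word appears among the suggestions. All function names are hypothetical.

```python
def precision_recall(flagged: set, actual_errors: set):
    """Precision and recall of error detection over token positions."""
    true_positives = len(flagged & actual_errors)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual_errors) if actual_errors else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed combination rule)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def suggestion_accuracy(suggestions: dict, gold: dict) -> float:
    """Fraction of misspellings whose intended word is among the suggestions.

    suggestions maps each misspelling to a list of candidates;
    gold maps each misspelling to the intended word.
    """
    if not gold:
        return 0.0
    hits = sum(1 for wrong, right in gold.items() if right in suggestions.get(wrong, []))
    return hits / len(gold)
```

For example, if four positions are flagged and three of them are among five true errors, precision is 0.75 and recall is 0.6.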
Publisher
AGHU University of Science and Technology Press
Subject
Artificial Intelligence,Computational Theory and Mathematics,Computer Graphics and Computer-Aided Design,Computer Networks and Communications,Computer Vision and Pattern Recognition,Modeling and Simulation,Computer Science (miscellaneous)
Cited by
1 article.