DIACRITIC-AWARE YORÙBÁ SPELL CHECKER

Authors:

Asahiah Franklin Oladiipo, Onifade Mary Taiwo, Adegunlehin Abayomi Emmanuel, Asahiah Adekemisola Olufunmilayo, Amoo Adekemi Olawunmi

Abstract

Spell checking and correction is still in its infancy for the Yorùbá language, and existing tools cannot be applied directly because Yorùbá makes extensive use of diacritics to mark phonemes and tone. We addressed this problem by collecting data from online sources and by applying optical character recognition to hard-copy books. The features relevant to spell checking and correction in a language that marks tone and uses the underdot were identified through a review of existing spell-checking solutions, analysis of the collected data, and consultation of relevant Yorùbá linguistics textbooks. A conceptual model was formulated as a parallel combination of a unigram language model and a language diacritic model, forming a dictionary sub-model that is used by the Error Detection and Candidate Generation modules. The Candidate Generation module was implemented as an inverse Levenshtein edit-distance algorithm. The system was evaluated using Detection Accuracy (calculated from Precision and Recall) and Suggestion Accuracy (SA) as metrics. Our experimental setups compared the performance of each component sub-model used alone with that of their combination into a unified model. The detection accuracies of the three models were 93.23%, 94.10% and 95.01%, respectively, while their suggestion accuracies were 26.94%, 72.10% and 65.89%, respectively. With respect to the size of the training corpus, the unified model achieved an increase of 1.83% in detection accuracy and 5.27% in suggestion accuracy for a 70% increase in corpus size. The results indicate that each sub-model in the dictionary plays a different role, and that increasing the training data does not yield a proportional increase in the performance of the spell checker. The study also showed that spell checking Yorùbá text is more effective when attention is paid to the diacritical aspect of the language.
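The abstract describes the dictionary sub-model and the candidate generator only at a high level, so the Python sketch below is one possible reading under stated assumptions, not the authors' implementation. It assumes that "inverse" edit distance means generating every string one Levenshtein edit (deletion, substitution or insertion) away from the flagged token and keeping those found in the dictionary, instead of computing the distance to every dictionary word. The "language diacritic model" is modeled here as a hypothetical map from a word's undiacritized letters to its attested diacritized forms, and candidates are ranked by unigram frequency. Text is decomposed with Unicode NFD so that a missing or wrong tone mark or underdot counts as a single edit. The names `DictionaryModel`, `edits1`, `candidates`, `strip_marks` and the letter inventory are illustrative.

```python
import unicodedata
from collections import Counter, defaultdict

# Hypothetical edit alphabet in NFD form: base letters of the Yorùbá alphabet
# (the digraph "gb" is reachable through single-letter edits) plus the combining
# acute (high tone), grave (low tone) and dot below (underdot, as in ẹ, ọ, ṣ).
BASE_LETTERS = "abdefghijklmnoprstuwy"
COMBINING_MARKS = "\u0301\u0300\u0323"
ALPHABET = BASE_LETTERS + COMBINING_MARKS


def nfd(word):
    """Lowercase and decompose so every tone mark and underdot is its own code point."""
    return unicodedata.normalize("NFD", word.lower())


def strip_marks(word):
    """Remove all combining marks, leaving only the base letters."""
    return "".join(ch for ch in nfd(word) if not unicodedata.combining(ch))


class DictionaryModel:
    """Toy dictionary sub-model: unigram counts plus a diacritic map (base form -> variants)."""

    def __init__(self, corpus_tokens):
        self.unigrams = Counter(nfd(t) for t in corpus_tokens)
        self.variants = defaultdict(set)
        for word in self.unigrams:
            self.variants[strip_marks(word)].add(word)

    def is_error(self, word):
        # Assumed error detection: flag a token whose exact diacritized form is unseen.
        return nfd(word) not in self.unigrams

    def edits1(self, word):
        """All strings at Levenshtein distance 1 (delete, substitute, insert)."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        substitutes = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
        inserts = [left + c + right for left, right in splits for c in ALPHABET]
        return set(deletes + substitutes + inserts)

    def candidates(self, word, limit=5):
        w = nfd(word)
        # Unigram side: dictionary words exactly one edit away from the input.
        found = {e for e in self.edits1(w) if e in self.unigrams}
        # Diacritic side: every attested diacritization of the same base letters.
        found |= self.variants.get(strip_marks(w), set())
        # Rank by corpus frequency (ties broken alphabetically), return composed text.
        ranked = sorted(found, key=lambda c: (-self.unigrams[c], c))
        return [unicodedata.normalize("NFC", c) for c in ranked[:limit]]
```

With the toy corpus `["ọjà", "ọjà", "ọmọ", "ọmọ", "ọmọ", "omi", "ilé", "owó", "ojú"]`, `candidates("ojà")` returns `['ọjà']` through one inserted underdot, while `candidates("oja")` reaches the same word only through the diacritic map, because it is two edits away. This illustrates why the two sub-models could play different roles: edit distance alone misses heavily under-diacritized input, and the diacritic map alone misses ordinary letter errors.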

Publisher

AGH University of Science and Technology Press

Subject

Artificial Intelligence, Computational Theory and Mathematics, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Computer Vision and Pattern Recognition, Modeling and Simulation, Computer Science (miscellaneous)

Cited by 1 article.
