Affiliation:
1. SIGL Laboratory, ENSATe, Abdelmalek Essaadi University, Tetouan, Morocco
Abstract
Arabic diacritization is the task of restoring the diacritics (vowel marks) of Arabic text, which is mostly written without them. Automating this task improves results on some natural language processing tasks, which makes it important for Arabic language processing. In this paper, we present a comparative study of two automatic diacritization systems. The first uses a variant of the hidden Markov model. The second is a pipeline that combines a Long Short-Term Memory (LSTM) deep learning model, a rule-based correction component, and a statistical component. We also propose some modifications to these systems. We trained and tested the systems on the same benchmark dataset, using the same evaluation metrics proposed in previous work. The best system achieves a diacritic error rate (DER) of 9.42% and a word error rate (WER) of 22.82%.
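The abstract does not spell out how DER and WER are computed, so the following is only a minimal sketch under commonly used definitions, which are assumptions here: DER is taken as the fraction of base characters whose attached diacritics differ from the reference, and WER as the fraction of words containing at least one such character. The function names (`split_diacritics`, `der_wer`) are hypothetical, and details such as whether case endings or non-letter characters are counted may differ from the paper's setup.

```python
# Arabic diacritic code points U+064B..U+0652
# (tanween forms, fatha, damma, kasra, shadda, sukun).
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653))


def split_diacritics(word):
    """Return a list of (base_char, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def der_wer(reference, hypothesis):
    """Compute (DER, WER) as fractions under the assumed definitions.

    Both strings must contain the same undiacritized text; only the
    diacritics may differ.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words or len(ref_words) != len(hyp_words):
        raise ValueError("texts must be non-empty and have the same word count")

    char_total = char_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_diacritics(ref_word)
        hyp_pairs = split_diacritics(hyp_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("base characters differ between the texts")
        # Sort marks so that shadda+vowel order does not count as an error.
        errors = sum(
            1
            for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs)
            if sorted(ref_marks) != sorted(hyp_marks)
        )
        char_total += len(ref_pairs)
        char_errors += errors
        word_errors += 1 if errors else 0

    return char_errors / char_total, word_errors / len(ref_words)


# Two-word example: only the final diacritic of the second word differs.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F"
hypothesis = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064E"
print(der_wer(reference, hypothesis))  # (0.125, 0.5)
```

In this example one of eight base characters carries a wrong diacritic, and that single error makes one of the two words wrong. This illustrates why WER is typically much higher than DER, as in the 22.82% versus 9.42% reported above.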
Subject
Human-Computer Interaction