Authors:
AZMI AQIL M., ALMAJED REHAM S.
Abstract
Modern Standard Arabic texts are typically written without diacritical marks. Diacritics are important for clarifying the sense and meaning of words, and their absence can lead to ambiguity even for native speakers. Native speakers often resolve the meaning from context; however, many Arabic applications, such as machine translation, text-to-speech, and information retrieval, are vulnerable to the lack of diacritics. The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. In this paper we discuss the properties of the Arabic language and the issues related to the lack of diacritical marks. This is followed by a survey of recent algorithms developed to solve the diacritization problem. We also look into future trends for researchers working in this area.
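The ambiguity caused by missing diacritics can be illustrated with a minimal Python sketch. It assumes that the diacritics of interest are the Unicode code points U+064B to U+0652 (tanwin, the short vowels, shadda, and sukun), shows that three distinct words, kataba ("he wrote"), kutub ("books"), and kutiba ("it was written"), collapse to the same undiacritized skeleton كتب, and adds a naive most-frequent-form baseline restorer trained on a hypothetical toy corpus. The function names and the baseline are illustrative assumptions, not the method of this paper or of any surveyed system.

```python
from collections import Counter, defaultdict

# Assumption: treat U+064B..U+0652 (tanwin, short vowels, shadda, sukun)
# as the diacritics; real systems may also handle e.g. U+0670.
DIACRITICS = {cp: None for cp in range(0x064B, 0x0653)}

# Three distinct diacritized words sharing the letters kaf-ta-ba.
KATABA = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, "he wrote"
KUTUB = "\u0643\u064f\u062a\u064f\u0628"         # kutub, "books"
KUTIBA = "\u0643\u064f\u062a\u0650\u0628\u064e"  # kutiba, "it was written"


def strip_diacritics(text: str) -> str:
    """Delete the assumed diacritic code points, keeping only the letters."""
    return text.translate(DIACRITICS)


def train_baseline(diacritized_words):
    """Hypothetical baseline: map each bare form to its most frequent diacritized form."""
    counts = defaultdict(Counter)
    for word in diacritized_words:
        counts[strip_diacritics(word)][word] += 1
    return {bare: c.most_common(1)[0][0] for bare, c in counts.items()}


def restore(text: str, model) -> str:
    """Diacritize word by word, ignoring context; unseen words pass through unchanged."""
    return " ".join(model.get(w, w) for w in text.split())


if __name__ == "__main__":
    # All three words lose their distinction once diacritics are removed.
    assert strip_diacritics(KATABA) == strip_diacritics(KUTUB) == strip_diacritics(KUTIBA)
    model = train_baseline([KATABA, KATABA, KUTUB])  # hypothetical toy corpus
    print(restore("\u0643\u062a\u0628", model) == KATABA)  # True
```

Because the baseline ignores context, it restores every occurrence of كتب as kataba, even where kutub ("books") was intended; this is the kind of ambiguity that context-aware diacritization methods aim to resolve.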
Publisher
Cambridge University Press (CUP)
Subject
Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software
References
47 articles.
Cited by
51 articles.