Abstract
Deep learning has emerged as a new area of machine learning research. It is an approach that can learn features and hierarchical representation purely from data and has been successfully applied to several fields such as images, sounds, text and motion. The techniques developed from deep learning research have already been impacting the research on Natural Language Processing (NLP). Arabic diacritics are vital components of Arabic text that remove ambiguity from words and reinforce the meaning of the text. In this paper, a Deep Belief Network (DBN) is used as a diacritizer for Arabic text. DBN is an algorithm among deep learning that has recently proved to be very effective for a variety of machine learning problems. We evaluate the use of DBNs as classifiers in automatic Arabic text diacritization. The DBN was trained to individually classify each input letter with the corresponding diacritized version. Experiments were conducted using two benchmark datasets, the LDC ATB3 and Tashkeela. Our best settings achieve a DER and WER of 2.21% and 6.73%, receptively, on the ATB3 benchmark with an improvement of 26% over the best published results. On the Tashkeela benchmark, our system continues to achieve high accuracy with a DER of 1.79% and 14% improvement.
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Reference89 articles.
1. Modern Arabic Sociolinguistics: Diglossia, Variation, Codeswitching, Attitudes and Identity;Albirini,2016
2. Arabic Natural Language Processing
3. Towards an Optimal Solution to Lemmatization in Arabic
4. A hybrid approach for Arabic diacritization;Said,2013
5. A Survey and Comparative Study of Arabic Diacritization Tools;Hamed;J. Lang. Technol. Comput. Linguist.,2017
Cited by
17 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献