Abstract
A frequency dictionary of printed Arabic text is an essential resource for natural language processing. The underlying corpus consists of 1,251 XML files of Arabic documents, collected from ten newspapers and magazines published in different countries and compiled as the PATD database. These files contain a total of 2,344 articles with varied characteristics: open vocabulary and text in multiple fonts, sizes, and styles. From these articles, 1,102,078 tokens, 19,926 sentences, and 1,000,000 words were extracted. For each word, the dictionary provides detailed information, including English equivalents, usage statistics, and usage distribution; it also identifies the most widely used terms. A thematic vocabulary list of the top words on various topics is also provided. This frequency dictionary is a useful resource on modern Arabic vocabulary for specialists, students, and learners, and it is freely available to interested researchers on its webpage.
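The abstract does not say how usage statistics and usage distribution were computed. As a minimal sketch, assuming the corpus is split into parts of roughly equal size (for example, one part per source), the Python code below counts raw word frequencies, measures how evenly each word is spread with Juilland's D, and ranks words by a usage score defined as frequency × D, a convention used in some published frequency dictionaries. The whitespace tokenizer and the function names `tokenize`, `juilland_d`, and `frequency_table` are hypothetical illustrations, not the authors' method or tooling.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def tokenize(text: str) -> list[str]:
    # Assumption: naive whitespace tokenization. Real Arabic text needs
    # normalization and clitic segmentation, which this sketch omits.
    return text.split()


def juilland_d(counts: list[int]) -> float:
    """Juilland's D over n equal-sized parts: 1.0 = perfectly even, 0.0 = one part only."""
    n = len(counts)
    if n < 2:
        raise ValueError("Juilland's D needs at least two corpus parts")
    avg = mean(counts)
    if avg == 0:
        return 0.0
    variation = pstdev(counts) / avg  # coefficient of variation
    # Clamp at 0.0 to absorb floating-point rounding for single-part words.
    return max(0.0, 1.0 - variation / sqrt(n - 1))


def frequency_table(parts: list[str]) -> list[tuple[str, int, float, float]]:
    """Return (word, frequency, D, usage) rows sorted by usage = frequency * D.

    Assumption: the usage score follows the frequency-times-dispersion
    convention; the source does not specify its formula.
    """
    part_counts = [Counter(tokenize(p)) for p in parts]
    total: Counter = Counter()
    for counts in part_counts:
        total.update(counts)
    rows = []
    for word, freq in total.items():
        d = juilland_d([counts[word] for counts in part_counts])
        rows.append((word, freq, d, freq * d))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    parts = ["الكتاب على الطاولة", "قرأت الكتاب في المكتبة", "الطاولة في الغرفة"]
    for word, freq, d, usage in frequency_table(parts):
        print(f"{word}\t{freq}\tD={d:.2f}\tusage={usage:.2f}")
```

A word spread evenly across all parts gets D = 1, while a word confined to a single part gets D = 0, so the usage score favors vocabulary that is both frequent and widely distributed rather than frequent only in one source.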
Publisher: Institute of Slavic Studies, Polish Academy of Sciences