Affiliation:
1. Department of Informatics, King's College London, Aldwych, London, United Kingdom
2. Department of Engineering, King's College London, Strand Campus, Strand, London, United Kingdom
Abstract
Authorship identification is the process of extracting and analysing authors' writing styles to determine who wrote a given text. Writing style can reveal the author and his or her characteristics, which is very useful in digital forensics and cyber investigations. In the literature, authorship identification tasks have been addressed on both long and short documents and in different languages, such as English, Arabic, Chinese, and Greek. This survey reviews authorship identification tasks for the Arabic language, contributing to this area of research by exploring performance and challenges specific to Arabic. A total of 27 prominent Arabic studies, covering each authorship identification domain, were reviewed in terms of the data used, the features selected, the methods applied, and the results. The review concludes that the results of authorship identification tasks vary mostly with the selected features and the dataset used. Furthermore, the effective features differ from one dataset to another depending on the type of Arabic involved. However, all authorship identification tasks involving the Arabic language face considerable data pre-processing challenges owing to Arabic's complex concatenative morphology.
Publisher
Association for Computing Machinery (ACM)
Cited by
3 articles.