Survey of Authorship Identification Tasks on Arabic Texts

Published: 2023-04-12 Issue: 4 Volume: 22 Page: 1-24
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Alqahtani Fatimah¹, Dohler Mischa²

Affiliation:

1. Department of Informatics, King's College London, Aldwych, London, United Kingdom

2. Department of Engineering, King's College London, Strand Campus, Strand, London, United Kingdom

Abstract

Authorship identification is the process of extracting and analysing authors' writing styles to determine who wrote a text. From the writing style, the author and his/her characteristics can be recognised, which is very useful in digital forensics and cyber investigations. In the literature, authorship identification tasks have been addressed on both long and short documents and in different languages, such as English, Arabic, Chinese, and Greek. This survey reviews authorship identification tasks for the Arabic language, contributing to this area of research by exploring Arabic language performance and challenges. A total of 27 prominent Arabic studies of each authorship identification domain were reviewed, considering the data used, the features selected, the methods utilised, and the results. The review concludes that the results of authorship identification tasks vary mostly with the selected features and the dataset used. Furthermore, the effective features differ from one dataset to another, depending on the type of Arabic involved. However, all authorship identification tasks involving the Arabic language face considerable challenges in data pre-processing because of the complexity of Arabic concatenative morphology.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3564156

Reference: 63 articles.

1. Applying Authorship Analysis to Extremist-Group Web Forum Messages

2. Modern Standard Arabic Grammar Automatic Extraction from Penn 1 Arabic Treebank Using Natural Language Toolkit

3. M. Abdul-Mageed, C. Zhang, A. Hashemi, and E. M. B. Nagoudi. 2019. AraNet: A deep learning toolkit for Arabic social media. Proceedings, 4th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT). Retrieved from http://arxiv.org/abs/1912.13072.

4. A. Abdelali, K. Darwish, N. Durrani, and H. Mubarak. 2016. Farasa: A fast and furious segmenter for Arabic. Proceedings, 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 11–16.

5. Author Attribution of Arabic Texts Using Extended Probabilistic Context Free Grammar Language Model

Cited by 3 articles.

1. Innovative Approaches to Arabic Author Identification: A Comprehensive Evaluation of Classical and Deep Learning Approaches;2024 Intelligent Methods, Systems, and Applications (IMSA);2024-07-13

2. AraXLM: New XLM-RoBERTa Based Method for Plagiarism Detection in Arabic Text;Lecture Notes in Networks and Systems;2024

3. Active Learning for News Article’s Authorship Identification;IEEE Access;2023