Arabic Authorship Attribution Using Synthetic Minority Over-Sampling Technique and Principal Components Analysis for Imbalanced Documents

Published:2021-10 Issue:4 Volume:15 Page:1-17
ISSN:1557-3958
Container-title:International Journal of Cognitive Informatics and Natural Intelligence
language:en

Author:

Hadjadj Hassina¹, Sayoud Halim¹

Affiliation:

1. USTHB University, Algeria

Abstract

Dealing with imbalanced data is a major challenge in data mining and in machine learning tasks. This investigation addresses the problem of class imbalance in the Authorship Attribution (AA) task, with a specific application to Arabic text data. The article proposes a new hybrid approach based on Principal Components Analysis (PCA) and the Synthetic Minority Over-sampling Technique (SMOTE), which considerably improves authorship attribution performance on imbalanced data. The dataset contains 7 Arabic books written by 7 different scholars, segmented into texts of the same size, with an average length of 2900 words per text. The experimental results show that the proposed approach, using the SMO-SVM classifier, achieves high authorship attribution accuracy (100%), especially with starting character bigrams. In addition, the proposed method improves AA performance on imbalanced datasets, mainly with function words.
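
The abstract does not describe the implementation. As a rough illustration of the over-sampling idea behind SMOTE only, the sketch below creates synthetic minority-class vectors by interpolating between a minority sample and one of its nearest minority neighbours. It is plain Python under stated assumptions: the function name `smote`, its parameters, and the toy feature values are hypothetical and are not taken from the article, and the PCA and SMO-SVM stages are not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built from a list of minority vectors.

    Illustrative sketch of the basic SMOTE interpolation step, not the
    authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors (hypothetical values, e.g. relative frequencies).
minority_class = [[0.10, 0.30], [0.12, 0.28], [0.15, 0.35], [0.11, 0.33]]
new_points = smote(minority_class, n_new=6)
balanced_minority = minority_class + new_points
print(len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by that class. In a full pipeline of the kind the abstract outlines, such over-sampling would be applied to the training data only.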

Publisher

IGI Global

Subject

Artificial Intelligence, Human-Computer Interaction, Software

Cited by 3 articles.

1. A Transformer-Based Approach to Authorship Attribution in Classical Arabic Texts;Applied Sciences;2023-06-18

2. A review on authorship attribution in text mining;WIREs Computational Statistics;2022-04-22

3. Author verification of Nahj Al-Balagha;Digital Scholarship in the Humanities;2022-01-20