Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks

Published:2020-12-25 Issue:1 Volume:13 Page:3
ISSN:1999-5903
Container-title:Future Internet
language:en
Short-container-title:Future Internet

Author:

Romanov Aleksandr, Kurtukova Anna, Shelupanov Alexander, Fedotova Anastasia, Goncharov Valery

Abstract

The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The problem is important because of the active digitalization of society and the shift of most everyday activities online. Authorship identification methods are particularly useful for information security and forensics. For example, such methods can be used to identify the authors of suicide notes and other texts subjected to forensic examination. Another area of application is plagiarism detection, which is relevant both to intellectual property protection in the digital space and to the educational process. The article describes identifying the author of a Russian-language text using a support vector machine (SVM) and deep neural network architectures: long short-term memory (LSTM), convolutional neural networks (CNN) with attention, and Transformer. The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy, reaching 96% on average. This is due to carefully chosen parameters and a feature space that includes statistical and semantic features (including those extracted by aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also evaluates how attacks on the method affect the models' accuracy. Experiments show that the SVM-based methods are not robust to deliberate text anonymization, whereas the loss in accuracy of deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts and achieves 81% accuracy.
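The abstract's SVM approach can be illustrated with a minimal sketch. This is not the authors' implementation: the feature space (character trigram counts), the training procedure (plain subgradient descent on the hinge loss, with no bias term), the hyperparameters, and the tiny English toy texts are all assumptions chosen for brevity. The paper's actual features combine statistical and semantic information and are not specified in this record.

```python
import math
from collections import Counter


def char_ngram_features(text, n=3):
    """Map a text to L2-normalised character n-gram counts (assumed feature set)."""
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {gram: c / norm for gram, c in counts.items()}


def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(gram, 0.0) * value for gram, value in features.items())


def train_linear_svm(samples, labels, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1 or -1."""
    weights = {}
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            violated = y * dot(weights, x) < 1.0  # margin check before the update
            for gram in weights:                  # L2 regularisation: shrink weights
                weights[gram] *= 1.0 - lr * lam
            if violated:                          # hinge-loss subgradient step
                for gram, value in x.items():
                    weights[gram] = weights.get(gram, 0.0) + lr * y * value
    return weights


# Hypothetical toy corpus: +1 marks "author_a", -1 marks "author_b".
TRAIN = [
    ("the sea was calm and the sea was grey", +1),
    ("quick jumps fix bugs; quick jumps fix logs", -1),
    ("the sea was cold and the wind was calm", +1),
    ("quick fix: jumps in logs, bugs in jumps", -1),
]

train_features = [char_ngram_features(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
weights = train_linear_svm(train_features, train_labels)


def predict_author(text):
    """Return the predicted author name for an unseen text."""
    score = dot(weights, char_ngram_features(text))
    return "author_a" if score > 0 else "author_b"


print(predict_author("the wind was grey and the sea was cold"))
print(predict_author("logs fix bugs; quick jumps"))
```

A real experiment would use many texts per author, a multi-class scheme such as one-vs-rest, and a held-out test set. The sketch only shows how stylistic statistics can feed a margin-based classifier.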

Funder

Ministry of Science and Higher Education of the Russian Federation

Publisher

MDPI AG

Subject

Computer Networks and Communications

Link

https://www.mdpi.com/1999-5903/13/1/3/pdf

Cited by 18 articles.

1. Inference through innovation processes tested in the authorship attribution task;Communications Physics;2024-09-06

2. Genre Classification of Books in Russian with Stylometric Features: A Case Study;Information;2024-06-07

3. Authorship Attribution in Less-Resourced Languages: A Hybrid Transformer Approach for Romanian;Applied Sciences;2024-03-23

4. Semantic Clustering and Transfer Learning in Social Media Texts Authorship Attribution;IEEE Access;2024

5. Analysis of effective techniques and algorithms in terms of “text mining” to predict the authorship in Albanian language;CRJ;2023-09-18