Abstract
This article is the third paper in a series aimed at the establishment of the authorship of Russian-language texts. This paper considers methods for determining the authorship of classical Russian literary texts, as well as fanfiction texts. The process of determining the author was first considered in the classical version of classification experiments using a closed set of authors, and experiments were also completed for a complicated modification of the problem using an open set of authors. The use of methods to identify the author of the text is justified by the conclusions about the effectiveness of the fastText and Support Vector Machine (SVM) methods with the selection of informative features discussed in our past studies. In the case of open attribution, the proposed methods are based on the author’s combination of fastText and One-Class SVM as well as statistical estimates of a vector’s similarity measures. The feature selection algorithm for a closed set of authors is chosen based on a comparison of five different selection methods, including the previously considered genetic algorithm as a baseline. The regularization-based algorithm (RbFS) was found to be the most efficient method, while methods based on a complete enumeration (FFS and SFS) are found to be ineffective for any set of authors. The accuracy of the RbFS and SVM methods in the case of classical literary texts averaged 83%, which outperforms other selection methods by 3 to 10% for an identical number of features, and the average accuracy of fastText was 84%. For the open attribution in cross-topic classification, the average accuracy of the method based on the combination of One-Class SVM with RbFS and fastText was 85%, and for in-group classification, it was 75 to 78%, depending on the group, which is the best result among the open attribution methods considered.
Subject
Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science
Reference45 articles.
1. Romanov, A., Kurtukova, A., Shelupanov, A., Fedotova, A., and Goncharov, V. (2021). Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks. Future Internet, 13.
2. Romanov, A.S., Kurtukova, A.V., Sobolev, A.A., Shelupanov, A.A., and Fedotova, A.M. (2020). Determining the Age of the Author of the Text Based on Deep Neural Network Models. Information, 11.
3. Unifying Lexical, Syntactic, and Structural Representations of Written Language for Authorship Attribution;Jafariakinabad;SN Comput. Sci.,2021
4. Mahor, U., and Kumar, A. (2021). Information and Communication Technology for Competitive Strategies, ICTCS Springer.
5. Fedotova, A., Romanov, A., Kurtukova, A., and Shelupanov, A. (2022). Authorship Attribution of Social Media and Literary Russian-Language Texts Using Machine Learning Methods and Feature Selection. Future Internet, 14.
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献