Abstract
In this work, we propose a novel approach to the authorship verification task in a cross-topic and open-set scenario. Authorship verification is the task of determining whether or not two texts were written by the same author. We represent each document as a graph, and a graph neural network then extracts relevant features from these graph representations. We present three strategies for representing texts as graphs based on the co-occurrence of the part-of-speech (POS) labels of words. We propose a Siamese network architecture composed of graph convolutional networks along with pooling and classification layers, present several variants of this architecture, and discuss the performance of each. To evaluate our approach, we used a collection of fanfiction texts provided by the PAN@CLEF 2021 shared task in two settings: a “small” corpus and a “large” corpus. Our graph-based approach achieved average scores (over AUC ROC, F1, Brier score, F0.5u, and C@1) of 90% and 92.83% when trained on the “small” and “large” corpus, respectively. Our model obtains results comparable to the state of the art on this task and better than traditional baselines.
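The abstract does not spell out the three graph strategies, so the following is only a minimal sketch of one plausible variant, not the authors' construction. It assumes the words have already been tagged with POS labels (no tagger is included) and that an undirected edge links two tags whenever they appear within a small window of each other, weighted by how often that happens. The function names `pos_cooccurrence_graph` and `to_adjacency` and the `window` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pos_cooccurrence_graph(tags: List[str], window: int = 1) -> Tuple[List[str], Counter]:
    """Build an undirected, weighted co-occurrence graph over POS tags.

    Nodes are the distinct tags in the text. Two tags are linked whenever the
    second occurs within `window` positions after the first; the edge weight
    counts how often that happens. Edge keys are sorted tag pairs, so
    ("VERB", "DET") and ("DET", "VERB") share one undirected edge.
    """
    nodes = sorted(set(tags))
    edges: Counter = Counter()
    for i, a in enumerate(tags):
        for j in range(i + 1, min(i + 1 + window, len(tags))):
            edges[tuple(sorted((a, tags[j])))] += 1
    return nodes, edges


def to_adjacency(nodes: List[str], edges: Counter) -> List[List[float]]:
    """Turn the edge counter into a symmetric weighted adjacency matrix."""
    index: Dict[str, int] = {tag: i for i, tag in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for (a, b), weight in edges.items():
        i, j = index[a], index[b]
        adj[i][j] += weight
        if i != j:  # a self-loop (same tag twice in a row) is stored once
            adj[j][i] += weight
    return adj


# Hypothetical, already-tagged sentence: "The cat saw the dog"
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
nodes, edges = pos_cooccurrence_graph(tags)
adj = to_adjacency(nodes, edges)
print(nodes)        # ['DET', 'NOUN', 'VERB']
print(dict(edges))  # {('DET', 'NOUN'): 2, ('NOUN', 'VERB'): 1, ('DET', 'VERB'): 1}
print(adj)          # [[0.0, 2.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

Because nodes are POS tags rather than words, the graph size is bounded by the tag set, which is one reason such representations tend to carry stylistic rather than topical information.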
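The Siamese idea can be illustrated with a minimal, untrained sketch in plain Python: both documents' graphs pass through the same two graph-convolution layers (the standard GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W)), are mean-pooled into one vector each, and a logistic layer scores the absolute difference of the two vectors. The layer sizes, mean pooling, absolute-difference combination, random initialization, and the class name `SiameseGCN` are all assumptions for illustration; the abstract does not state the authors' exact pooling or classification layers.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Graph = Tuple[Matrix, Matrix]  # (weighted adjacency, node-feature matrix)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the added self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


def mean_pool(h: Matrix) -> List[float]:
    """Average node embeddings into a single graph embedding."""
    return [sum(col) / len(h) for col in zip(*h)]


class SiameseGCN:
    """Hypothetical two-branch network whose branches share every weight."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w1 = rand(in_dim, hidden_dim)
        self.w2 = rand(hidden_dim, hidden_dim)
        self.w_out = rand(1, hidden_dim)[0]
        self.b_out = 0.0

    def embed(self, graph: Graph) -> List[float]:
        adj, features = graph
        a_norm = normalized_adjacency(adj)
        h = gcn_layer(a_norm, features, self.w1)
        h = gcn_layer(a_norm, h, self.w2)
        return mean_pool(h)

    def score(self, g1: Graph, g2: Graph) -> float:
        """Score in (0, 1) read as 'same author'; meaningless until trained."""
        e1, e2 = self.embed(g1), self.embed(g2)  # same weights for both texts
        diff = [abs(x - y) for x, y in zip(e1, e2)]  # symmetric in g1, g2
        logit = sum(w * d for w, d in zip(self.w_out, diff)) + self.b_out
        return 1.0 / (1.0 + math.exp(-logit))


# Toy graphs over a 3-tag vocabulary with one-hot node features.
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
g1 = ([[0.0, 2.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 0.0]], one_hot)
g2 = ([[0.0, 1.0, 0.0], [1.0, 0.0, 3.0], [0.0, 3.0, 0.0]], one_hot)
model = SiameseGCN(in_dim=3, hidden_dim=4)
print(model.score(g1, g2))
```

Training (for example with binary cross-entropy on same-author and different-author pairs) is omitted. Sharing the weights is what makes the network Siamese: both texts are mapped into the same embedding space, so their difference is meaningful.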
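Two of the five averaged metrics, C@1 and F0.5u, are specific to the PAN verification evaluations; both treat a score of exactly 0.5 as a non-answer. The sketch below implements them, plus the Brier-based score, following our reading of the PAN definitions: C@1 = (n_c + n_u * n_c / n) / n, F0.5u = 1.25 * n_tp / (1.25 * n_tp + 0.25 * (n_fn + n_u) + n_fp), and the Brier score reported as its complement so that higher is better. The function names and the zero-denominator guard are our own assumptions; official numbers should come from the PAN evaluation script.

```python
from typing import Sequence


def c_at_1(y_true: Sequence[int], y_pred: Sequence[float]) -> float:
    """c@1: accuracy that rewards leaving hard cases unanswered (score == 0.5)."""
    n = len(y_pred)
    n_correct = n_unanswered = 0
    for t, p in zip(y_true, y_pred):
        if p == 0.5:
            n_unanswered += 1
        elif (p > 0.5) == (t == 1):
            n_correct += 1
    return (n_correct + n_unanswered * n_correct / n) / n


def f05_u(y_true: Sequence[int], y_pred: Sequence[float]) -> float:
    """F0.5u: F0.5 in which non-answers are penalized like false negatives."""
    tp = fp = fn = unanswered = 0
    for t, p in zip(y_true, y_pred):
        if p == 0.5:
            unanswered += 1
        elif p > 0.5:
            if t == 1:
                tp += 1
            else:
                fp += 1
        elif t == 1:
            fn += 1
    denom = 1.25 * tp + 0.25 * (fn + unanswered) + fp
    return 1.25 * tp / denom if denom else 0.0  # guard against division by zero


def brier_complement(y_true: Sequence[int], y_pred: Sequence[float]) -> float:
    """1 minus the mean squared error between scores and 0/1 labels."""
    return 1.0 - sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_pred)


y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.2, 0.5, 0.7]  # the third pair is left unanswered
print(c_at_1(y_true, y_pred))            # 0.625
print(f05_u(y_true, y_pred))             # 0.5
print(brier_complement(y_true, y_pred))  # about 0.8025
```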
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by 8 articles.