Affiliation:
1. Bauhaus-Universität Weimar, Weimar, Germany
2. Leipzig University, Leipzig, Germany
3. Martin-Luther-Universität Halle-Wittenberg, Halle, Germany
Abstract
Authorship verification is the task of determining, based on an analysis of writing style, whether two texts were written by the same author. Author obfuscation is the adversarial task of preventing successful verification by altering a text's style so that it no longer resembles that of its original author. This paper introduces new algorithms for both tasks and reports on a comprehensive evaluation of how well state-of-the-art authorship verification withstands obfuscation.
After introducing a new generalization of the well-known unmasking algorithm to short texts, which completes our collection of state-of-the-art verification algorithms, we introduce an obfuscation approach that (1) models writing style difference as the Jensen-Shannon distance between the character n-gram distributions of texts, and (2) manipulates an author's writing style in a targeted manner using heuristic search. For obfuscation, we explore the huge space of textual variants in order to find a paraphrased version of the to-be-obfuscated text that has a sufficiently high Jensen-Shannon distance at minimal cost in terms of text quality loss. We analyze, quantify, and illustrate the rationale of this approach, define paraphrasing operators, derive text-length-invariant thresholds for termination, and develop an effective obfuscation framework. Our authorship obfuscation approach defeats the presented state-of-the-art verification approaches while keeping text changes to a minimum. As a final contribution, we discuss and experimentally evaluate a reverse obfuscation attack against our obfuscation approach, as well as possible remedies.
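The two ingredients named above, a Jensen-Shannon distance over character n-gram distributions and a cost-minimizing search over paraphrased variants, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: character trigrams, base-2 logarithms (so the distance lies in [0, 1]), a single hypothetical synonym-swap operator with unit cost (`SYNONYMS`, `synonym_variants`, `OPERATORS` are invented names), and plain uniform-cost search (A* with a zero heuristic) standing in for the paper's heuristic search. The `reference` text is assumed to be another sample of the same author's writing, and the fixed `threshold` stands in for the paper's length-invariant termination thresholds, which are not reproduced here.

```python
import heapq
import math
from collections import Counter
from typing import Callable, Iterator, List, Optional, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count the overlapping character n-grams of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def js_distance(a: Counter, b: Counter) -> float:
    """Jensen-Shannon distance (square root of the base-2 JS divergence)
    between the n-gram distributions of two count vectors; lies in [0, 1]."""
    total_a, total_b = sum(a.values()), sum(b.values())
    divergence = 0.0
    for gram in set(a) | set(b):
        p = a[gram] / total_a
        q = b[gram] / total_b
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding errors


# Hypothetical toy paraphrasing operator: swap one word for a listed synonym.
SYNONYMS = {"big": "large", "quick": "fast", "begin": "start", "help": "assist"}


def synonym_variants(text: str) -> Iterator[str]:
    words = text.split(" ")
    for i, word in enumerate(words):
        if word in SYNONYMS:
            yield " ".join(words[:i] + [SYNONYMS[word]] + words[i + 1:])


# (cost, operator) pairs; a real cost would model the text-quality loss.
OPERATORS: List[Tuple[float, Callable[[str], Iterator[str]]]] = [(1.0, synonym_variants)]


def obfuscate(text: str, reference: str, threshold: float, n: int = 3,
              max_expansions: int = 10_000) -> Optional[Tuple[str, float]]:
    """Uniform-cost search for the cheapest variant of `text` whose n-gram
    JS distance to `reference` reaches `threshold`; None if none is found."""
    ref_profile = char_ngrams(reference, n)
    frontier: List[Tuple[float, str]] = [(0.0, text)]
    best = {text: 0.0}  # cheapest known cost per variant
    for _ in range(max_expansions):
        if not frontier:
            break
        cost, candidate = heapq.heappop(frontier)
        if cost > best[candidate]:
            continue  # stale queue entry; a cheaper path was found later
        if js_distance(ref_profile, char_ngrams(candidate, n)) >= threshold:
            return candidate, cost
        for op_cost, operator in OPERATORS:
            for variant in operator(candidate):
                new_cost = cost + op_cost
                if new_cost < best.get(variant, math.inf):
                    best[variant] = new_cost
                    heapq.heappush(frontier, (new_cost, variant))
    return None


if __name__ == "__main__":
    sample = "the quick brown fox will begin to help the big dog"
    # For illustration only, the original text doubles as the reference.
    print(obfuscate(sample, reference=sample, threshold=0.1))
```

Because search stops at the first variant that crosses the threshold in order of accumulated cost, cheaper edits are always tried before more invasive ones; with an unreachable threshold, the search exhausts the (here finite) variant space and returns `None`.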