Affiliation:
1. Aalto University, Konemiehentie 2, 02150 Espoo, Finland
2. University of Waterloo, 200 University Ave W, Waterloo, ON N2L 3G1, Canada
Abstract
Stylometry can be used to profile or deanonymize authors against their will based on their writing style. Style transfer provides a defence. Current techniques typically use either encoder-decoder architectures or rule-based algorithms. Crucially, style transfer must reliably retain the original semantic content to be deployable in practice. We conduct a multifaceted evaluation of three state-of-the-art encoder-decoder style transfer techniques and show that all of them fail at semantic retainment. In particular, they do not produce appropriate paraphrases, and retain the original content only in the trivial case of reproducing the text exactly. To mitigate this problem we propose ParChoice: a technique based on the combinatorial application of multiple paraphrasing algorithms. ParChoice strongly outperforms the encoder-decoder baselines in semantic retainment. Additionally, compared to the baselines that achieve non-negligible semantic retainment, ParChoice has superior style transfer performance. We also apply ParChoice to multi-author style imitation (not considered by prior work), where we achieve up to 75% imitation success among five authors. Furthermore, when compared to two state-of-the-art rule-based style transfer techniques, ParChoice has markedly better semantic retainment. Combining ParChoice with the best-performing rule-based baseline (Mutant-X [34]) also reaches the highest style transfer success on the Brennan-Greenstadt and Extended-Brennan-Greenstadt corpora, with much less impact on the original meaning than when using the rule-based baseline techniques alone. Finally, we highlight a critical problem that afflicts all current style transfer techniques: the adversary can use the same technique to thwart style transfer via adversarial training. We show that adding randomness to style transfer helps to mitigate the effectiveness of adversarial training.
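As a rough illustration of the idea of combining several paraphrasing algorithms and adding randomness to the final choice, the following minimal sketch may help. It is not the authors' implementation: the three toy paraphrasers, the marker-counting `style_score`, and the `top_k` random selection are all hypothetical stand-ins chosen only to make the combinatorial structure visible.

```python
import random

# Hypothetical toy paraphrasers (assumptions, not from the paper).
# Each returns the input sentence plus at most one rewritten variant.
def expand_contractions(s):
    return {s, s.replace("don't", "do not").replace("can't", "cannot")}

def swap_synonyms(s):
    return {s, s.replace("big", "large")}

def drop_intensifiers(s):
    return {s, s.replace("very ", "")}

PARAPHRASERS = [expand_contractions, swap_synonyms, drop_intensifiers]

def candidates(sentence):
    """Build the candidate pool by applying each paraphraser to every
    variant produced so far, so all combinations are represented."""
    pool = {sentence}
    for paraphrase in PARAPHRASERS:
        pool |= {variant for s in pool for variant in paraphrase(s)}
    return pool

def style_score(text, target_markers):
    """Toy stand-in for a style classifier: count target-style markers."""
    return sum(text.count(marker) for marker in target_markers)

def transfer(sentence, target_markers, top_k=3, rng=None):
    """Rank candidates by style score and pick one of the top_k at random.
    The random pick is the assumed source of non-determinism."""
    rng = rng or random.Random()
    ranked = sorted(
        candidates(sentence),
        key=lambda c: (-style_score(c, target_markers), c),
    )
    return rng.choice(ranked[:top_k])

if __name__ == "__main__":
    text = "I don't like very big words"
    print(sorted(candidates(text)))
    print(transfer(text, ("do not", "large"), top_k=3))
```

With `top_k=1` the selection is deterministic; larger values trade some style-transfer strength for output that is harder to predict, which is the intuition behind randomising the choice.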
References (74 articles)
[1] Ahmed Abbasi and Hsinchun Chen. Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Transactions on Information Systems, 26(2):1–29, 2008.
[2] Sadia Afroz, Michael Brennan, and Rachel Greenstadt. Detecting Hoaxes, Frauds, and Deception in Writing Style Online. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, pages 461–475, 2012.
[3] Mishari Almishari, Ekin Oguz, and Gene Tsudik. Fighting Authorship Linkability with Crowdsourcing. In Proceedings of the Second ACM Conference on Online Social Networks, pages 69–82, 2014.
[4] Douglas Bagnall. Author identification using multi-headed recurrent neural networks. In Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, 2015.
[5] Satanjeev Banerjee and Alon Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, 2005.
Cited by
2 articles.
1. Deep Learning for Text Style Transfer: A Survey;Computational Linguistics;2022-02-08
2. Optimization to the Rescue;Proceedings of the 2021 Research on offensive and defensive techniques in the Context of Man At The End (MATE) Attacks;2021-11-15