Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

Published: 2023-06-22 Issue: 1 Volume: 25 Page: 1-18
ISSN: 1931-0145
Container-title: ACM SIGKDD Explorations Newsletter
Language: en
Short-container-title: SIGKDD Explor. Newsl.

Author:

Uchendu Adaku¹, Le Thai², Lee Dongwon¹

Affiliation:

1. Penn State University, PA, USA

2. University of Mississippi, MS, USA

Abstract

Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, especially a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concern is only toward human authors. However, in recent years, due to the explosive advancements in Neural Text Generation (NTG) techniques in NLP, capable of synthesizing human-quality open-ended texts (so-called "neural texts"), one has to now consider authorships by humans, machines, or their combination. Due to the implications and potential threats of neural texts when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and develop novel AA/AO solutions in dealing with neural texts. In this survey, therefore, we make a comprehensive review of recent literature on the attribution and obfuscation of neural text authorship from a Data Mining perspective, and share our view on their limitations and promising research directions.

Publisher

Association for Computing Machinery (ACM)

Subject

General Medicine

Link

https://dl.acm.org/doi/pdf/10.1145/3606274.3606276

References: 123 articles.

1. A. Abbasi and H. Chen. Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Transactions on Information Systems (TOIS), 26(2):1-29, 2008.

2. Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding

3. Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-Based Detection

4. B. Ai, Y. Wang, Y. Tan, and T. Samson. Whodunit? learning to contrast for authorship attribution. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022.

5. A. Bakhtin, S. Gross, M. Ott, Y. Deng, M. Ranzato, and A. Szlam. Real or fake? learning to discriminate machine from human generated text. arXiv preprint arXiv:1906.03351, 2019.

Cited by 4 articles.

1. CoAT: Corpus of artificial texts;Natural Language Processing;2024-09-06

2. Combating misinformation in the age of LLMs: Opportunities and challenges;AI Magazine;2024-08

3. College English Smart Classroom Learning Model Utilizing Data Mining Technology;International Journal of Web-Based Learning and Teaching Technologies;2024-07-17

4. Neural Authorship Attribution: Stylometric Analysis on Large Language Models;2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC);2023-11-02