The Limitations of Stylometry for Detecting Machine-Generated Fake News

Published: 2020-06
Journal: Computational Linguistics, Volume 46, Issue 2, pp. 499–510
ISSN: 0891-2017
Language: English

Authors:

Tal Schuster¹, Roei Schuster², Darsh J. Shah¹, Regina Barzilay¹

Affiliations:

1. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology.

2. Computer Science Department, Tel Aviv University, and Computer Science Department, Cornell Tech.

Abstract

Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed detecting machine-generated fake news by capturing its stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. Whereas humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs, utilized in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometry approaches in detecting machine-generated misinformation, and open up the discussion on the desired evaluation benchmarks.
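To make the abstract's central argument concrete, the following minimal Python sketch computes three toy stylometric features (mean word length, type-token ratio, and punctuation per word). The feature set, the function name `style_features`, and the two example sentences are hypothetical illustrations chosen for this note; they are not the detectors or benchmarks used in the paper, and real stylometric detectors rely on much richer signals (for example, neural LM representations). The sketch only shows the underlying intuition: two sentences that differ solely in one factual claim can receive identical style features, so a purely stylistic classifier has no signal for telling them apart.

```python
import string


def style_features(text: str) -> tuple[float, float, float]:
    """Hypothetical toy stylometric features, for illustration only.

    Returns (mean word length, type-token ratio, punctuation marks per word).
    """
    words = text.split()
    # Strip surrounding punctuation and lowercase, so "March," counts as "march".
    stripped = [w.strip(string.punctuation).lower() for w in words]
    mean_len = sum(len(w) for w in stripped) / len(stripped)
    type_token = len(set(stripped)) / len(stripped)
    punct_rate = sum(ch in string.punctuation for ch in text) / len(words)
    return (round(mean_len, 3), round(type_token, 3), round(punct_rate, 3))


# Two hypothetical sentences that differ only in one factual claim.
truthful = "The committee approved the new vaccine in March, officials said."
falsified = "The committee rejected the new vaccine in March, officials said."

print(style_features(truthful))   # → (5.3, 0.9, 0.2)
print(style_features(falsified))  # → (5.3, 0.9, 0.2)
print(style_features(truthful) == style_features(falsified))  # → True
```

Because "approved" and "rejected" have the same length and neither repeats another word, every feature is unchanged, while the factual content of the sentence is reversed.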

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00380

References (51 articles)

1. Automatic IQ Estimation Using Stylometric Methods

2. Detecting Hoaxes, Frauds, and Deception in Writing Style Online

3. Doppelgänger Finder: Taking Stylometry to the Underground

4. MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

Cited by (42 articles)

1. CoAT: Corpus of artificial texts. Natural Language Processing, 2024-09-06.

2. BERTGuard: Two-Tiered Multi-Domain Fake News Detection with Class Imbalance Mitigation. Big Data and Cognitive Computing, 2024-08-16.

3. Structural link prediction model with multi-view text semantic feature extraction. Intelligent Decision Technologies, 2024-06-27.

4. Towards a large sized curated and annotated corpus for discriminating between human written and AI generated texts: A case study of text sourced from Wikipedia and ChatGPT. Natural Language Processing Journal, 2024-03.

5. Mobile Text Misinformation Identification Using Machine Learning. Advances in Information Security, Privacy, and Ethics, 2024-02-14.