Finding Patient Zero and Tracking Narrative Changes in the Context of Online Disinformation Using Semantic Similarity Analysis

Published: 2023-04-26, Volume: 11, Issue: 9, Page: 2053
ISSN: 2227-7390
Journal: Mathematics
Language: English

Authors:

Artene Codruț-Georgian¹, Oprișa Ciprian², Buțincu Cristian Nicolae¹, Leon Florin¹

Affiliations:

1. Department of Computer Science and Engineering, “Gheorghe Asachi” Technical University of Iasi, 700050 Iasi, Romania

2. Computer Science Department, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania

Abstract

Disinformation in the form of news articles, also called fake news, is used by multiple actors for nefarious purposes, such as gaining political advantages. A key component of fake news detection is the ability to find similar articles in a large document corpus, in order to track narrative changes and identify the root source (patient zero) of a particular piece of information. This paper presents new techniques based on textual and semantic similarity that were adapted to achieve this goal on large datasets of news articles. The aim is to determine which of the implemented text similarity techniques is more suitable for this task. For text similarity, Locality-Sensitive Hashing (LSH) is applied to n-grams extracted from the text to produce representations that are then indexed, so that similar articles can be found quickly. The semantic textual similarity technique is based on sentence embeddings from pre-trained language models, such as BERT, and on Named Entity Recognition (NER). The proposed techniques are evaluated on a collection of Romanian articles to determine their performance in terms of result quality and scalability. The presented techniques produce competitive results. The experimental results show that the proposed semantic textual similarity technique is better at identifying similar text documents, while the LSH text similarity technique outperforms it in execution time and scalability. Although the techniques were evaluated only on Romanian texts, and some of them rely on pre-trained models for Romanian, the underlying methods can be extended to other languages with few or no changes, provided that pre-trained models exist for those languages. A cross-lingual setup, however, would require further changes, along with tests to demonstrate this capability.
Based on the obtained results, one may conclude that the presented techniques are suitable for integration into a decentralized anti-disinformation platform for fact-checking and trust assessment.
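
The Locality-Sensitive Hashing step on n-grams can be illustrated with a minimal MinHash sketch. It assumes word 3-grams as shingles, 128 hash functions split into 32 bands of 4 rows, and an in-memory dictionary as the index; these parameters and the names `shingles` and `MinHashLSH` are hypothetical choices for illustration, not the authors' implementation or configuration. Articles that collide in at least one band become candidates, and the fraction of matching MinHash values estimates their Jaccard similarity.

```python
import hashlib
import random
from collections import defaultdict

# Hypothetical parameters for illustration only; not taken from the paper.
_PRIME = (1 << 61) - 1  # Mersenne prime used for universal hashing


def shingles(text: str, n: int = 3) -> set[str]:
    """Return the set of word n-grams of a lower-cased, whitespace-split text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def _base_hash(shingle: str) -> int:
    """Stable 64-bit hash of a shingle (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class MinHashLSH:
    """Minimal MinHash + banding index (a sketch, not the authors' code)."""

    def __init__(self, num_perm: int = 128, bands: int = 32, seed: int = 42):
        if num_perm % bands:
            raise ValueError("num_perm must be divisible by bands")
        self.bands = bands
        self.rows = num_perm // bands
        rng = random.Random(seed)
        # One (a, b) pair per simulated permutation: h(x) = (a*x + b) mod p
        self.params = [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
                       for _ in range(num_perm)]
        self.buckets = [defaultdict(set) for _ in range(bands)]
        self.signatures: dict[str, list[int]] = {}

    def signature(self, text: str) -> list[int]:
        """MinHash signature: the minimum hashed shingle under each hash function."""
        hashed = [_base_hash(s) for s in shingles(text)]
        return [min((a * h + b) % _PRIME for h in hashed) for a, b in self.params]

    def _band_keys(self, sig: list[int]):
        """Split a signature into `bands` tuples of `rows` values each."""
        for i in range(self.bands):
            yield i, tuple(sig[i * self.rows:(i + 1) * self.rows])

    def insert(self, doc_id: str, text: str) -> None:
        sig = self.signature(text)
        self.signatures[doc_id] = sig
        for i, key in self._band_keys(sig):
            self.buckets[i][key].add(doc_id)

    def query(self, text: str, threshold: float = 0.5) -> list[tuple[str, float]]:
        """Return (doc_id, estimated Jaccard) pairs at or above threshold, best first."""
        sig = self.signature(text)
        candidates: set[str] = set()
        for i, key in self._band_keys(sig):
            candidates |= self.buckets[i].get(key, set())
        results = []
        for doc_id in candidates:
            other = self.signatures[doc_id]
            estimate = sum(x == y for x, y in zip(sig, other)) / len(sig)
            if estimate >= threshold:
                results.append((doc_id, estimate))
        return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    index = MinHashLSH()
    index.insert("a1", "the minister announced new measures against the pandemic on monday")
    index.insert("a2", "the minister announced new measures against the pandemic on tuesday evening")
    index.insert("a3", "local football club wins the national championship after penalties")
    print(index.query("the minister announced new measures against the pandemic on monday"))
```

With 32 bands of 4 rows, a pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s⁴)³², an S-shaped curve whose midpoint sits near (1/32)^(1/4) ≈ 0.42: pairs well above it are almost always retrieved and pairs well below it are rarely compared, which is what keeps lookup cheap compared with scoring every pair.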
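
The semantic technique can likewise be sketched, assuming that sentence embeddings (for example from a BERT-based sentence encoder) and named entities (from a NER tagger) have already been computed for each article; the encoder and tagger are left out so the example stays self-contained. How the two signals are combined here, a best-match average of sentence cosine similarities blended with the Jaccard overlap of entity sets through a hypothetical weight `alpha`, is an illustrative assumption rather than the scoring function used in the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sentence_alignment(sents_a: Sequence[Vector], sents_b: Sequence[Vector]) -> float:
    """Average, over the sentences of A, of each sentence's best cosine match in B."""
    if not sents_a or not sents_b:
        return 0.0
    return sum(max(cosine(a, b) for b in sents_b) for a in sents_a) / len(sents_a)


def entity_overlap(ents_a: Sequence[str], ents_b: Sequence[str]) -> float:
    """Jaccard overlap of case-folded named-entity strings."""
    set_a = {e.casefold() for e in ents_a}
    set_b = {e.casefold() for e in ents_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def semantic_similarity(sents_a, ents_a, sents_b, ents_b, alpha: float = 0.8) -> float:
    """Hypothetical blend: alpha * embedding score + (1 - alpha) * entity score."""
    return (alpha * sentence_alignment(sents_a, sents_b)
            + (1 - alpha) * entity_overlap(ents_a, ents_b))


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" standing in for real sentence vectors.
    article_1 = ([[0.9, 0.1, 0.0], [0.0, 0.8, 0.6]], ["Iași", "Ministry of Health"])
    article_2 = ([[0.0, 0.7, 0.7], [1.0, 0.0, 0.1]], ["iași", "WHO"])
    print(round(semantic_similarity(*article_1, *article_2), 3))
```

Because the alignment averages over the sentences of the first article only, this score is asymmetric; averaging the two directions is one simple way to make it symmetric.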
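
Once similar articles can be found, one simple way to approach patient zero and narrative tracking is to start from one article, repeatedly collect every article whose similarity to an already collected one reaches a threshold, and then sort the resulting group by publication date: the earliest article is the patient-zero candidate and the ordered group shows how the narrative evolved. This breadth-first grouping, the `Article` record, the 0.5 threshold and the word-level Jaccard similarity used as a stand-in are all hypothetical; in practice the similarity function would be one of the techniques described in the abstract, and the linear scan over all articles would be replaced by an LSH candidate lookup.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Article:
    """Hypothetical minimal article record."""
    doc_id: str
    published: datetime
    text: str


def word_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lower-cased word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def trace_narrative(
    articles: list[Article],
    start_id: str,
    similarity: Callable[[str, str], float] = word_jaccard,
    threshold: float = 0.5,
) -> tuple[Article, list[Article]]:
    """Return (earliest article, chronologically ordered group) reachable from start_id."""
    by_id = {article.doc_id: article for article in articles}
    reached = {start_id}
    queue = deque([start_id])
    while queue:  # breadth-first expansion over "similar enough" links
        current = by_id[queue.popleft()]
        for other in articles:
            if other.doc_id not in reached and similarity(current.text, other.text) >= threshold:
                reached.add(other.doc_id)
                queue.append(other.doc_id)
    group = sorted((by_id[doc_id] for doc_id in reached), key=lambda a: a.published)
    return group[0], group


if __name__ == "__main__":
    corpus = [
        Article("x", datetime(2023, 3, 2), "vaccine contains microchips says anonymous blog"),
        Article("y", datetime(2023, 3, 1), "anonymous blog says vaccine contains microchips"),
        Article("z", datetime(2023, 3, 5), "anonymous blog says vaccine contains tracking microchips from abroad"),
        Article("w", datetime(2023, 3, 3), "football club wins national championship"),
    ]
    origin, group = trace_narrative(corpus, "z")
    print(origin.doc_id, [a.doc_id for a in group])  # y ['y', 'x', 'z']
```

The result is only the earliest article within the collected corpus: if the true origin was never collected, this procedure cannot find it, and chaining through intermediate articles can also pull in loosely related stories when the threshold is set too low.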

Funder

European Union’s Horizon 2020 Research and Innovation Programme

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/11/9/2053/pdf
