Abstract
Purpose: The purpose of this study is to develop and validate the effectiveness of a procedure, the Information Vortex Indicator (IVI), designed to detect the moment an information vortex forms in textual data streams. Research has established that the formation of such a vortex coincides with the onset of the dissemination of fake news (FN) concerning a particular object (a person, organization, company, event, etc.). The primary aim of this detection is to minimize the time required to mount an appropriate response or defense against the adverse effects of the information turbulence caused by the spread of fake news.

Methodology: The study used instruments for analyzing Big Data information resources (Gogołek, 2019, 2022), including selected statistical and artificial-intelligence techniques and tools, to detect vortex occurrence automatically and in real time. Experimental validation of the efficacy of these tools was conducted, enabling a reliable assessment of the timing of vortex emergence. This assessment is quantified by the V-function (also referred to as the V-procedure or V-test), which formally describes the IVI procedure. The V-function's parameters are derived from the distribution of letter-pair clusters within the textual information stream.

Conclusions: A comparison of manual (reference) and automatic detection of vortex-emergence times confirmed an accuracy of over 80% in detecting the appearance of fake news. These results underscore the effectiveness of the IVI procedure and the utility of the selected tools for the rapid, automated detection of information vortices, which herald the propagation of fake news. Furthermore, the study demonstrates that IVI is applicable to the continuous monitoring of information with significant media value across multiple multilingual data streams.

Originality: This research introduces a novel approach that uses the distribution of letter-pair clusters within information streams to detect the onset of information vortices, which coincides with the emergence of fake news. This methodology represents a unique contribution to the field, as prior research on this subject is limited.
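The V-function itself is specified in the body of the paper, not in this abstract. As an illustration only, the following minimal Python sketch shows one way a detector of this general kind could work under stated assumptions: a sliding-window bigram (letter-pair) distribution is compared against a baseline distribution, and onset is flagged when the divergence exceeds a threshold. All names and parameters here (bigram_distribution, detect_vortex, window, threshold, and the use of Jensen-Shannon divergence) are hypothetical stand-ins, not the authors' V-function.

from collections import Counter
from math import log2

def bigram_distribution(text):
    # Normalized frequency distribution of adjacent letter pairs.
    letters = [c for c in text.lower() if c.isalpha()]
    pairs = Counter(zip(letters, letters[1:]))
    total = sum(pairs.values())
    return {p: n / total for p, n in pairs.items()} if total else {}

def js_divergence(p, q):
    # Jensen-Shannon divergence between two discrete distributions.
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def detect_vortex(stream, window=500, threshold=0.15):
    # Hypothetical decision rule: report the first position where the
    # windowed bigram distribution drifts from the opening baseline by
    # more than `threshold` (a stand-in for the V-function test).
    baseline = bigram_distribution(stream[:window])
    for i in range(window, len(stream), window):
        current = bigram_distribution(stream[i:i + window])
        if current and js_divergence(baseline, current) > threshold:
            return i  # candidate vortex-onset offset in the stream
    return None

For example, calling detect_vortex(text) on a concatenated stream of posts would return a character offset once the letter-pair profile departs from the opening baseline; in the IVI setting, such an automatically detected offset would then be compared against the manually established (reference) vortex-emergence time.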
References
1. Arutyunov, A., Borisov, L., Fedorov, S., Ivchenko, A., Kirina-Lilinskaya, E., Orlov, Y., Osminin, K., Shilin, S., & Zeniuk, D. (2016). Statistical Properties of European Languages and Voynich Manuscript Analysis. CoRR, abs/1611.09122.
2. Camps, J.-B., Clérice, T., & Pinche, A. (2021). Noisy medieval data, from digitized manuscript to stylometric analysis: Evaluating Paul Meyer’s hagiographic hypothesis. Digital Scholarship in the Humanities, 36(2), ii49–ii71. https://doi.org/10.1093/llc/fqab033
3. Gogołek, W. (2006). Hit z komputera [A hit from a computer]. Polityka, 45. Retrieved from https://technopolis.polityka.pl/2006/program-na-hit
4. Gogołek, W., & Kuczma, P. (2013). Rafinacja informacji sieciowych na przykładzie wyborów parlamentarnych. Część 1. Blogi, fora, analiza sentymentów [Refining online information on the example of parliamentary elections. Part 1: Blogs, forums, sentiment analysis]. Studia Medioznawcze, 2(53), 89–109.
5. Gogołek, W. (2019). Refining Big Data. Bulletin of Science, Technology & Society, 37(4), 212–217. https://doi.org/10.1177/0270467619864012