Affiliation:
1. Institute for Humanities and Cultural Studies, Tehran, Iran
Abstract
The COVID-19 pandemic created an infodemic: social media platforms such as Twitter and Instagram exposed people to a massive amount of information. These platforms made information easy to circulate and paved the way for information and misinformation to mix. One way to counter an infodemic is to curb the distribution of false information and to filter fake news, reducing its negative impact on society. This article studies the properties of fake news in English and Persian using the textual information conveyed through the language of the news. To this end, we study the properties of a text based on information theory, stylometric information from raw texts, the readability of the texts, and linguistic information such as phonology, syntax, and morphology. We use the XLM-RoBERTa representation with a convolutional neural network classifier as the basic model to detect English and Persian COVID-19 fake news. In addition, we propose different learning scenarios in which different feature sets are concatenated with the contextualized representation. According to the experimental results, adding any of the textual information to the basic model improved the performance of the classifier for both English and Persian. Readability and stylometric features were the most effective for detecting English fake news, improving performance by 2.72% in F-measure. Augmenting this feature set with the amount of information and morphological information improved the performance of the classifier by 3.79% in F-measure for Persian.
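The idea of concatenating hand-crafted feature sets with a contextualized representation can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the three stylometric features (average word length, type-token ratio, punctuation ratio), and the character-level Shannon entropy standing in for an information-theoretic feature are all hypothetical. The embedding is a plain list of floats standing in for an XLM-RoBERTa vector.

```python
from __future__ import annotations

import math
import string
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def stylometric_features(text: str) -> list[float]:
    """Hypothetical features: average word length, type-token ratio,
    and the share of characters that are ASCII punctuation."""
    words = text.split()
    if not words:
        return [0.0, 0.0, 0.0]
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len({w.lower() for w in words}) / len(words)
    punct_ratio = sum(ch in string.punctuation for ch in text) / len(text)
    return [avg_word_len, type_token_ratio, punct_ratio]


def augment(embedding: list[float], text: str) -> list[float]:
    """Concatenate a contextual embedding (assumed to be precomputed)
    with the stylometric features and the entropy value."""
    return list(embedding) + stylometric_features(text) + [char_entropy(text)]
```

In such a setup, the augmented vector returned by `augment` would be passed to the classifier in place of the raw embedding. The features actually used in the study, and the point at which they are joined to the representation, may differ from this sketch.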
Publisher
Oxford University Press (OUP)
Subject
Computer Science Applications, Linguistics and Language, Language and Linguistics, Information Systems
Cited by
5 articles.