The Early Detection of Fraudulent COVID-19 Products From Twitter Chatter: Data Set and Baseline Approach Using Anomaly Detection

Published: 2023-03-14 Issue: Volume: 3 Page: e43694
ISSN: 2564-1891
Container-title: JMIR Infodemiology
Language: en
Short-container-title: JMIR Infodemiology

Author:

Sarker Abeed, Lakamana Sahithi, Liao Ruqi, Abbas Aamir, Yang Yuan-Chi, Al-Garadi Mohammed

Abstract

Background: Social media has served as a lucrative platform for spreading misinformation and for promoting fraudulent products for the treatment, testing, and prevention of COVID-19. This has resulted in the issuance of many warning letters by the US Food and Drug Administration (FDA). While social media continues to serve as the primary platform for the promotion of such fraudulent products, it also presents the opportunity to identify these products early by using effective social media mining methods.

Objective: Our objectives were to (1) create a data set of fraudulent COVID-19 products that can be used for future research and (2) propose a method using data from Twitter for automatically detecting heavily promoted COVID-19 products early.

Methods: We created a data set from FDA-issued warnings during the early months of the COVID-19 pandemic. We used natural language processing and time-series anomaly detection methods for automatically detecting fraudulent COVID-19 products early from Twitter. Our approach is based on the intuition that increases in the popularity of fraudulent products lead to corresponding anomalous increases in the volume of chatter regarding them. We compared the anomaly signal generation date for each product with the corresponding FDA letter issuance date. We also performed a brief manual analysis of chatter associated with 2 products to characterize their contents.

Results: FDA warning issue dates ranged from March 6, 2020, to June 22, 2021, and 44 key phrases representing fraudulent products were included. From 577,872,350 posts made between February 19 and December 31, 2020, which are all publicly available, our unsupervised approach detected 34 out of 44 (77.3%) signals about fraudulent products earlier than the FDA letter issuance dates, and an additional 6 (13.6%) within a week following the corresponding FDA letters. Content analysis revealed misinformation, information, political, and conspiracy theories to be prominent topics.

Conclusions: Our proposed method is simple, effective, easy to deploy, and does not require high-performance computing machinery unlike deep neural network–based methods. The method can be easily extended to other types of signal detection from social media data. The data set may be used for future research and the development of more advanced methods.
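
The idea described in the Methods, that a surge in chatter about a product shows up as an anomaly in its daily mention counts, can be illustrated with a minimal Python sketch. The abstract does not specify which anomaly detection algorithm was used, so the trailing-window rule below (mean plus k standard deviations) is a stand-in assumption. The function names, the `window`, `k`, and `min_count` parameters, the synthetic counts, and the letter date are likewise hypothetical and are not taken from the paper.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def first_anomaly_date(daily_counts, window=14, k=3.0, min_count=5):
    """Return the first date whose count exceeds the trailing mean by k standard deviations.

    daily_counts: list of (date, count) pairs sorted by date.
    window, k, and min_count are illustrative assumptions, not values from the paper.
    """
    for i in range(window, len(daily_counts)):
        # Baseline is the preceding `window` days only (no look-ahead).
        history = [count for _, count in daily_counts[i - window:i]]
        threshold = mean(history) + k * stdev(history)
        day, count = daily_counts[i]
        # min_count guards against flagging tiny fluctuations in sparse series.
        if count >= min_count and count > threshold:
            return day
    return None


def lead_time_days(signal_date, letter_date):
    """Positive when the signal precedes the FDA letter, negative when it follows."""
    return (letter_date - signal_date).days


# Synthetic daily mention counts for one hypothetical key phrase.
start = date(2020, 2, 19)
counts = [2, 3, 1, 2, 4, 3, 2, 1, 3, 2, 2, 3, 1, 2, 3, 2, 40, 55, 30]
series = [(start + timedelta(days=i), c) for i, c in enumerate(counts)]

signal = first_anomaly_date(series)
letter = date(2020, 3, 20)  # hypothetical FDA letter date for illustration
print(signal, lead_time_days(signal, letter))
```

Running the same comparison for every key phrase and counting the products with a positive lead time would give a figure comparable in kind to the 34 out of 44 reported in the Results, although the numbers from this sketch are not expected to match the paper's.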

Publisher

JMIR Publications Inc.
