Author:
Danowski James, Riopelle Ken, Yan Bei
Abstract
Searching social media for a semantic domain of interest often yields large text files, much of whose content is irrelevant because word polysemy, abstractness, and degree centrality introduce cross-domain content. Cascaded Semantic Fractionation (CSF) systematically removes these cross-domain links through an iterative pruning process. This social network procedure performs community detection in semantic networks, locates the semantic groups containing the terms of interest, excludes intergroup links, and repeats community detection on the pruned intragroup network until the domain of interest is clarified. To illustrate CSF, we analyzed public Facebook posts about the possible Wuhan lab leak of COVID-19, retrieved at daily intervals with the CrowdTangle app for historical data search and covering February 3, 2020, to March 13, 2021. Among 95K posts, the initial keyword search located six multi-day bursts of more than 500 posts per day. These posts were analyzed as semantic networks to find the domain of interest using the iterative community detection and pruning process. CSF can also be applied to capture the evolution of semantic domains over time. At the outset, the lab leak theory was presented in conspiracy theory terms. Over time, the conspiratorial elements washed out in favor of an accidental release as the issue moved from social media to mainstream media and official government views. CSF identified the relevant social media semantic domain and tracked its changes.
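The iterative loop described in the abstract (detect communities, keep the groups containing the terms of interest, drop intergroup links, repeat) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name a community detection algorithm, so a simple deterministic label propagation stands in for it, and the function names, the toy word network, and its weights are all hypothetical.

```python
from collections import defaultdict


def detect_communities(adj, max_sweeps=50):
    """Deterministic weighted label propagation (a stand-in, not the authors' method)."""
    labels = {node: node for node in adj}
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated nodes keep their own label
            scores = defaultdict(float)
            for nbr, weight in adj[node].items():
                scores[labels[nbr]] += weight
            # Highest total weight wins; ties go to the alphabetically first label.
            best = min(scores, key=lambda lbl: (-scores[lbl], lbl))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


def prune_to_seed_groups(adj, communities, seeds):
    """Keep only groups containing a seed term, and only links inside a group."""
    kept = [group for group in communities if group & seeds]
    pruned = {}
    for group in kept:
        for node in group:
            pruned[node] = {n: w for n, w in adj[node].items() if n in group}
    return pruned


def cascaded_fractionation(adj, seeds, max_rounds=10):
    """Repeat detection and pruning until the network stops changing."""
    seeds = set(seeds)
    for _ in range(max_rounds):
        pruned = prune_to_seed_groups(adj, detect_communities(adj), seeds)
        if pruned == adj:
            break
        adj = pruned
    return adj


# Hypothetical word co-occurrence network: two domains joined by the
# polysemous word "virus" (biological vs. computer sense).
edges = [
    ("wuhan", "lab", 3), ("wuhan", "virus", 3), ("lab", "virus", 3),
    ("computer", "malware", 3), ("computer", "software", 3),
    ("malware", "software", 3),
    ("virus", "malware", 1),  # cross-domain link to be pruned
]
network = defaultdict(dict)
for a, b, w in edges:
    network[a][b] = w
    network[b][a] = w

domain = cascaded_fractionation(dict(network), seeds={"wuhan", "lab"})
print(sorted(domain))  # ['lab', 'virus', 'wuhan']
```

In this toy case the first round separates the two word groups and discards the "virus"-"malware" link; the second round finds nothing further to prune, so the loop stops. On real data, a modularity-based method applied to a co-occurrence network built from the posts would be a more typical choice for the detection step.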