Abstract
During the COVID-19 pandemic, the scientific literature related to SARS-CoV-2 has grown dramatically. This literature covers a varied set of topics, ranging from vaccination to the efficacy of protective equipment and the evaluation of lockdown policies. As a result, developing automatic methods that allow an in-depth exploration of this growing literature has become a relevant issue, both to identify the topical trends of COVID-related research and to zoom in on its sub-themes. This work proposes a novel methodology, called LDA2Net, which combines topic modelling and network analysis to investigate topics beneath their surface. More specifically, LDA2Net exploits the frequencies of consecutive word pairs (i.e. bigrams) to build the network structures underlying the hidden topics extracted from large volumes of text by Latent Dirichlet Allocation (LDA). Results are promising and suggest that the efficacy of the topic model is enhanced by the network-based representation. In particular, this enrichment is noticeable when displaying and exploring the topics at different levels of granularity.
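The idea of pairing bigram frequencies with topic-word probabilities can be illustrated with a minimal sketch. It is not the LDA2Net implementation: the abstract does not state how edges are weighted, so the weighting rule below (bigram count scaled by the topic probability of both words) is an assumption, as are the function names `bigram_counts` and `topic_network`, the toy documents, and the hand-written probabilities that stand in for a fitted LDA model.

```python
from collections import Counter


def bigram_counts(docs):
    """Count consecutive word pairs (bigrams) across tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def topic_network(counts, word_probs):
    """Build a weighted, directed edge list for a single topic.

    Assumption (not taken from the source): an edge weight is the bigram
    count multiplied by the topic probability of each of the two words.
    Bigrams with a word outside the topic get weight 0 and are dropped.
    """
    edges = {}
    for (w1, w2), n in counts.items():
        weight = n * word_probs.get(w1, 0.0) * word_probs.get(w2, 0.0)
        if weight > 0:
            edges[(w1, w2)] = weight
    return edges


# Toy corpus, already tokenized (illustrative data only).
docs = [
    ["vaccine", "efficacy", "trial"],
    ["vaccine", "efficacy", "study"],
    ["lockdown", "policy", "evaluation"],
]

# Hypothetical topic-word probabilities, standing in for one LDA topic.
vaccine_topic = {"vaccine": 0.5, "efficacy": 0.3, "trial": 0.1, "study": 0.1}

counts = bigram_counts(docs)
edges = topic_network(counts, vaccine_topic)

# Print the edges of the topic network, strongest first.
for (w1, w2), weight in sorted(edges.items(), key=lambda kv: -kv[1]):
    print(f"{w1} -> {w2}: {weight:.3f}")
```

In this sketch, bigrams made of words that the topic does not cover (here, the lockdown-related pairs) drop out of the network, so each topic yields its own subgraph that can be inspected at a finer level than a ranked word list.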
Funder
Horizon 2020
Programma Operativo Nazionale Ricerca e Competitività
Publisher
Public Library of Science (PLoS)