Abstract
The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. To bridge the developing field of computational science and empirical social research, this study evaluates the performance of four topic modeling techniques, namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the strengths and weaknesses of the different algorithms in a social science context. Drawing on details of the analytical procedures and on the quality issues encountered, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.