
Sentiment analysis of coronavirus data with ensemble and machine learning methods

Published: 2024-04-30; Volume: 8; Issue: 2; Pages: 175-185
ISSN: 2587-1366
Container-title: Turkish Journal of Engineering

Author:

Başarslan Muhammet Sinan¹, Kayaalp Fatih²

Affiliation:

1. Istanbul Medeniyet University

2. Düzce University

Abstract

The coronavirus pandemic has distanced people from social life and increased the use of social media. People's emotions can be inferred from text data collected from social media applications, and this kind of analysis is used in many fields, especially commerce. This study aims to predict people's sentiments about the pandemic by applying sentiment analysis to tweets about the pandemic, using single machine learning classifiers (Decision Tree-DT, K-Nearest Neighbor-KNN, Logistic Regression-LR, Naïve Bayes-NB, Random Forest-RF) and ensemble learning methods (Majority Voting-MV, Probabilistic Voting-PV, and Stacking-STCK). The tweets were vectorized using two predictive embedding methods, Word2Vec (W2V) and Doc2Vec, and two traditional word representation methods, Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BOW). Classification models built with the single classifiers were then compared with ensemble models (MV, PV, and STCK) that combine the single classifier algorithms heterogeneously. Accuracy (ACC), F-measure (F), precision (P), and recall (R) were used as performance measures, with two training/test splits: 70%/30% and 80%/20%. The ACC of the ensemble learning models ranged from 73% to 89%, while that of the single classifier models ranged from 60% to 80%. Among the ensemble learning methods, STCK with the Doc2Vec text representation gave the best ACC, 89%. According to these experimental results, ensemble models built from heterogeneous machine learning classifiers gave better results than single machine learning classifiers.
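The three ensemble schemes named in the abstract differ only in how the outputs of the base classifiers are combined. The following is a minimal, dependency-free Python sketch of those combination rules, not the authors' implementation. Assumptions: all function names are hypothetical, labels are binary (1 = positive), PV averages class probabilities with equal weights, the STCK meta-learner is a logistic regression trained on each base model's positive-class probability, and F is the binary F1 score of the positive class (the paper may average it differently).

```python
"""Sketch of MV, PV, and STCK combination rules (hypothetical, not the authors' code).

Assumptions: base classifiers (e.g. DT, KNN, LR, NB, RF) are already trained;
each yields a hard label and a class-probability list per tweet; the stacking
meta-learner is a logistic regression; labels are binary (1 = positive).
"""
import math
from collections import Counter


def majority_vote(label_lists):
    """MV: label_lists[m][i] is base classifier m's label for tweet i."""
    combined = []
    for i in range(len(label_lists[0])):
        votes = Counter(labels[i] for labels in label_lists)
        # Ties go to the label seen first, i.e. the earliest classifier in the list.
        combined.append(votes.most_common(1)[0][0])
    return combined


def probabilistic_vote(proba_lists):
    """PV: average class probabilities over classifiers, then take the argmax."""
    n_models = len(proba_lists)
    combined = []
    for i in range(len(proba_lists[0])):
        n_classes = len(proba_lists[0][i])
        avg = [sum(p[i][c] for p in proba_lists) / n_models for c in range(n_classes)]
        combined.append(max(range(n_classes), key=lambda c: avg[c]))
    return combined


def fit_stacking_meta(meta_features, labels, lr=0.5, epochs=5000):
    """STCK: fit a logistic-regression meta-learner on base-model outputs.

    meta_features[i] holds each base model's positive-class probability for
    tweet i. In real stacking these come from out-of-fold predictions so the
    meta-learner never sees outputs on data the base models were trained on.
    """
    n_feats, n = len(meta_features[0]), len(labels)
    w, b = [0.0] * n_feats, 0.0
    for _ in range(epochs):  # full-batch gradient descent on the mean log loss
        grad_w, grad_b = [0.0] * n_feats, 0.0
        for x, y in zip(meta_features, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - y
            for j in range(n_feats):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_stacking(model, meta_features):
    """Predict 1 when the meta-learner's logit is positive, else 0."""
    w, b = model
    return [int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0) for x in meta_features]


def binary_metrics(y_true, y_pred, positive=1):
    """ACC, P, R, and F (F1) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    acc = sum(t == p for t, p in pairs) / len(pairs)
    return {"ACC": acc, "P": prec, "R": rec, "F": f1}


if __name__ == "__main__":
    # Toy outputs from three hypothetical base classifiers on four tweets.
    hard = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
    probas = [
        [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]],
        [[0.4, 0.6], [0.4, 0.6], [0.6, 0.4], [0.7, 0.3]],
        [[0.7, 0.3], [0.8, 0.2], [0.2, 0.8], [0.4, 0.6]],
    ]
    y = [1, 0, 1, 0]
    print(majority_vote(hard))  # [1, 0, 1, 0]
    print(probabilistic_vote(probas))  # [1, 0, 1, 0]
    meta = [[p[i][1] for p in probas] for i in range(len(y))]
    stack = fit_stacking_meta(meta, y)  # in-sample only, for illustration
    print(predict_stacking(stack, meta))  # [1, 0, 1, 0]
    print(binary_metrics(y, [1, 0, 0, 0]))
```

In practice, scikit-learn provides these combiners directly (`VotingClassifier` with `voting="hard"` or `voting="soft"`, and `StackingClassifier`, which fits its final estimator on cross-validated predictions). The hand-written version above only makes the combination rules explicit.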

Funder

None

Publisher

Turkish Journal of Engineering
