Machine Learning and Deep Learning Sentiment Analysis Models: Case Study on the SENT-COVID Corpus of Tweets in Mexican Spanish

Published: 2024-04-23 Volume: 11 Issue: 2 Page: 24
ISSN: 2227-9709
Journal: Informatics
Language: en

Author:

Gomez-Adorno Helena¹, Bel-Enguix Gemma², Sierra Gerardo², Barajas Juan-Carlos³, Álvarez William¹

Affiliation:

1. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

2. Instituto de Ingeniería, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

3. Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

Abstract

This article presents a comprehensive evaluation of traditional machine learning and deep learning models in analyzing sentiment trends within the SENT-COVID Twitter corpus, curated during the COVID-19 pandemic. The corpus, filtered by COVID-19 related keywords and manually annotated for polarity, is a pivotal resource for conducting sentiment analysis experiments. Our study investigates various approaches, including classic vector-based systems such as word2vec, doc2vec, and diverse phrase modeling techniques, alongside Spanish pre-trained BERT models. We assess the performance of readily available sentiment analysis libraries for Python users, including TextBlob, VADER, and Pysentimiento. Additionally, we implement and evaluate traditional classification algorithms such as Logistic Regression, Naive Bayes, Support Vector Machines, and simple neural networks like Multilayer Perceptron. Throughout the research, we explore different dimensionality reduction techniques. This methodology enables a precise comparison among classification methods, with BETO-uncased achieving the highest accuracy of 0.73 on the test set. Our findings underscore the efficacy and applicability of traditional machine learning and deep learning models in analyzing sentiment trends within the context of low-resource Spanish language scenarios and emerging topics like COVID-19.
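
To make the "traditional classification algorithms" mentioned in the abstract more concrete, the following is a minimal sketch of a multinomial Naive Bayes polarity classifier written in plain Python. It is not the authors' pipeline: the training sentences are invented toy examples rather than SENT-COVID data, the whitespace tokenizer and the Laplace smoothing value `alpha=1.0` are assumptions, and the names `train_nb` and `predict_nb` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy examples (NOT taken from SENT-COVID); labels are illustrative only.
TRAIN = [
    ("gracias a los medicos por su gran trabajo", "positive"),
    ("que buena noticia ya llego la vacuna", "positive"),
    ("me siento feliz de volver a ver a mi familia", "positive"),
    ("que tristeza tantos contagios en la ciudad", "negative"),
    ("odio el encierro estoy harto de la pandemia", "negative"),
    ("muy mala gestion de la crisis sanitaria", "negative"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def train_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text, alpha=1.0):
    """Return the class with the highest log-posterior under Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][tok] + alpha) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
print(predict_nb(model, "que buena noticia la vacuna"))
print(predict_nb(model, "odio la pandemia"))
```

A realistic experiment would replace the toy data with the annotated corpus, hold out a test set, and compare accuracy against the other model families the abstract lists; this sketch only illustrates the bag-of-words scoring idea behind one of them.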

Funder

CONAHCYT

PAPIIT

Publisher

MDPI AG

Link

https://www.mdpi.com/2227-9709/11/2/24/pdf
