Affiliation:
1. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
2. Instituto de Ingeniería, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
3. Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
Abstract
This article presents a comprehensive evaluation of traditional machine learning and deep learning models in analyzing sentiment trends within the SENT-COVID Twitter corpus, curated during the COVID-19 pandemic. The corpus, filtered by COVID-19 related keywords and manually annotated for polarity, is a pivotal resource for conducting sentiment analysis experiments. Our study investigates various approaches, including classic vector-based systems such as word2vec, doc2vec, and diverse phrase modeling techniques, alongside Spanish pre-trained BERT models. We assess the performance of readily available sentiment analysis libraries for Python users, including TextBlob, VADER, and Pysentimiento. Additionally, we implement and evaluate traditional classification algorithms such as Logistic Regression, Naive Bayes, Support Vector Machines, and simple neural networks like Multilayer Perceptron. Throughout the research, we explore different dimensionality reduction techniques. This methodology enables a precise comparison among classification methods, with BETO-uncased achieving the highest accuracy of 0.73 on the test set. Our findings underscore the efficacy and applicability of traditional machine learning and deep learning models in analyzing sentiment trends within the context of low-resource Spanish language scenarios and emerging topics like COVID-19.