Abstract
As the number of people sharing their thoughts on Twitter continues to grow, understanding the sentiment behind these tweets becomes increasingly important for researchers. To identify the model that most accurately classifies tweet sentiment, the author uses a dataset published in 2022 that contains tweet texts annotated with their sentiments. Six basic machine learning classification methods are used for model training: Logistic Regression, Naïve Bayes Classifier, Support Vector Classifier, Decision Tree Classifier, Random Forest Classifier, and K-Nearest Neighbors Classifier. The author then evaluates the trained models. Validation shows that the Logistic Regression, Support Vector Classifier, and Random Forest Classifier achieve the highest accuracy and F1-score, with only small differences among the three. To improve on them, the author combines these three models by voting into a new ensemble model. The ensemble outperforms every basic model in both accuracy and F1-score, reaching 71.6% on each metric. The research also shows how each model differs from the best model in distinguishing positive, neutral, and negative tweets.
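The voting step can be illustrated with a minimal sketch of hard (majority) voting over per-tweet labels, written in plain Python. The abstract does not state whether hard or soft voting was used or how ties were broken. This sketch assumes hard voting, with a three-way disagreement resolved in favor of the first-listed model. `majority_vote` is a hypothetical helper name, and the prediction lists are invented toy values rather than results from the dataset.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine per-tweet labels from several classifiers by hard (majority) voting.

    per_model_predictions: one list of predicted labels per model, all the same
    length (one label per tweet). Assumed tie-break: among tied labels, keep the
    one predicted by the earliest-listed model.
    """
    combined = []
    for votes in zip(*per_model_predictions):  # votes = one tweet's labels across models
        counts = Counter(votes)
        top = max(counts.values())
        # Iterating over votes preserves model order, so winners[0] applies the tie-break.
        winners = [label for label in votes if counts[label] == top]
        combined.append(winners[0])
    return combined


# Hypothetical toy predictions for four tweets (not from the paper's data).
lr = ["positive", "neutral", "negative", "neutral"]
svc = ["positive", "negative", "negative", "positive"]
rf = ["neutral", "neutral", "positive", "negative"]

combined = majority_vote([lr, svc, rf])
print(combined)
```

In this toy run, the first three tweets receive at least two matching votes. The fourth is a three-way split, so the assumed tie-break keeps the Logistic Regression label. A library ensemble such as scikit-learn's `VotingClassifier` may resolve ties differently.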