Abstract
Feature extraction is a crucial part of sentiment classification because it extracts valuable information from text data, which directly affects the model's performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. To provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. The methods under consideration are Bag-of-Words (BOW), Word2Vec, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global Vectors for Word Representation (GloVe). To assess the effectiveness of each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. Finally, we trained a random forest classifier using 70% of the data for training and 30% for testing, enabling us to evaluate and compare performance using different metrics. Based on our results, we find that the TF-IDF technique demonstrates superior performance, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US airlines dataset. This study underscores the importance of feature extraction in sentiment analysis and offers practical insights for improving model performance and guiding future research.
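As an illustration of the TF-IDF weighting named above, here is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not state the exact weighting, so the smoothed IDF formula, the whitespace tokenizer, the helper names `tokenize` and `tfidf_vectors`, and the toy reviews are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using raw term counts and a smoothed IDF.

    Assumption: IDF is ln((1 + N) / (1 + df)) + 1, one common smoothed
    variant. The weighting used in the paper is not given in the abstract.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        vectors.append([counts[t] * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Hypothetical toy reviews, not drawn from either dataset in the paper.
reviews = [
    "great flight great crew",
    "terrible delay and rude crew",
    "great guitar strings",
]
vocab, vectors = tfidf_vectors(reviews)
```

Each review becomes a fixed-length vector in which terms that are frequent in that review but rare across the collection receive the largest weights. In a setup like the one described, such vectors would then be split 70/30 into training and test sets and passed to a classifier such as a random forest.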
Publisher
Public Library of Science (PLoS)