Enhancing machine learning-based sentiment analysis through feature extraction techniques

Published: 2024-02-14 | Volume: 19 | Issue: 2 | Page: e0294968
ISSN: 1932-6203
Container title: PLOS ONE
Language: en
Short container title: PLoS ONE

Authors:

A. Semary Noura, Ahmed Wesam, Amin Khalid, Pławiak Paweł, Hammad Mohamed

Abstract

Feature extraction is a crucial part of sentiment classification because it extracts valuable information from text data, which directly affects the model’s performance. The goal of this paper is to help select a suitable feature extraction method for improving the performance of sentiment analysis tasks. To provide direction for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. The methods under consideration include Bag-of-Words (BOW), Word2Vec, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global Vectors for Word Representation (GloVe). To evaluate each feature extractor, we applied it to the Twitter US Airlines and Amazon Musical Instruments reviews datasets. Finally, we trained a random forest classifier on 70% of the data and tested it on the remaining 30%, which allowed us to evaluate and compare performance using different metrics. Our results show that the TF-IDF technique performs best, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US Airlines dataset. This study underscores the importance of feature extraction in sentiment analysis and offers practical insights for improving model performance and guiding future research.
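The count-based extractors named above (BOW, N-gram, TF-IDF and Hashing Vectorizer) and the 70/30 split can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, the smoothed IDF formula (the form scikit-learn uses by default), the MD5-based bucket hash (scikit-learn's HashingVectorizer uses MurmurHash3 instead) and the toy reviews are all assumptions. Word2Vec and GloVe need trained embedding tables and are omitted, as is the random forest itself; the resulting vectors could be passed to any classifier, for example scikit-learn's `RandomForestClassifier`.

```python
import hashlib
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simple regex tokenizer, assumed)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n_min=1, n_max=1):
    """All n-grams with n_min <= n <= n_max, each joined by spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def build_vocab(docs, n_min=1, n_max=1):
    """Map every n-gram seen in the corpus to a column index (sorted for determinism)."""
    terms = {g for d in docs for g in ngrams(tokenize(d), n_min, n_max)}
    return {term: i for i, term in enumerate(sorted(terms))}


def bag_of_words(docs, vocab, n_min=1, n_max=1):
    """Raw term-count vectors; n_max > 1 gives an n-gram bag of words.
    N-grams missing from the vocabulary (e.g. unseen test words) are ignored."""
    rows = []
    for d in docs:
        row = [0] * len(vocab)
        for gram, count in Counter(ngrams(tokenize(d), n_min, n_max)).items():
            if gram in vocab:
                row[vocab[gram]] = count
        rows.append(row)
    return rows


def fit_idf(train_docs, vocab, n_min=1, n_max=1):
    """Smoothed IDF learned from training documents only:
    idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    counts = bag_of_words(train_docs, vocab, n_min, n_max)
    n_docs = len(train_docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    return [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]


def tf_idf(docs, vocab, idf, n_min=1, n_max=1):
    """Weight raw counts by IDF, then L2-normalise each row."""
    rows = []
    for row in bag_of_words(docs, vocab, n_min, n_max):
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return rows


def hashing_vector(doc, n_features=16, n_min=1, n_max=1):
    """Hashing trick: bucket each n-gram by a stable hash, so no vocabulary is stored.
    MD5 is used here only because it is stable across runs (assumption)."""
    vec = [0] * n_features
    for g in ngrams(tokenize(doc), n_min, n_max):
        vec[int(hashlib.md5(g.encode("utf-8")).hexdigest(), 16) % n_features] += 1
    return vec


def split_70_30(items, seed=0):
    """Shuffle a copy and cut it into 70% training / 30% testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = (7 * len(items)) // 10  # integer arithmetic avoids float rounding
    return items[:cut], items[cut:]


if __name__ == "__main__":
    # Hypothetical toy reviews for illustration only, NOT the paper's datasets.
    data = [
        ("great flight and friendly crew", "positive"),
        ("lost my bag, terrible service", "negative"),
        ("the guitar sounds amazing", "positive"),
        ("strings broke after one day", "negative"),
        ("on time and very comfortable", "positive"),
        ("delayed again, awful experience", "negative"),
        ("excellent tuner, works great", "positive"),
        ("cable arrived damaged", "negative"),
        ("smooth boarding, nice staff", "positive"),
        ("rude staff and long delay", "negative"),
    ]
    train, test = split_70_30(data, seed=42)
    train_docs = [text for text, _ in train]
    vocab = build_vocab(train_docs, 1, 2)   # unigrams + bigrams
    idf = fit_idf(train_docs, vocab, 1, 2)  # IDF fitted on the training split only
    X_train = tf_idf(train_docs, vocab, idf, 1, 2)
    X_test = tf_idf([text for text, _ in test], vocab, idf, 1, 2)
    print(len(X_train), len(X_test), len(vocab))
```

IDF is fitted on the training split and reused for the test split, so no statistics from the held-out 30% leak into the features. Replacing `tf_idf` with `bag_of_words` or `hashing_vector` yields the other count-based representations; the hashing variant stores no vocabulary, at the cost of possible collisions between n-grams.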

Publisher

Public Library of Science (PLoS)


Cited by 2 articles.

1. Enhancing hydrological data completeness: A performance evaluation of various machine learning techniques using probabilistic fusion imputer with neural networks for streamflow data reconstruction;Journal of Hydrology;2024-08

2. Analyzing Instagram User Sentiment Toward MSME Using Naive Bayes and Logistic Regression;2024 International Conference on Data Science and Its Applications (ICoDSA);2024-07-10