Mitigating Class Imbalance in Sentiment Analysis through GPT-3-Generated Synthetic Sentences

Published:2023-08-29 Issue:17 Volume:13 Page:9766
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Suhaeni Cici¹, Yong Hwan-Seung¹

Affiliation:

1. Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Republic of Korea

Abstract

In this paper, we explore the effectiveness of the GPT-3 model in tackling imbalanced sentiment analysis, focusing on the Coursera online course review dataset that exhibits high imbalance. Training on such skewed datasets often results in a bias towards the majority class, undermining the classification performance for minority sentiments, thereby accentuating the necessity for a balanced dataset. Two primary initiatives were undertaken: (1) synthetic review generation via fine-tuning of the Davinci base model from GPT-3 and (2) sentiment classification utilizing nine models on both imbalanced and balanced datasets. The results indicate that good-quality synthetic reviews substantially enhance sentiment classification performance. Every model demonstrated an improvement in accuracy, with an average increase of approximately 12.76% on the balanced dataset. Among all the models, the Multinomial Naïve Bayes achieved the highest accuracy, registering 75.12% on the balanced dataset. This study underscores the potential of the GPT-3 model as a feasible solution for addressing data imbalance in sentiment analysis and offers significant insights for future research.
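The balancing idea described in the abstract can be illustrated with a minimal sketch. It assumes that "balanced" means topping up every minority class to the size of the largest class; the abstract does not state the exact target sizes, so this is an assumption. The label counts and the helper names `synthetic_needed` and `majority_baseline_accuracy` are hypothetical and are not taken from the paper or the Coursera dataset.

```python
from collections import Counter


def synthetic_needed(labels):
    """Return, per class, how many synthetic samples would bring it
    up to the size of the largest class (assumed balancing target)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical, heavily skewed label distribution (not the paper's data).
labels = ["positive"] * 900 + ["neutral"] * 60 + ["negative"] * 40

needed = synthetic_needed(labels)

# Stand-in for generated reviews: only the labels are appended here.
balanced = labels + [lab for lab, k in needed.items() for _ in range(k)]

print(needed)
print(majority_baseline_accuracy(labels))
print(majority_baseline_accuracy(balanced))
```

On the skewed labels, always predicting the majority class already scores high accuracy while ignoring the minority sentiments entirely, which is the bias the abstract refers to. After balancing, that trivial strategy drops to chance level, so a model has to learn the minority classes to score well.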

Funder

Korea Agency for Infrastructure Technology Advancement

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/17/9766/pdf

References: 41 articles.

1. Kanojia, D., and Joshi, A. (2023). Applications and Challenges of Sentiment Analysis in Real-Life Scenarios. arXiv.

2. Sentiment Analysis of COVID-19 Tweets from Selected Hashtags in Nigeria Using VADER and Text Blob Analyser;Abiola;J. Electr. Syst. Inf. Technol.,2023

3. Best Algorithm in Sentiment Analysis of Presidential Election in Indonesia on Twitter;Hananto;Int. J. Intell. Syst. Appl. Eng.,2023

4. Bonetti, A., Martínez-Sober, M., Torres, J.C., Vega, J.M., Pellerin, S., and Vila-Francés, J. (2023). Comparison between Machine Learning and Deep Learning Approaches for the Detection of Toxic Comments on Social Networks. Appl. Sci., 13.

5. Muhammad, S.H., Abdulmumin, I., Yimam, S.M., Adelani, D.I., Ahmad, I.S., Ousidhoum, N., Ayele, A., Mohammad, S.M., Beloucif, M., and Ruder, S. (2023). SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval). arXiv.

Cited by 7 articles.

1. Aspect-based sentiment analysis: A dual-task learning architecture using imbalanced maximized-area under the curve proximate support vector machine and reinforcement learning;Information Sciences;2024-09

2. A review on emotion detection by using deep learning techniques;Artificial Intelligence Review;2024-07-11

3. Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction;Information;2024-05-06

4. Generative Pre-Trained Transformer (GPT) in Research: A Systematic Review on Data Augmentation;Information;2024-02-08

5. Comparative Analysis of Machine Learning Algorithms for Arabic Sentiment Analysis on Imbalanced Social Media Data;2024 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS);2024-01-28