Tackling the Problem of Class Imbalance in Multi-class Sentiment Classification: An Experimental Study

Published: 2019-06-01 Issue: 2 Volume: 44 Page: 151-178
ISSN: 2300-3405
Container-title: Foundations of Computing and Decision Sciences
Language: en

Author:

Lango Mateusz¹

Affiliation:

1. Institute of Computing Sciences, Poznan University of Technology, Poznań, Poland

Abstract

Sentiment classification is an important task that has gained extensive attention in both academia and industry. Many issues related to this task, such as the handling of negation or of sarcastic utterances, were analyzed and addressed in previous works. However, the issue of class imbalance, which often compromises the prediction capabilities of learning algorithms, has scarcely been studied. In this work, we aim to bridge the gap between imbalanced learning and sentiment analysis. An experimental study including twelve imbalanced learning preprocessing methods, four feature representations, and a dozen datasets is carried out to analyze the usefulness of imbalanced learning methods for sentiment classification. Moreover, the data difficulty factors commonly studied in imbalanced learning are investigated on sentiment corpora to evaluate the impact of class imbalance.

Publisher

Walter de Gruyter GmbH

Link

https://www.sciendo.com/pdf/10.2478/fcds-2019-0009

Reference: 75 articles.

1. [1] Abbasi, A., France, S., Zhang, Z., Chen, H.: Selecting Attributes for Sentiment Classification Using Feature Relation Networks. IEEE Transactions on Knowledge and Data Engineering, 23 (3), 447-462 (2011).

2. [2] Baccianella, S., Esuli, A., Sebastiani, F.: Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proc. of the Int. Conference on Language Resources and Evaluation (2010).

3. [3] Blagus, R., Lusa, L.: SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics, 14 (1), 1471–2105 (2013).

4. [4] Blitzer, J., Dredze, M., Pereira, F.: Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL-2007), 440-447 (2007).

5. [5] Błaszczyński, J., Stefanowski, J.: Neighbourhood sampling in bagging for imbalanced data. Neurocomputing, 150 A, 184–203 (2015).

Cited by 22 articles.

1. An automated approach for binary classification on imbalanced data;Knowledge and Information Systems;2024-01-12

2. Q8KNN: A Novel 8-Bit KNN Quantization Method for Edge Computing in Smart Lighting Systems with NodeMCU;Lecture Notes in Networks and Systems;2024

3. GMMSampling: a new model-based, data difficulty-driven resampling method for multi-class imbalanced data;Machine Learning;2023-11-20

4. Gene Selection for High-Dimensional Imbalanced Biomedical Data Based on Marine Predators Algorithm and Evolutionary Population Dynamics;Arabian Journal for Science and Engineering;2023-09-12

5. Sentiment Analysis Framework using Deep Active Learning for Smartphone Aspect Based Rating Prediction;Foundations of Computing and Decision Sciences;2023-06-01