Breaking the Curse of Class Imbalance: Bangla Text Classification

Published: 2022-04-29 · Issue: 5 · Volume: 21 · Pages: 1-21
ISSN: 2375-4699
Container title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short container title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Md. Rafi-Ur-Rashid¹, Mahim Mahbub¹, Muhammad Abdullah Adnan¹

Affiliation:

1. Bangladesh University of Engineering & Technology (BUET), Bangladesh, and United International University, Dhaka, Bangladesh

Abstract

This article addresses class imbalance in Bengali (Bangla), a low-resource language. As a use case, we choose one of the most fundamental NLP tasks, text classification, and use three benchmark text corpora: a fake-news dataset, a sentiment analysis dataset, and a song lyrics dataset, each of which is severely class-imbalanced. We tackle the problem with several strategies. First, we augment the proportion of minority samples with synthetic samples produced through text generation and embedding generation. Second, we build ensembles of deep learning models by splitting the majority samples into subsets. Third, we use the focal loss function for class-imbalanced classification. We also apply outlier detection, data resampling, and hidden feature extraction to improve the minority-class F1 score. All of our experiments rely solely on textual content analysis and achieve a minority-class F1 score above 90% on each of the three tasks, a strong result for such highly class-imbalanced datasets.
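The abstract names the focal loss but does not state its hyperparameters. The sketch below shows the standard binary form, FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The defaults γ = 2 and α = 0.25 are values commonly used with focal loss in general and are assumptions here, not settings reported for this article; treating label 1 as the minority class is likewise an assumption.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for a single example (illustrative sketch).

    p: predicted probability that the example belongs to class 1.
    y: true label (1 or 0); class 1 is assumed to be the minority class.
    gamma, alpha: assumed default hyperparameters, not taken from the article.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # clip so log() never sees 0
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confidently correct ("easy") examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Easy example (p_t = 0.9) vs. hard example (p_t = 0.1), both of class 1
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
print(easy < hard)  # True
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary cross-entropy, which gives a quick sanity check; raising γ is what lets the abundant, easily classified majority examples contribute less to the total loss.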
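The abstract describes ensembling deep learning models by subsetting the majority samples, without saying how the subsets are formed or how the members' outputs are combined. One common way to realize the idea, similar in spirit to EasyEnsemble-style undersampling, is sketched below under several assumptions: the majority samples are split into disjoint chunks of roughly the minority-set size, each chunk is paired with all minority samples to give one balanced training set per model, and the members' binary predictions are combined by majority vote. The function names `balanced_subsets` and `majority_vote` are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def balanced_subsets(majority: Sequence[T], minority: Sequence[T], seed: int = 0) -> List[List[T]]:
    """Hypothetical helper: split the majority samples into disjoint chunks of
    roughly the minority-set size and pair each chunk with every minority sample."""
    if not minority:
        raise ValueError("minority set must be non-empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    k = max(1, round(len(majority) / len(minority)))  # assumed number of ensemble members
    chunks = [shuffled[i::k] for i in range(k)]       # k near-equal, disjoint chunks
    return [chunk + list(minority) for chunk in chunks]


def majority_vote(predictions: Sequence[int]) -> int:
    """Hypothetical combiner for binary member predictions; ties go to class 1 (minority)."""
    return 1 if 2 * sum(predictions) >= len(predictions) else 0


majority = [("maj", i) for i in range(100)]
minority = [("min", i) for i in range(20)]
subsets = balanced_subsets(majority, minority)
print(len(subsets), [len(s) for s in subsets])  # 5 [40, 40, 40, 40, 40]
```

Each subset would train one model, so every member sees a 1:1 class ratio while the ensemble as a whole still uses every majority sample. Breaking ties toward the minority class is a deliberate choice in this sketch that favors minority recall; the article does not say how it combines its models.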
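The results in the abstract are reported as minority-class F1 rather than accuracy. The distinction matters on imbalanced data: a classifier that always predicts the majority class can reach high accuracy while its minority F1 is zero. The sketch below computes F1 for a single class from parallel label lists; encoding the minority class as 1 is an assumption, and `minority_f1` is a hypothetical name.

```python
from typing import Sequence


def minority_f1(y_true: Sequence[int], y_pred: Sequence[int], minority_label: int = 1) -> float:
    """F1 score of one class (assumed minority label 1): harmonic mean of precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority_label and p == minority_label)
    fp = sum(1 for t, p in pairs if t != minority_label and p == minority_label)
    fn = sum(1 for t, p in pairs if t == minority_label and p != minority_label)
    if tp == 0:
        return 0.0  # no correct minority predictions: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [0] * 8 + [1, 1]    # 80% majority class
always_majority = [0] * 10   # 80% accuracy, but never finds the minority class
print(minority_f1(y_true, always_majority))  # 0.0
```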

Funder

ICT Division, Government of the People’s Republic of Bangladesh

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3511601

Cited by 4 articles.

1. Depression Intensity Identification using Transformer Ensemble Technique for the Resource-constrained Bengali Language; Journal of Engineering Advancements; 2024-05-10

2. Class overlap handling methods in imbalanced domain: A comprehensive survey; Multimedia Tools and Applications; 2024-01-11

3. Sentiment Analysis in Low-Resource Settings: A Comprehensive Review of Approaches, Languages, and Data Sources; IEEE Access; 2024

4. A Comprehensive Roadmap on Bangla Text-based Sentiment Analysis; ACM Transactions on Asian and Low-Resource Language Information Processing; 2023-04-06