Sampling Techniques to Overcome Class Imbalance in a Cyberbullying Context

Published: 2019-07-16 Issue: 1 Volume: 3 Page: 21
ISSN: 2530-9455
Container-title: Journal of Computer-Assisted Linguistic Research
Short-container-title: J. Comp. Assist. Linguist. Res.

Authors:

Colton, David; Hofmann, Markus

Abstract

The majority of datasets suffer from class imbalance, where samples of a dominant class significantly outnumber the samples available for the minority class that is to be detected. Prediction and classification machine learning models work best when there are roughly equal numbers of each class type. This paper explores sampling techniques that can be used to overcome this class imbalance problem in a cyberbullying context. A newly classified cyberbullying dataset, including detailed descriptions of the criteria used in its classification, was used to examine the feasibility of applying text mining techniques to automate the detection of cyberbullying text when the dataset shows a significant class imbalance between the positive (cyberbullying) samples and the negative (not cyberbullying) samples. In this paper, we investigate whether oversampling the minority positive class or undersampling the majority negative class affects the performance of a prediction model. A compromise solution, in which the positive class is partially oversampled and the negative class is partially undersampled, is also examined. Although not strictly a class imbalance solution, sampling using the most frequently observed features was also explored.
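As a rough illustration of the three sampling strategies named in the abstract (oversampling the positive class, undersampling the negative class, and a partial mix of both), the following minimal sketch uses plain random resampling. It is not the authors' implementation: the function name `resample_binary`, the toy messages, and the target counts are all hypothetical, and the paper may well have used different tooling or sampling rules.

```python
import random
from collections import Counter


def resample_binary(texts, labels, n_pos, n_neg, seed=0):
    """Return (texts, labels) resampled to n_pos positive and n_neg negative items.

    Hypothetical helper: a class is randomly oversampled (duplicates added)
    when its target exceeds its current size, and randomly undersampled
    (drawn without replacement) otherwise. Assumes both classes are non-empty.
    """
    rng = random.Random(seed)
    pos = [t for t, y in zip(texts, labels) if y == 1]
    neg = [t for t, y in zip(texts, labels) if y == 0]

    def draw(pool, n):
        if n <= len(pool):
            # Undersample: pick n distinct items.
            return rng.sample(pool, n)
        # Oversample: keep every original and add random duplicates.
        return pool + rng.choices(pool, k=n - len(pool))

    data = [(t, 1) for t in draw(pos, n_pos)] + [(t, 0) for t in draw(neg, n_neg)]
    rng.shuffle(data)
    return [t for t, _ in data], [y for _, y in data]


# Toy data (made up): 2 positive and 8 negative messages.
texts = [f"msg{i}" for i in range(10)]
labels = [1, 1] + [0] * 8

over = resample_binary(texts, labels, n_pos=8, n_neg=8)   # oversample positives
under = resample_binary(texts, labels, n_pos=2, n_neg=2)  # undersample negatives
mixed = resample_binary(texts, labels, n_pos=5, n_neg=5)  # compromise

print(Counter(over[1]), Counter(under[1]), Counter(mixed[1]))
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.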

Publisher

Universitat Politecnica de Valencia

Subject

General Earth and Planetary Sciences, General Environmental Science

References: 26 articles.

1. Cardie, Claire. 1997. "Improving minority class prediction using case-specific feature weights." Proceedings of the Fourteenth International Conference on Machine Learning. Morgan Kaufmann. 57-65.

2. Chan, Philip K., and Salvatore J. Stolfo. 1998. "Toward Scalable Learning with Non-uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection." In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining. AAAI Press. 164-168.

3. Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. "SMOTE: Synthetic Minority Over-sampling Technique." Journal of Artificial Intelligence Research. 321-357. https://doi.org/10.1613/jair.953

4. Chen, Ying, Yilu Zhou, Sencun Zhu, and Heng Xu. 2012. "Detecting Offensive Language in Social Media to Protect Adolescent Online Safety." Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom). IEEE. 71-80. https://doi.org/10.1109/SocialCom-PASSAT.2012.55

5. Cionnaith, Fiachra Ó. 2012. "Third suicide in weeks linked to cyberbullying." Accessed March 14, 2019. http://www.irishexaminer.com/ireland/third-suicide-in-weeks-linked-to-cyberbullying-212271.html

Cited by 10 articles.

1. Class imbalance-sensitive approach based on PLMs for the detection of cyberbullying in English and Arabic datasets;Behaviour & Information Technology;2024-02-08

2. Machine Learning-Based Early Warning Level Prediction for Cyanobacterial Blooms Using Environmental Variable Selection and Data Resampling;Toxics;2023-11-23

3. IMBoost: A New Weighting Factor for Boosting to Improve the Classification Performance of Imbalanced Data;Complexity;2023-11-11

4. The effect of rebalancing techniques on the classification performance in cyberbullying datasets;Neural Computing and Applications;2023-11-06

5. Detection of violence using mosaicking and DFE-WLSRF: Deep feature extraction with weighted least square with random forest;Multimedia Tools and Applications;2023-10-11