Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter

Published:2020-11-15 Issue:4 Volume:7 Page:52
ISSN:2227-9709
Container-title:Informatics
language:en
Short-container-title:Informatics

Author:

Talpur Bandeh Ali, O’Sullivan Declan

Abstract

Twitter enables millions of active users to send and read concise messages on the internet every day. Yet some people use Twitter to propagate violent and threatening messages, resulting in cyberbullying. Previous research has focused on whether or not cyberbullying behavior exists in a tweet (binary classification). In this research, we developed a model for detecting the severity of cyberbullying in a tweet. The model is feature-based: it uses features from the content of a tweet to train a machine learning classifier that labels tweets as non-cyberbullied or as low-, medium-, or high-level cyberbullied. In this study, we introduced pointwise semantic orientation as a new input feature, alongside predicted features (gender, age, and personality type) and Twitter API features. Results from experiments with our proposed framework in a multi-class setting are promising with respect to Kappa (84%), classifier accuracy (93%), and F-measure (92%). Overall, 40% of the classifiers increased performance in comparison with baseline approaches. Our analysis shows that the features with the highest odds ratios are: for low-level severity, the age group 19–22 years and users with <1 year of Twitter account activation; for medium-level severity, neuroticism, the age group 23–29 years, and having been a Twitter user for one to two years; and for high-level severity, neuroticism, extraversion, and the number of times a tweet has been favorited by other users. We believe that this research, using a multi-class classification approach, provides a step forward in identifying severity at different levels (low, medium, high) when the content of a tweet is classified as cyberbullied. Lastly, the current study focused only on the Twitter platform; other social network platforms can be investigated using the same approach to detect cyberbullying severity patterns.
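For readers unfamiliar with the three metrics quoted in the abstract, the following is a minimal sketch of how accuracy, Cohen's Kappa, and an F-measure can be computed for a four-level severity classifier. It is not the authors' evaluation code: the label names, the toy predictions, and the choice of macro averaging for the F-measure are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical severity labels; the abstract names the four levels but not these strings.
LABELS = ["none", "low", "medium", "high"]


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted level matches the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between truth and prediction, corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that truth and prediction coincide
    # if both were drawn independently from their own label distributions.
    p_expected = sum(
        true_counts[c] * pred_counts[c] for c in set(y_true) | set(y_pred)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: a single label on both sides
    return (p_observed - p_expected) / (1.0 - p_expected)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example (invented data): most tweets are non-cyberbullied.
y_true = ["none"] * 6 + ["low"] * 2 + ["medium"] + ["high"]
y_pred = ["none"] * 5 + ["low"] + ["low"] * 2 + ["medium"] + ["none"]

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:   ", cohen_kappa(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred, LABELS))
```

On imbalanced data such as this toy set, accuracy is dominated by the majority class, which is one reason to report Kappa and a per-class F-measure alongside it.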

Publisher

MDPI AG

Subject

Computer Networks and Communications, Human-Computer Interaction, Communication

Link

https://www.mdpi.com/2227-9709/7/4/52/pdf

Cited by 21 articles.

1. SAMME.C2 algorithm for imbalanced multi-class classification;Soft Computing;2024-07-24

2. Detecting Cyberbullying in Twitter: A Multi-Model Approach;2024 4th International Conference on Data Engineering and Communication Systems (ICDECS);2024-03-22

3. Detecting Virtual Harassment in Social Media Using Machine Learning;Lecture Notes in Computer Science;2024

4. NLP Applications—Social Media;Cognitive Informatics in Biomedicine and Healthcare;2024

5. Smart Language Checker: A Machine Learning Solution for Offensive Language detection in Social Media;2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI);2023-12-21