Affiliation:
1. Department of Computer Science Engineering, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Tamil Nadu
Abstract
Social media has become one of the most popular media of communication, and posts are often unstructured, informal, and frequently misspelled. Users increasingly use abusive language in their comments, and detecting such offensive language on social media platforms has become a major challenge for modern society. To address this challenge, an offensive language classification method based on the Chaotic Antlion Optimization algorithm is proposed. First, the dataset is pre-processed using NLP techniques to remove irrelevant data. Next, statistical, synthetic, and lexicon features are extracted using various feature extraction techniques. In the feature selection phase, the Chaotic Antlion Optimization algorithm selects the most relevant features. A Ghost network then classifies the input data into four classes, namely offensive, non-offensive, swear, and offensive but not offensive. The proposed method was evaluated using several measures, including precision, accuracy, specificity, recall, and F-measure. It achieves the best classification accuracy: 99.27% on the SOLID dataset and 98.99% on the OLID dataset. It outperforms the DCNN, Simple Logistics, and CNN methods in overall accuracy by 4.99%, 8.72%, and 10.4%, respectively.
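The evaluation measures named in the abstract (precision, accuracy, specificity, recall, and F-measure) can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions and a one-vs-rest treatment of each class; the abstract does not state how the authors computed or averaged these measures, and the function names and class labels below are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, label):
    """Count TP, FP, TN, FN for one class, treating all other classes as negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == label and truth == label:
            tp += 1
        elif pred == label and truth != label:
            fp += 1
        elif pred != label and truth == label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Compute the five measures from confusion-matrix counts.

    A ratio with a zero denominator is reported as 0.0 (an assumption
    made here to keep the sketch total; other conventions exist).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    y_true = ["offensive", "non-offensive", "offensive", "swear"]
    y_pred = ["offensive", "offensive", "non-offensive", "swear"]
    counts = one_vs_rest_counts(y_true, y_pred, "offensive")
    print(binary_metrics(*counts))
```

Per-class results from `binary_metrics` would still need to be averaged (for example macro or weighted averaging) to obtain a single figure per dataset; which averaging the reported numbers use is not specified in the abstract.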
Subject
Artificial Intelligence, General Engineering, Statistics and Probability