Affiliation:
1. Instituto Politécnico Nacional, CIC, Mexico City, Mexico
2. Queen Mary University of London, London, United Kingdom
Abstract
Social media platforms are experiencing a rise in hostility, and many people suffer from abusive behavior and harassment online. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for this classification. We report several strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a range of classifiers and text representations. We show that taking into account the conversational context, namely replies, greatly improves the classification results compared with using only the linguistic features of the comments. We also study how classification accuracy depends on the topic of the comment.
Funder
CONACYT, Mexico, Mexican Government
Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico
Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico