Detection of Offensive Language and Its Severity for Low Resource Language

Author:

Saeed Ramsha¹, Afzal Hammad¹, Rauf Sadaf Abdul², Iltaf Naima¹

Affiliation:

1. National University of Science and Technology (NUST), Pakistan

2. Fatima Jinnah Women University (FJWU), Pakistan

Abstract

The continuous proliferation of hate speech in different languages on social media has drawn significant attention from researchers over the past decade. Detecting hate speech is indispensable regardless of how widely a language is used, as it inflicts serious harm on society. This work presents a first resource for classifying the severity of hate speech in addition to classifying offensive and hate speech content. Current research mostly limits hate speech classification to its primary categories, such as racism, sexism, and hatred of religions. However, hate speech targeted at different protected characteristics also manifests in different forms and intensities. It is important to understand the varying severity levels of hate speech so that the most harmful cases can be identified and dealt with before the less harmful ones. In this work, we focus on detecting offensive speech, hate speech, and multiple levels of hate speech in the Urdu language. We investigate three primary target categories of hate speech: religion, racism, and national origin. We further divide these categories into levels based on the severity of hate conveyed. The severity levels are referred to as symbolization, insult, and attribution. A corpus of more than 20,000 tweets is collected and annotated with the corresponding hate speech categories and severity levels. A comprehensive set of experiments is run with traditional as well as deep learning–based models to examine their impact on hate speech detection. The highest macro-averaged F-score for detecting offensive speech is 86%, while the highest F-scores for detecting hate speech with respect to ethnicity, national origin, and religious affiliation are 80%, 81%, and 72%, respectively. These results are encouraging and provide a lead for further investigation in this domain.
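For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro-averaged F-score can be computed: the F1 of each class is calculated separately and the per-class values are averaged with equal weight, so small classes count as much as large ones. The function name `macro_f1` and the toy labels are illustrative assumptions only; they are not the paper's code, label set, or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)

    # Macro averaging: every class contributes equally, regardless of size
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, not taken from the paper's corpus
y_true = ["neutral", "neutral", "offensive", "offensive", "hate", "hate"]
y_pred = ["neutral", "offensive", "offensive", "offensive", "hate", "neutral"]

score = macro_f1(y_true, y_pred)
print(f"macro-averaged F1: {score:.4f}")
```

In this toy case the per-class F1 values are 0.5 (neutral), 0.8 (offensive), and about 0.667 (hate), so the macro average is their plain mean. Equal weighting is what makes the metric informative for imbalanced corpora, where a rare class would otherwise barely affect the score.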

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science
