Affiliation:
1. Department of computer engineering and technology, Guru Nanak Dev University, Amritsar, India
Abstract
The appearance of inflammatory language on social media by college or university students is quite prevalent, inspiring platforms to engage in community safety mechanisms. Escalating hate speech entails creating sophisticated artificial intelligence-based, machine learning, and deep learning algorithms to detect offensive internet content. With a few noteworthy exceptions, the majority of the studies on automatic hate speech recognition have emphasized high-resource languages, mainly English. We bridge this gap by addressing hate speech detection in Punjabi (Gurmukhi), a low-resource Indo-Aryan language articulated in Indian educational institutions. This research identifies cross-lingual hate speech in the code-switched English-Punjabi language used on social media. It proposes an approach combining the best hate speech detection techniques to cover existing state-of-the-art system gaps and limitations. In this method, the Roman Punjabi is transliterated, and then Bidirectional Encoder Representations from Transformer (BERT) based models are employed for hate detection. The proposed model has achieved 0.86 precision and 0.83 recall, and various higher educational institutions could employ it to discover the issues/domains where hate prevails the most.
Publisher
Association for Computing Machinery (ACM)