Affiliation:
1. SASTRA University, India
2. SASTRA University
Abstract
The use of code-mixed language on social media has increased, creating a pressing need to detect abusive and offensive content. A hierarchical attention network (HAN) is used to classify offensive content at both the word and sentence levels. Tweets from the Thinkspeak cloud containing annotated Tamil and English text serve as the training set for the HAN model. The attention mechanism captures the relative importance of individual words within a sentence and of sentences within a post. Trained with a cross-entropy loss function using backpropagation, the model classifies offensive code-mixed text with an accuracy of 0.58. The same model could also be applied to other code-mixed text.
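The abstract does not spell out the architecture, so the following is only a minimal sketch of how word-level and sentence-level attention pooling and a cross-entropy loss fit together in a commonly used HAN formulation. It is not the authors' implementation, and it makes several assumptions:

- The recurrent word and sentence encoders are omitted. Word hidden states are supplied directly as toy vectors.
- All function names, parameter shapes, and the two-class label convention (0 = not offensive, 1 = offensive) are hypothetical.
- Only the forward pass and the loss are shown. Backpropagation would normally be handled by a deep learning framework.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H: List[Vector], W: Matrix, b: Vector, u: Vector):
    """Additive attention over hidden vectors H (HAN-style, assumed form).

    u_t = tanh(W h_t + b); alpha_t = softmax_t(u_t . u); pooled = sum_t alpha_t h_t
    Returns (pooled vector, attention weights).
    """
    keys = [[math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)] for h in H]
    alpha = softmax([sum(k_i * u_i for k_i, u_i in zip(k, u)) for k in keys])
    dim = len(H[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(dim)]
    return pooled, alpha


def han_forward(doc, word_params, sent_params, clf_w, clf_b):
    """Word attention builds sentence vectors; sentence attention builds a post vector."""
    sentence_vecs = [attention_pool(sent, *word_params)[0] for sent in doc]
    doc_vec, sent_alpha = attention_pool(sentence_vecs, *sent_params)
    # Hypothetical two-class softmax classifier: 0 = not offensive, 1 = offensive
    logits = [sum(w * x for w, x in zip(row, doc_vec)) + bias
              for row, bias in zip(clf_w, clf_b)]
    return softmax(logits), sent_alpha


def cross_entropy(probs: Vector, label: int) -> float:
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


# Toy post: two "sentences" of made-up 2-d word hidden states (illustrative only)
doc = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
params = (identity, [0.0, 0.0], [1.0, 1.0])  # (W, b, context vector u), hypothetical
probs, sent_alpha = han_forward(doc, params, params, identity, [0.0, 0.0])
loss = cross_entropy(probs, 1)
```

In this symmetric toy input, both sentences pool to the same vector. As a result, the sentence attention weights are equal and the classifier outputs 0.5 for each class. In a trained model, the learned context vectors would shift the weights toward the words and sentences that most indicate offensive content.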