Abstract
Research on sarcasm detection in the Bengali language has so far been limited by the unavailability of resources. In this paper, we introduce ‘Ben-Sarc’, a large-scale self-annotated Bengali corpus for sarcasm detection containing 25,636 comments, manually collected from different public Facebook pages and evaluated by external evaluators. We then present a complete strategy for using traditional machine learning, deep learning, and transfer learning models to detect sarcasm from text with the Ben-Sarc corpus. Finally, we compare the performance of these traditional machine learning, deep learning, and transfer learning models on the Ben-Sarc corpus. Transfer learning using Indic-Transformers Bengali Bidirectional Encoder Representations from Transformers (BERT) as a pre-trained source model achieved the highest accuracy of 75.05%. The long short-term memory model (deep learning) obtained the second-highest accuracy, 72.48%, and Multinomial Naive Bayes (machine learning) the third-highest, 72.36%. The Ben-Sarc corpus is made publicly available in the hope of advancing the Bengali natural language processing community. Ben-Sarc is available at https://github.com/sanzanalora/Ben-Sarc.
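As a rough illustration of the kind of traditional machine learning baseline named in the abstract, the following is a minimal sketch of a Multinomial Naive Bayes text classifier with Laplace smoothing. It is not the authors' pipeline: the function names `train_multinomial_nb` and `predict`, the whitespace tokenization, the smoothing value, and the toy English strings standing in for Ben-Sarc comments are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents.

    alpha is the Laplace (add-alpha) smoothing constant; 1.0 is an assumed default.
    """
    class_counts = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)      # per-class token counts
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_lik": {}, "vocab": vocab}
    for c, n_docs in class_counts.items():
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_prior"][c] = math.log(n_docs / len(docs))
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest log-posterior; unseen tokens are skipped."""
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for token in text.split():
            if token in model["vocab"]:
                score += model["log_lik"][c][token]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data (1 = sarcastic, 0 = not sarcastic); not taken from Ben-Sarc.
docs = [
    "oh great another monday",
    "what a great surprise thanks",
    "the meeting is on monday",
    "thanks for the report",
]
labels = [1, 1, 0, 0]

model = train_multinomial_nb(docs, labels)
print(predict(model, "oh great"))
print(predict(model, "the report"))
```

On real data, the comments would be loaded from the published corpus and split into training and test sets before accuracy is measured; the tokenization and features used in the paper may differ from this sketch.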
Publisher
Cambridge University Press (CUP)