MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language

Published: 2024-04-04
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Singh Aakash¹, Sharma Deepawali¹, Singh Vivek Kumar²

Affiliation:

1. Department of Computer Science, Banaras Hindu University, Varanasi-221005 (India)

2. Department of Computer Science, University of Delhi, Delhi-110007 (India)

Abstract

Over the years, social media has emerged as one of the most popular platforms where people express their views and share their thoughts on a wide range of topics. Social media content now includes a variety of components such as text, images, and videos. One content type of particular interest is memes, which often combine text and images. Because social media is an unregulated platform, discriminatory, offensive, and hateful content is sometimes posted on it. Such content adversely affects the online well-being of users. It is therefore important to develop computational models that automatically detect such content so that appropriate corrective action can be taken. Accordingly, there have been research efforts on the automatic detection of such content, focused mainly on text. However, the fusion of multimodal data (as in memes) creates various challenges in developing computational models that can handle such data, more so in the case of low-resource languages. Among these challenges, the lack of suitable datasets for developing computational models that handle memes in low-resource languages is a major problem. This work attempts to bridge this research gap by providing a large curated dataset of 5,054 memes in Hindi-English code-mixed language, manually annotated by three independent annotators. The work comprises two subtasks: (i) Subtask-1, binary classification of a meme as misogynous or non-misogynous, and (ii) Subtask-2, multi-label classification of memes into different categories. Data quality is evaluated by computing Krippendorff's alpha. Different computational models are then applied to the data in three settings: text-only, image-only, and multimodal models using fusion techniques. The results show that the proposed multimodal method using the fusion technique may be the preferred choice for the identification of misogyny in multimodal Internet content and that the dataset is suitable for advancing research and development in the area.
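
The abstract states that annotation quality is assessed with Krippendorff's alpha over labels from three annotators. As an illustration only, the following is a minimal sketch of the standard nominal-data form of that statistic. It is not the authors' code; the function name `krippendorff_alpha_nominal`, the label strings, and the toy annotations are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list with one entry per annotated item; each entry is a
    list of the labels assigned by the annotators (None marks a missing label).
    """
    # Coincidence matrix: every ordered pair of labels given to the same item
    # by two different annotators contributes 1 / (m - 1), where m is the
    # number of labels available for that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # an item with fewer than two labels cannot be paired
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable labels.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    # Both are left unnormalised by n because the factor cancels in the ratio.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical toy annotations: three annotators, binary labels (Subtask-1 style).
    toy = [
        ["misogynous", "misogynous", "misogynous"],
        ["non-misogynous", "non-misogynous", "misogynous"],
        ["non-misogynous", "non-misogynous", "non-misogynous"],
        ["misogynous", "non-misogynous", "misogynous"],
    ]
    print(krippendorff_alpha_nominal(toy))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. For the multi-label Subtask-2, one plausible option (again an assumption, not something the abstract specifies) is to compute alpha separately for each category.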

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3656169

References: 57 articles.

1. Shifman, L. (2013). Memes in a digital world: Reconciling with a conceptual troublemaker. Journal of Computer-Mediated Communication.

2. Sharma, D., Gupta, V., & Singh, V. K. (2022, December). Detection of homophobia & transphobia in Malayalam and Tamil: Exploring deep learning methods. In International Conference on Advanced Network Technologies and Intelligent Computing (pp. 217-226). Cham: Springer Nature Switzerland.

3. Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing.

4. Razavi, A. H., Inkpen, D., Uritsky, S., & Matwin, S. (2010). Offensive language detection using multi-level classification. In Advances in Artificial Intelligence: 23rd Canadian Conference on Artificial Intelligence, Canadian AI 2010, Ottawa, Canada, May 31–June 2, 2010. Proceedings 23 (pp. 16-27). Springer Berlin Heidelberg.

5. Chakraborty, A., Joardar, S., & Sekh, A. A. (2023). Ensemble Classifier for Hindi Hostile Content Detection. ACM Transactions on Asian and Low-Resource Language Information Processing.

Cited by 3 articles.

1. Misogynistic attitude detection in YouTube comments and replies: A high-quality dataset and algorithmic models;Computer Speech & Language;2025-01

2. Using Explainable AI (XAI) for Identification of Subjectivity in Hate Speech Annotations for Low-Resource Languages;4th International Workshop on OPEN CHALLENGES IN ONLINE SOCIAL NETWORKS;2024-09-10

3. PsyChatbot: A Psychological Counseling Agent Towards Depressed Chinese Population Based on Cognitive Behavioural Therapy;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-07-05