MuLX-QA: Classifying Multi-Labels and Extracting Rationale Spans in Social Media Posts

Published:2024-05-06 Issue:3 Volume:18 Page:1-26
ISSN:1559-1131
Container-title:ACM Transactions on the Web
language:en
Short-container-title:ACM Trans. Web

Author:

Poddar Soham¹, Mukherjee Rajdeep¹, Samad Azlaan², Ganguly Niloy¹, Ghosh Saptarshi¹

Affiliation:

1. Indian Institute of Technology, Kharagpur, India

2. Leibniz University, Hannover, Germany

Abstract

While social media platforms play an important role in our daily lives in obtaining the latest news and trends from across the globe, they are known to be prone to widespread proliferation of harmful information in different forms, leading to misconceptions among the masses. Accordingly, several prior works have attempted to tag social media posts with labels/classes reflecting their veracity, sentiments, hate content, and so on. However, in order to have a convincing impact, it is important to additionally extract the post snippets on which the labelling decision is based. We call such a post snippet the rationale. These rationales significantly improve human trust and debuggability of the predictions, especially when detecting misinformation or stigmas from social media posts. These rationale spans or snippets are also helpful in post-classification social analysis, such as for finding out the target communities in hate speech, or for understanding the arguments or concerns against the intake of vaccines. Also, it is observed that a post may express multiple notions of misinformation, hate, sentiment, and the like. Thus, the task of determining (one or multiple) labels for a given piece of text, along with the text snippets explaining the rationale behind each of the identified labels, is a challenging multi-label, multi-rationale classification task, which is still nascent in the literature. While transformer-based encoder-decoder generative models such as BART and T5 are well suited for the task, in this work we show how a relatively simpler encoder-only discriminative question-answering (QA) model can be effectively trained using simple template-based questions to accomplish the task.
We thus propose MuLX-QA and demonstrate its utility in producing (label, rationale span) pairs in two different settings: multi-class (on the HateXplain dataset related to hate speech on social media), and multi-label (on the CAVES dataset related to COVID-19 anti-vaccine concerns). MuLX-QA outperforms heavier generative models in both settings. We also demonstrate the relative advantage of our proposed model MuLX-QA over strong baselines when trained with limited data. We perform several ablation studies and experiments to better understand the effect of training MuLX-QA with different question prompts, and draw interesting inferences. Additionally, we show that MuLX-QA is effective on social media posts in resource-poor non-English languages as well. Finally, we perform a qualitative analysis of our model predictions and compare them with those of our strongest baseline.
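
The idea of casting multi-label rationale extraction as template-based question answering can be illustrated with a small sketch. The abstract does not give the question wording, the label inventory, or the data format, so everything below is an assumption made for illustration: the template string, the names `QAInstance`, `build_qa_instances`, and `spans_to_pairs`, and the example post and labels are all hypothetical. The sketch only shows how one post could be expanded into one question per candidate label, with the rationale as the answer span and "no answer" when the label does not apply.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical template: the abstract does not state the actual question wording.
QUESTION_TEMPLATE = "Which part of the post expresses {label}?"


@dataclass
class QAInstance:
    label: str
    question: str
    context: str
    # (start, end) character offsets of the rationale in the context,
    # or None when the label does not apply to the post.
    answer_span: Optional[Tuple[int, int]]


def build_qa_instances(
    post: str, label_set: List[str], rationales: Dict[str, str]
) -> List[QAInstance]:
    """Expand one post into one extractive-QA instance per candidate label."""
    instances = []
    for label in label_set:
        span = None
        snippet = rationales.get(label)
        if snippet is not None:
            start = post.find(snippet)
            if start >= 0:
                span = (start, start + len(snippet))
        question = QUESTION_TEMPLATE.format(label=label)
        instances.append(QAInstance(label, question, post, span))
    return instances


def spans_to_pairs(instances: List[QAInstance]) -> List[Tuple[str, str]]:
    """Turn answered instances back into (label, rationale span) pairs."""
    return [
        (inst.label, inst.context[inst.answer_span[0]:inst.answer_span[1]])
        for inst in instances
        if inst.answer_span is not None
    ]


# Made-up example post and labels, for illustration only.
post = "The vaccine was rushed and it has nasty side effects."
labels = ["rushed", "side-effect", "ineffective"]
gold = {"rushed": "vaccine was rushed", "side-effect": "nasty side effects"}

instances = build_qa_instances(post, labels, gold)
pairs = spans_to_pairs(instances)
```

In this framing, a label is predicted for a post exactly when its question receives a non-empty answer, so a single span-extraction model can cover both the multi-class and the multi-label setting. In an actual system the gold spans would be replaced by the spans predicted by the trained QA model.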

Funder

Prime Minister’s Research Fellowship

Ministry of Education, Government of India

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3653303

References: 98 articles.

1. Negin Abadani, Jamshid Mozafari, Afsaneh Fatemi, Mohammad Ali Nematbakhsh, and Arefeh Kazemi. 2021. ParSQuAD: Machine translated squad dataset for persian question answering. In Proceedings of the 2021 7th International Conference on Web Research (ICWR). IEEE, 163–168.

2. V. Adarsh, P. Arun Kumar, V. Lavanya, and G. R. Gangadharan. 2023. Fair and explainable depression detection in social media. Information Processing & Management 60, 1 (2023), 103168.

3. Alan Aipe, N. Mukuntha, Asif Ekbal, and Sadao Kurohashi. 2018. Deep learning approach towards multi-label classification of crisis related tweets. In Proceedings of the 15th ISCRAM Conference.

4. A Literature Review of Textual Hate Speech Detection Methods and Datasets

5. Rabah Alzaidy, Cornelia Caragea, and C. Lee Giles. 2019. Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents. In The Web Conference. 2551–2557.