MuLX-QA: Classifying Multi-Labels and Extracting Rationale Spans in Social Media Posts

Author:

Poddar Soham 1, Mukherjee Rajdeep 1, Samad Azlaan 2, Ganguly Niloy 1, Ghosh Saptarshi 1

Affiliation:

1. Indian Institute of Technology, Kharagpur, India

2. Leibniz University, Hannover, Germany

Abstract

While social media platforms play an important role in our daily lives as a source of the latest news and trends from across the globe, they are prone to the widespread proliferation of harmful information in different forms, leading to misconceptions among the masses. Accordingly, several prior works have attempted to tag social media posts with labels/classes reflecting their veracity, sentiments, hate content, and so on. However, for such labelling to be convincing, it is important to additionally extract the post snippets on which the labelling decision is based. We call such a post snippet the rationale. These rationales significantly improve human trust in, and the debuggability of, the predictions, especially when detecting misinformation or stigmas in social media posts. These rationale spans or snippets are also helpful in post-classification social analysis, such as finding the target communities in hate speech, or understanding the arguments or concerns against the intake of vaccines. Moreover, a single post may express multiple notions of misinformation, hate, sentiment, and the like. Thus, determining (one or multiple) labels for a given piece of text, along with the text snippets explaining the rationale behind each of the identified labels, is a challenging multi-label, multi-rationale classification task, which is still nascent in the literature. While transformer-based encoder-decoder generative models such as BART and T5 are well suited for the task, in this work we show how a relatively simpler encoder-only discriminative question-answering (QA) model can be effectively trained using simple template-based questions to accomplish the task.
We thus propose MuLX-QA and demonstrate its utility in producing (label, rationale span) pairs in two different settings: multi-class (on the HateXplain dataset related to hate speech on social media), and multi-label (on the CAVES dataset related to COVID-19 anti-vaccine concerns). MuLX-QA outperforms heavier generative models in both settings. We also demonstrate the relative advantage of MuLX-QA over strong baselines when trained with limited data. We perform several ablation studies and experiments to better understand the effect of training MuLX-QA with different question prompts, and draw interesting inferences. Additionally, we show that MuLX-QA is also effective on social media posts in resource-poor non-English languages. Finally, we perform a qualitative analysis of our model predictions and compare them with those of our strongest baseline.
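
To make the template-based question formulation concrete, the following is a minimal Python sketch of how one question per candidate label could be posed against a post, with each answered question yielding a (label, rationale span) pair. It is an illustration under stated assumptions and not the MuLX-QA implementation: the question template, the label names, the helper names (build_questions, extract_pairs), and the keyword-based stand-in for a trained extractive QA model are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical question template; the wording used by the paper is not given in the abstract.
TEMPLATE = "Which part of the post expresses {label}?"

# A predicted answer span as (start, end) character offsets, or None when the
# QA model finds no answer (assumed here to mean the label does not apply).
Span = Optional[Tuple[int, int]]


def build_questions(labels: List[str]) -> List[Tuple[str, str]]:
    """Create one template-based question per candidate label."""
    return [(label, TEMPLATE.format(label=label)) for label in labels]


def extract_pairs(
    post: str,
    labels: List[str],
    answer_fn: Callable[[str, str], Span],
) -> List[Tuple[str, str]]:
    """Ask every label question about the post and keep the answered ones."""
    pairs = []
    for label, question in build_questions(labels):
        span = answer_fn(question, post)
        if span is not None:
            start, end = span
            pairs.append((label, post[start:end]))
    return pairs


def keyword_answerer(question: str, context: str) -> Span:
    """Toy stand-in for a trained extractive QA model (illustration only)."""
    cues = {"side-effect": "made me sick", "ineffective": "does not work"}
    for label, cue in cues.items():
        if label in question:
            idx = context.find(cue)
            if idx != -1:
                return (idx, idx + len(cue))
    return None


if __name__ == "__main__":
    post = "The vaccine made me sick and it does not work anyway."
    labels = ["side-effect", "ineffective", "mandatory"]  # illustrative label names
    for label, rationale in extract_pairs(post, labels, keyword_answerer):
        print(label, "->", rationale)
```

In this sketch, the multi-label setting falls out naturally because every label is queried independently; a multi-class setting would additionally need a rule for choosing a single label among the answered questions, which is left out here.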

Funder

Prime Minister’s Research Fellowship

Ministry of Education, Government of India

Publisher

Association for Computing Machinery (ACM)

