Affiliation:
1. University of Moratuwa
2. University of Essex
3. Massey University
Abstract
Code-mixing and code-switching (CMCS) are prevalent phenomena in social media conversations and other modes of communication. CMCS text poses challenges for Natural Language Processing (NLP) systems, such as sentiment analysers and hate-speech detectors, that operate on such social media data. Recent studies have demonstrated that prompt-based learning of pre-trained language models (PLMs) outperforms full fine-tuning across various NLP tasks. Despite the growing interest in CMCS text classification, the effectiveness of prompt-based learning for this task remains unexplored. Our study bridges this gap by examining the impact of prompt-based learning on CMCS text classification. We find that CMCS text classification performance is significantly influenced by the presence of multiple scripts and the intensity of code-mixing. In response, we introduce a novel method, Dynamic+AdapterPrompt, which employs distinct models for each script, integrated with adapters. While DynamicPrompt captures the script-specific representation of CMCS text, AdapterPrompt captures the task-oriented functionality. Our experiments span Sinhala-English, Kannada-English, and Hindi-English datasets, encompassing sentiment classification, hate-speech detection, and humour detection tasks. The outcomes indicate that our proposed method outperforms strong fine-tuning baselines and basic prompting strategies.
Publisher
Research Square Platform LLC