Affiliation:
1. Department of Artificial Intelligence, Kyushu Institute of Technology, Iizuka, Japan
2. Department of Artificial Intelligence, Kyushu Institute of Technology, Iizuka, Japan
Abstract
Code-switching, the mixing of multiple languages, is an increasingly common phenomenon in social media text. Code-mixed text is usually written in a single script, even when the languages involved use different scripts, and existing studies use such text as it is. Pre-trained multilingual models, however, are trained primarily on data in each language's native script, so representing each language in its native script can yield better text representations by exploiting this pre-trained knowledge. This study therefore proposes a cross-language-script knowledge-sharing architecture that applies cross-attention to, and aligns, the representations of the text in the individual language scripts. Experimental results on two datasets of Nepali-English and Hindi-English code-switched text demonstrate the effectiveness of the proposed method. Interpreting the model with a model-explainability technique illustrates the sharing of language-specific knowledge between the language-specific representations.
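The abstract describes the architecture only at a high level. The following is a minimal sketch of one plausible reading in PyTorch, assuming two script-specific encoders (e.g., a multilingual model run over the Romanized text and over its native-script, e.g., Devanagari, transliteration), bidirectional cross-attention between the two hidden-state sequences, and mean pooling with a cosine-similarity alignment objective. The module and function names, pooling choice, and loss form are illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossScriptAttention(nn.Module):
        """Hypothetical fusion of two script-specific encodings of one utterance."""

        def __init__(self, hidden_size: int = 768, num_heads: int = 8):
            super().__init__()
            # Romanized tokens attend over native-script tokens, and vice versa.
            self.latin_to_native = nn.MultiheadAttention(
                hidden_size, num_heads, batch_first=True)
            self.native_to_latin = nn.MultiheadAttention(
                hidden_size, num_heads, batch_first=True)
            self.fuse = nn.Linear(2 * hidden_size, hidden_size)

        def forward(self, h_latin: torch.Tensor, h_native: torch.Tensor) -> torch.Tensor:
            # h_latin:  (batch, len_a, hidden) encoder states for the Romanized text
            # h_native: (batch, len_b, hidden) encoder states for the native-script text
            attended_latin, _ = self.latin_to_native(h_latin, h_native, h_native)
            attended_native, _ = self.native_to_latin(h_native, h_latin, h_latin)
            # Pool each attended sequence and fuse into one representation.
            pooled = torch.cat(
                [attended_latin.mean(dim=1), attended_native.mean(dim=1)], dim=-1)
            return self.fuse(pooled)  # (batch, hidden)

    def alignment_loss(h_latin: torch.Tensor, h_native: torch.Tensor) -> torch.Tensor:
        # One plausible "alignment" objective: pull the pooled script-specific
        # representations of the same utterance toward each other.
        return 1.0 - F.cosine_similarity(
            h_latin.mean(dim=1), h_native.mean(dim=1), dim=-1).mean()

    # Toy usage with random stand-ins for encoder outputs.
    model = CrossScriptAttention()
    h_lat = torch.randn(2, 12, 768)
    h_nat = torch.randn(2, 14, 768)
    fused = model(h_lat, h_nat)          # (2, 768) fused representation
    loss = alignment_loss(h_lat, h_nat)  # scalar alignment term

Under these assumptions, the fused representation would feed a downstream classifier, with the alignment term added to the task loss to encourage the two script views to agree.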
Publisher
Association for Computing Machinery (ACM)