Lexicon‐based fine‐tuning of multilingual language models for low‐resource language sentiment analysis

Published: 2024-04
ISSN: 2468-2322
Container-title: CAAI Transactions on Intelligence Technology
Language: en
Short-container-title: CAAI Trans on Intel Tech

Author:

Dhananjaya Vinura¹, Ranathunga Surangika¹,², Jayasena Sanath¹

Affiliation:

1. Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka

2. School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand

Abstract

Pre‐trained multilingual language models (PMLMs) such as mBERT and XLM‐R have shown good cross‐lingual transferability. However, they are not specifically trained to capture cross‐lingual signals concerning sentiment words. This poses a disadvantage for low‐resource languages (LRLs) that are under‐represented in these models. To better fine‐tune these models for sentiment classification in LRLs, a novel intermediate task fine‐tuning (ITFT) technique based on a sentiment lexicon of a high‐resource language (HRL) is introduced. The authors experiment with the LRLs Sinhala, Tamil and Bengali on a 3‐class sentiment classification task and show that this method outperforms vanilla fine‐tuning of the PMLM. It also outperforms or is on par with basic ITFT that relies on an HRL sentiment classification dataset.

Funder

University of Moratuwa

Publisher

Institution of Engineering and Technology (IET)
