Author:
Hashmi Ehtesham, Yayilgan Sule Yildirim, Shaikh Sarang
Abstract
People in the modern digital era are increasingly embracing social media platforms to express their concerns and emotions in the form of reviews or comments. While positive interactions within diverse communities can considerably enhance confidence, it is critical to recognize that negative comments can hurt people’s reputations and well-being. Currently, individuals tend to express their thoughts in their native languages on these platforms, which poses challenges for automated analysis due to potential syntactic ambiguity in these languages. Most research has been conducted on resource-rich languages like English. However, low-resource languages such as Urdu, Arabic, and Hindi present challenges due to limited linguistic resources, making information extraction labor-intensive. This study concentrates on code-mixed languages, covering three types of text: English, Roman Urdu, and their combination. It introduces robust transformer-based algorithms to enhance sentiment prediction in code-mixed text, which combines Roman Urdu and English in the same context. Unlike conventional deep learning-based models, transformers are adept at handling syntactic ambiguity, facilitating the interpretation of semantics across various languages. We used state-of-the-art transformer-based models, namely Electra, code-mixed BERT (cm-BERT), and Multilingual Bidirectional and Auto-Regressive Transformers (mBART), to address sentiment prediction challenges in code-mixed tweets. The results reveal that mBART outperformed the Electra and cm-BERT models for sentiment prediction in code-mixed text with an overall F1-score of 0.73. In addition, we perform topic modeling to uncover shared characteristics within the corpus and reveal patterns and commonalities across different classes.
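As a rough illustration of the kind of pipeline the abstract describes, the sketch below runs a code-mixed (Roman Urdu and English) tweet through a Hugging Face sequence-classification head. The checkpoint name (a generic multilingual BERT used only as a stand-in for the paper's cm-BERT/mBART models), the three-way label scheme, and the example tweet are assumptions for illustration; the classification head would need fine-tuning on the code-mixed corpus before its predictions mean anything.

```python
# Minimal sketch, not the authors' exact pipeline: sentiment prediction on
# code-mixed Roman Urdu-English text with a transformer classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"   # assumption: stand-in for cm-BERT/mBART
LABELS = ["negative", "neutral", "positive"]  # assumed three-way sentiment scheme

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)
model.eval()

# A code-mixed tweet: Roman Urdu and English mixed in the same sentence.
tweet = "Yeh movie bohat achi thi, totally worth watching!"

inputs = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

# The classification head is randomly initialized here, so the output is only
# meaningful after fine-tuning on labeled code-mixed data.
pred = LABELS[int(logits.argmax(dim=-1))]
print(f"Predicted sentiment: {pred}")
```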
Funder
NTNU Norwegian University of Science and Technology
Publisher
Springer Science and Business Media LLC
Cited by
5 articles.