Author:
Hashmi Ehtesham, Yayilgan Sule Yildirim, Shaikh Sarang
Abstract
People in the modern digital era are increasingly embracing social media platforms to express their concerns and emotions in the form of reviews or comments. While positive interactions within diverse communities can considerably enhance confidence, it is critical to recognize that negative comments can hurt people’s reputations and well-being. Currently, individuals tend to express their thoughts in their native languages on these platforms, which poses challenges for automated analysis due to potential syntactic ambiguity in these languages. Most research has been conducted on resource-rich languages like English. However, low-resource languages such as Urdu, Arabic, and Hindi present challenges due to limited linguistic resources, making information extraction labor-intensive. This study concentrates on code-mixed languages, covering three types of text: English, Roman Urdu, and their combination. It introduces robust transformer-based algorithms to enhance sentiment prediction in code-mixed text, which combines Roman Urdu and English in the same context. Unlike conventional deep learning-based models, transformers are adept at handling syntactic ambiguity, facilitating the interpretation of semantics across various languages. We used state-of-the-art transformer-based models, namely Electra, code-mixed BERT (cm-BERT), and Multilingual Bidirectional and Auto-Regressive Transformers (mBART), to address sentiment prediction challenges in code-mixed tweets. The results reveal that mBART outperformed the Electra and cm-BERT models for sentiment prediction in code-mixed text with an overall F1-score of 0.73. In addition, we perform topic modeling to uncover shared characteristics within the corpus and reveal patterns and commonalities across different classes.
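As a rough illustration of the kind of pipeline the abstract describes, the sketch below runs a code-mixed (Roman Urdu and English) tweet through a Hugging Face sequence-classification head. The checkpoint name (a generic multilingual BERT used only as a stand-in for the paper's cm-BERT/mBART models), the three-way label scheme, and the example tweet are assumptions for illustration; the classification head would need fine-tuning on the code-mixed corpus before its predictions mean anything.

```python
# Minimal sketch, not the authors' exact pipeline: sentiment prediction on
# code-mixed Roman Urdu-English text with a transformer classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"   # assumption: stand-in for cm-BERT/mBART
LABELS = ["negative", "neutral", "positive"]  # assumed three-way sentiment scheme

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)
model.eval()

# A code-mixed tweet: Roman Urdu and English mixed in the same sentence.
tweet = "Yeh movie bohat achi thi, totally worth watching!"

inputs = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

# The classification head is randomly initialized here, so the output is only
# meaningful after fine-tuning on labeled code-mixed data.
pred = LABELS[int(logits.argmax(dim=-1))]
print(f"Predicted sentiment: {pred}")
```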
Funder
NTNU Norwegian University of Science and Technology
Publisher
Springer Science and Business Media LLC
Cited by
5 articles.