Authors:
Nazeer Imran, Rehman Jawaria, Butt Minaam
Abstract
This research investigates code-switching dynamics in Urdu-English multilingual use of ChatGPT models, with the aim of identifying their themes, challenges, and implications. Text data retrieved from online resources, social media platforms, and subject-oriented conversations are preprocessed and annotated to examine code-switching. Algorithms are developed to automatically detect and classify code-switching instances, followed by an in-depth analysis of their frequency, distribution, and contextual triggers. The study evaluates ChatGPT's role in code-switched communication by generating text sets and ranking them on language identification, syntactic coherence, and semantic consistency. The data show that code-switching is frequent and that ChatGPT can communicate across languages. These findings can help refine AI-based natural language processing systems. The work also offers a more detailed understanding of language change in digital environments and provides a basis for designing more inclusive and culturally sensitive communication and media tools.
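The abstract does not specify how code-switching instances are detected, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Urdu is written in Arabic script and English in Latin script, and uses script as a stand-in for language identification; Romanized Urdu would need a trained classifier instead. The names `tag_token`, `detect_switches`, `SwitchReport`, and the `switch_rate` measure are hypothetical and chosen for illustration.

```python
import re
import unicodedata
from dataclasses import dataclass

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tag_token(token: str) -> str:
    """Tag a token as 'en', 'ur', or 'other' by script (a proxy, not true language ID)."""
    if token.isascii() and token.isalpha():
        return "en"
    # Assumption: any all-letter token whose characters are Arabic-script is Urdu.
    if token.isalpha() and all(unicodedata.name(ch, "").startswith("ARABIC") for ch in token):
        return "ur"
    return "other"  # digits, punctuation, mixed-script tokens


@dataclass
class SwitchReport:
    tokens: list[str]
    tags: list[str]
    switch_points: list[int]  # index of the first token after each language change
    switch_rate: float  # switches / adjacent pairs of language-tagged tokens


def detect_switches(text: str) -> SwitchReport:
    tokens = TOKEN_RE.findall(text)
    tags = [tag_token(tok) for tok in tokens]
    # Skip 'other' tokens so punctuation neither creates nor hides a switch.
    lang_idx = [i for i, tag in enumerate(tags) if tag != "other"]
    switch_points = [
        cur for prev, cur in zip(lang_idx, lang_idx[1:]) if tags[prev] != tags[cur]
    ]
    pairs = max(len(lang_idx) - 1, 0)
    rate = len(switch_points) / pairs if pairs else 0.0
    return SwitchReport(tokens, tags, switch_points, rate)
```

Running `detect_switches` on a sentence that alternates between Urdu and English words marks each alternation as a switch point, while a monolingual English sentence yields no switch points and a rate of 0.0. Corpus-level frequency and distribution analyses of the kind the abstract mentions could then aggregate these per-sentence counts.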
Publisher
Research for Humanity (Private) Limited