Identifying and Analyzing Different Aspects of English-Hindi Code-Switching in Twitter

Author:

Rudra Koustav (1), Sharma Ashish (2), Bali Kalika (2), Choudhury Monojit (2), Ganguly Niloy (1)

Affiliation:

1. Department of CSE, IIT Kharagpur, India

2. Microsoft Research Lab, Bangalore, India

Abstract

Code-switching, the juxtaposition of linguistic units from two or more languages in a single utterance, has recently become very common in text, thanks to social media and other computer-mediated forms of communication. In this exploratory study of English-Hindi code-switching on Twitter, we automatically create a large corpus of code-switched tweets and devise techniques to identify the relationship between successive components in a code-switched tweet. More specifically, we identify the pragmatic functions that characterize the relation between successive components, such as narrative-evaluative, negative reinforcement, and translation or semantically equivalent statements. We analyze the differences and similarities between switching patterns in code-switched and monolingual multi-component tweets. Narrative-evaluative switching (non-opinion to opinion or vice versa) strongly dominates in both code-switched and monolingual multi-component tweets, accounting for around 40% of cases. Polarity switching is a prevalent phenomenon (10%) specifically in code-switched tweets, three to four times higher than in monolingual multi-component tweets, and the preference for expressing negative sentiment in Hindi is approximately twice that for English. Positive reinforcement appears to be an important pragmatic function for English multi-component tweets, whereas negative reinforcement plays a key role for Devanagari multi-component tweets. Our results also indicate that the extent and nature of code-switching depend strongly on the topic of discussion (sports, politics, etc.).
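The relation types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes each tweet component already carries a language tag and a sentiment label, and the names `Component`, `relation`, and `is_code_switched` are hypothetical. The rules for "reinforcement" (same polarity repeated) and "polarity switch" (opposite polarities) are assumed readings of the abstract's terms, and translation or semantically equivalent statements are left out because sentiment labels alone cannot detect them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One component of a multi-component tweet (hypothetical structure)."""
    lang: str       # assumed language tag, e.g. "en" or "hi"
    sentiment: str  # assumed label: "neutral", "positive", or "negative"


def relation(first: Component, second: Component) -> str:
    """Label the relation between two successive components.

    Assumed rules, for illustration only:
    - one neutral and one opinionated component -> narrative-evaluative
    - opposite polarities -> polarity-switch
    - same non-neutral polarity -> positive/negative reinforcement
    - both neutral -> other (not resolvable from sentiment alone)
    """
    a, b = first.sentiment, second.sentiment
    if a == "neutral" and b == "neutral":
        return "other"
    if a == "neutral" or b == "neutral":
        return "narrative-evaluative"
    if a != b:
        return "polarity-switch"
    return f"{a}-reinforcement"


def is_code_switched(first: Component, second: Component) -> bool:
    """A pair counts as code-switched here when the language tags differ."""
    return first.lang != second.lang


# Example: an English statement of fact followed by a negative Hindi opinion.
pair = (Component("en", "neutral"), Component("hi", "negative"))
print(is_code_switched(*pair), relation(*pair))
```

Keeping the language test separate from the relation label mirrors the abstract's comparison: the same relation labels can be counted for code-switched pairs and for monolingual multi-component pairs.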

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 9 articles.

1. Artificial Intelligence inspired method for cross-lingual cyberhate detection from low resource languages;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-08-16

2. "We're Not in That Circle of Misinformation": Understanding Community-Based Trusted Messengers Through Cultural Code-Switching;Proceedings of the ACM on Human-Computer Interaction;2024-04-17

3. Code-switching input for machine translation: a case study of Vietnamese–English data;International Journal of Multilingualism;2023-06-27

4. A Language-Free Hate Speech Identification on Code-mixed Conversational Tweets;4th International Conference on Artificial Intelligence and Applied Mathematics in Engineering;2023

5. Satiric parody through Indian English tweets in Twitter;World Englishes;2022-04-30
