Identifying and Analyzing Different Aspects of English-Hindi Code-Switching in Twitter

Published: 2019-09-30
Volume: 18
Issue: 3
Pages: 1-28
ISSN: 2375-4699
Container title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short container title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Authors:

Rudra Koustav¹, Sharma Ashish², Bali Kalika², Choudhury Monojit², Ganguly Niloy¹

Affiliation:

1. Department of CSE, IIT Kharagpur, India

2. Microsoft Research Lab, Bangalore, India

Abstract

Code-switching, the juxtaposition of linguistic units from two or more languages in a single utterance, has in recent times become very common in text, thanks to social media and other computer-mediated forms of communication. In this exploratory study of English-Hindi code-switching on Twitter, we automatically create a large corpus of code-switched tweets and devise techniques to identify the relationship between successive components in a code-switched tweet. More specifically, we identify the pragmatic functions that characterize the relation between successive components, such as narrative-evaluative, negative reinforcement, and translation or semantically equivalent statements. We analyze the differences and similarities between switching patterns in code-switched and monolingual multi-component tweets. We observe a strong dominance of narrative-evaluative switching (non-opinion to opinion or vice versa) in both code-switched and monolingual multi-component tweets, accounting for around 40% of cases. Polarity switching appears to be a prevalent switching phenomenon (10%) specifically in code-switched tweets, where it is three to four times more frequent than in monolingual multi-component tweets, and where the preference for expressing negative sentiment in Hindi is approximately twice that for English. Positive reinforcement appears to be an important pragmatic function for English multi-component tweets, whereas negative reinforcement plays a key role for Devanagari multi-component tweets. Our results also indicate that the extent and nature of code-switching strongly depend on the topic of discussion (sports, politics, etc.).
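The switching categories named in the abstract can be made concrete with a small sketch. It assumes each tweet has already been split into successive components, each tagged with a language, an opinion/non-opinion flag, and a sentiment polarity. The `Component` class, its label values, and the labeling rules are hypothetical illustrations, not the authors' pipeline. Narrative-evaluative follows the abstract's parenthetical definition (non-opinion to opinion or vice versa). Treating a pair of same-polarity opinions as positive or negative reinforcement is an assumption. Relations that need semantic comparison, such as translation, are not covered.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """One component of a multi-component tweet (hypothetical labels)."""
    text: str
    language: str            # assumed tag set, e.g. "en" or "hi"
    is_opinion: bool         # assumed opinion / non-opinion label
    polarity: Optional[str]  # assumed "pos", "neg", or None


def is_code_switched(components: list[Component]) -> bool:
    """A tweet counts as code-switched if its components use more than one language."""
    return len({c.language for c in components}) > 1


def switch_relation(a: Component, b: Component) -> str:
    """Label the relation between two successive components (illustrative rules only)."""
    # Narrative-evaluative: exactly one side expresses an opinion.
    if a.is_opinion != b.is_opinion:
        return "narrative-evaluative"
    if a.is_opinion and b.is_opinion:
        # Both opinionated: differing polarity is treated as a polarity switch.
        if a.polarity != b.polarity:
            return "polarity-switch"
        # Same polarity: assumed to be reinforcement of that polarity.
        if a.polarity == "pos":
            return "positive-reinforcement"
        if a.polarity == "neg":
            return "negative-reinforcement"
    return "other"


def relation_shares(tweets: list[list[Component]]) -> dict[str, float]:
    """Fraction of successive-component pairs that fall into each relation."""
    counts = Counter(
        switch_relation(a, b)
        for comps in tweets
        for a, b in zip(comps, comps[1:])
    )
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()} if total else {}


if __name__ == "__main__":
    tweet = [
        Component("Match starts at 7 tonight", "en", False, None),
        # Romanized Hindi: "(they) played very badly"
        Component("bahut bura khela", "hi", True, "neg"),
    ]
    print(is_code_switched(tweet))    # True
    print(switch_relation(*tweet))    # narrative-evaluative
    print(relation_shares([tweet]))   # {'narrative-evaluative': 1.0}
```

Shares computed this way over separate code-switched and monolingual subsets are the kind of figures the abstract compares, such as the roughly 40% narrative-evaluative share. The actual study's labeling is likely richer than these hand-written rules.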

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3314935


Cited by 9 articles

1. Artificial Intelligence inspired method for cross-lingual cyberhate detection from low resource languages. ACM Transactions on Asian and Low-Resource Language Information Processing, 2024-08-16.

2. "We're Not in That Circle of Misinformation": Understanding Community-Based Trusted Messengers Through Cultural Code-Switching. Proceedings of the ACM on Human-Computer Interaction, 2024-04-17.

3. Code-switching input for machine translation: a case study of Vietnamese–English data. International Journal of Multilingualism, 2023-06-27.

4. A Language-Free Hate Speech Identification on Code-mixed Conversational Tweets. 4th International Conference on Artificial Intelligence and Applied Mathematics in Engineering, 2023.

5. Satiric parody through Indian English tweets in Twitter. World Englishes, 2022-04-30.