Affiliation:
1. Computer Science & Systems Engineering, Andhra University College of Engineering, Vishakhapatnam, India
Abstract
With the proliferation of informal content on social media platforms in the form of posts, comments, and feedback, the analysis of code-mixed text is becoming increasingly important. Telugu, a low-resource Indian language, has a large amount of online content generated in code-mixed form. However, the lack of large corpora, annotated data, and Natural Language Processing (NLP) resources is impeding research on Telugu-English code-mixed data. This paper surveys the existing literature on Telugu-English code-mixed text in the areas of resources, part-of-speech (POS) tagging, Named Entity Recognition, language identification, sentiment analysis, application tasks, dialog systems, and Question-Answering. The datasets used by researchers in the field, along with the methods applied to them, are detailed. Research gaps are identified to provide future directions for researchers working in this field.
Publisher
Association for Computing Machinery (ACM)