A survey on NLP tasks, resources and techniques for low-resource Telugu-English code-mixed text

Published: 2024-09-10
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Maddu Sandeep¹, Sanapala Viziananda Row¹

Affiliation:

1. Computer Science & Systems Engineering, Andhra University College of Engineering, Vishakhapatnam, India

Abstract

With the proliferation of informal content on social media platforms in the form of posts, comments, and feedback, analyzing code-mixed text is becoming increasingly important. Telugu, a low-resource Indian language, has a large amount of online content generated in code-mixed form. However, the lack of large corpora, annotated data, and Natural Language Processing (NLP) resources is impeding research on Telugu-English code-mixed data. This paper surveys the existing literature on Telugu-English code-mixed text in the areas of resources, POS tagging, Named Entity Recognition, language identification, sentiment analysis, application tasks, dialog systems, and Question-Answering. It details the datasets used by researchers in the field, along with the methods applied to them, and identifies research gaps to provide future direction for researchers working in this field.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3695766

References (65 articles)

1. Gazi Imtiyaz Ahmad and Jimmy Singla. 2021. Sentiment Analysis of Code-Mixed Social Media Text (SA-CMSMT) in Indian-Languages. In 2021 International Conference on Computing Sciences (ICCS). IEEE, 25–33.

2. Machine Learning Techniques for Sentiment Analysis of Code-Mixed and Switched Indian Social Media Text Corpus - A Comprehensive Review

3. Vibhuti Bansal, Mrinal Tyagi, Rajesh Sharma, Vedika Gupta, and Qin Xin. 2022. A Transformer Based Approach for Abuse Detection in Code Mixed Indic Languages. ACM Transactions on Asian and Low-Resource Language Information Processing (2022).

4. Sentiment analysis for mixed script Indic sentences

5. Improving Code-mixed POS Tagging Using Code-mixed Embeddings