Affiliations:
1. Central University of Tamil Nadu, India
2. University of Galway, Ireland
Abstract
This chapter explores the acquisition of language resources for low-resource languages in digital discourse, focusing on code-mixed content in social media posts and comments. Comprehensive language resources enhance our understanding of language usage patterns in online discussions. The methodology combines data collection, annotation, and automated tools into a practical framework. Challenges addressed in the chapter include data diversity, annotation complexity, linguistic variability, limited annotated data, and ethical considerations. The proposed solutions range from robust data preprocessing techniques to contextual analysis approaches, ensuring the quality and ethical use of the acquired resources. The chapter also discusses future directions for research and technology development in natural language processing and sociolinguistics. It equips researchers and practitioners with essential knowledge and tools for advancing linguistic analysis and technological applications in the evolving digital landscape.