CodeFed: Federated Speech Recognition for Low-Resource Code-Switching Detection

Published: 2022-11-17
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Madan Chetan¹, Diddee Harshita¹, Kumar Deepika¹, Mittal Mamta²

Affiliation:

1. Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi - 110063, India

2. Delhi Skill and Entrepreneurship University, New Delhi, India

Abstract

Code-switching is a common constraint in the practical application of speech recognition. The problem is especially acute for Indian languages, since most massively multilingual models are trained on corpora that are not representative of the diverse set of Indian languages. An associated constraint is the privacy-intrusive nature of applications that aim to collate such representative data. To mitigate both problems together, this work presents CodeFed: a federated learning-based code-switching detection model that can be collaboratively trained on private data from multiple users without compromising their privacy. Using a representative low-resource Indic dataset, we demonstrate the superior performance of a global model trained collaboratively with federated learning on three low-resource Indic languages (Gujarati, Tamil, and Telugu) and compare the model against the most recent work in the field. Finally, to evaluate the practical realizability of the proposed system, we also discuss the system overview of the label generation architecture that may accompany CodeFed’s possible real-time deployment.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3571732

Reference: 41 articles.

1. A Survey of Current Datasets for Code-Switching Research

2. Cecilia Montes-Alcalá. 2005. “Dear amigo”: Exploring code-switching in personal letters. In Selected Proceedings of the second workshop on Spanish Sociolinguistics. Cascadilla Proceedings Project, Somerville, MA, 102–108.

3. Investigations on speech recognition systems for low-resource dialectal Arabic–English code-switching speech

4. Shuguang Chen, Gustavo Aguilar, Anirudh Srinivasan, Mona Diab, and Thamar Solorio. 2022. CALCS 2021 Shared Task: Machine Translation for Code-Switched Data. arXiv preprint arXiv:2202.09625 (2022).

5. Language Models for Code-switch Detection of te reo Māori and English in a Low-resource Setting