Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

Published: 2021-07-31 Volume: 20 Issue: 4 Pages: 1-19
ISSN: 2375-4699
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: English
Abbreviated title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Authors:

K. E. Manjunath¹, K. M. Srinivasa Raghavan², K. Sreenivasa Rao³, Dinesh Babu Jayagopi², V. Ramasubramanian²

Affiliations:

1. International Institute of Information Technology Bangalore, U. R. Rao Satellite Centre, ISRO, Bangalore, Karnataka, India

2. International Institute of Information Technology Bangalore, Bangalore, Karnataka, India

3. Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India

Abstract

In this study, we evaluate and compare two approaches for multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach (LID-Mono) uses a front-end Language Identification (LID) system to switch the input to a monolingual phone recognizer, trained individually on each of the languages in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using Kannada and Urdu. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and the multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, as well as by using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. In contrast, the Multi-PRS system needs no front-end LID switching and is built on a common multilingual phone-set derived from several languages; it is therefore not constrained by the accuracy of an LID system, performs effectively on both code-switched and non-code-switched speech, and offers lower Phone Error Rates (PERs) than the LID-Mono system.
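Why LID errors are unrecoverable in LID-Mono, and why a shared phone-set avoids the problem, can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not the authors' system: "frames" are hypothetical (language, phone) tuples rather than audio; `toy_lid` is a majority vote over oracle labels standing in for the i-vector LID; `toy_recognizer` is an inventory filter standing in for a DNN phone recognizer; and the two phone inventories are illustrative, not the paper's IPA-derived sets. Only the PER formula (edit distance divided by reference length) is the standard definition.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy "frame": (true language label, true phone). Real systems decode audio.
Frame = Tuple[str, str]
Recognizer = Callable[[Sequence[Frame]], List[str]]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = (substitutions + deletions + insertions) / len(ref), via Levenshtein distance."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def toy_recognizer(inventory: set) -> Recognizer:
    """Hypothetical recognizer that can only emit phones from its own inventory;
    out-of-inventory phones are dropped (PER counts them as deletions)."""
    return lambda segment: [phone for _, phone in segment if phone in inventory]


def toy_lid(segment: Sequence[Frame]) -> str:
    """Hypothetical LID: majority vote over oracle labels (stand-in for an i-vector classifier)."""
    return Counter(lang for lang, _ in segment).most_common(1)[0][0]


def lid_mono_decode(frames: Sequence[Frame], window: int,
                    lid: Callable[[Sequence[Frame]], str],
                    recognizers: Dict[str, Recognizer]) -> List[str]:
    """LID-Mono: one hard LID decision per fixed-length window, then only the matching
    monolingual recognizer runs. A wrong decision is never revisited downstream."""
    hyp: List[str] = []
    for start in range(0, len(frames), window):
        segment = frames[start:start + window]
        hyp.extend(recognizers[lid(segment)](segment))
    return hyp


# Toy Kannada/Urdu inventories (illustrative only, not the paper's phone sets).
KN = {"a", "k", "m", "ɭ"}
UR = {"a", "k", "m", "q"}
frames: List[Frame] = [("kn", "k"), ("kn", "a"), ("kn", "ɭ"),
                       ("ur", "q"), ("ur", "a"), ("ur", "m")]
ref = [phone for _, phone in frames]
mono = {"kn": toy_recognizer(KN), "ur": toy_recognizer(UR)}

per_aligned = phone_error_rate(ref, lid_mono_decode(frames, 3, toy_lid, mono))
per_misaligned = phone_error_rate(ref, lid_mono_decode(frames, 4, toy_lid, mono))
# Multi-PRS analogue: one recognizer over the union (common) phone set, no LID step.
per_multi = phone_error_rate(ref, toy_recognizer(KN | UR)(frames))
print(per_aligned, per_misaligned, per_multi)  # 0.0 0.16666666666666666 0.0
```

With 3-frame windows the window boundary happens to coincide with the switch point, so LID-Mono is error-free. With 4-frame windows the first window is labelled Kannada, the Urdu /q/ falls outside the Kannada recognizer's inventory and is lost, giving a PER of 1/6 that no later stage can repair. The union-inventory recognizer never makes a language decision, so it is unaffected by where the window boundaries fall.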

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3437256

Cited by 1 article

1. Comparative Analysis of Automatic Speech Recognition Techniques. Advances in Computer Science Research, 2023.