An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

Author:

Dey Spandan (1), Sahidullah Md (2), Saha Goutam (1)

Affiliation:

1. Indian Institute of Technology, Kharagpur, India

2. Université de Lorraine, CNRS, Inria, LORIA, France

Abstract

Automatic spoken language identification (LID) is an important research field in the era of multilingual, voice-command-based human-computer interaction. A front-end LID module helps improve the performance of many speech-based applications in multilingual scenarios. India is a populous country with diverse cultures and languages, and the majority of the Indian population needs to use their native languages for verbal interaction with machines. The development of efficient Indian spoken language recognition systems is therefore useful for adopting smart technologies in every section of Indian society. Indian LID research has gained momentum since the early 2000s, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Although significant research progress has been made in this field, to the best of our knowledge there have been few attempts to review it analytically as a whole. In this work, we present one of the first comprehensive reviews of the Indian spoken language recognition research field. We provide an in-depth analysis that emphasizes two challenges unique to developing LID systems in the Indian context: low-resource conditions and the mutual influence among languages. We discuss several essential aspects of Indian LID research: a detailed description of the available speech corpora; the major research contributions, from earlier attempts based on statistical modeling to recent approaches based on different neural network architectures; and future research trends. This review will help active researchers and research enthusiasts from related fields assess the present state of Indian LID research.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 5 articles.

1. Towards audio-based identification of Ethio-Semitic languages using recurrent neural network;Scientific Reports;2023-11-07

2. Comparing The Fine-Tuning and Performance of Whisper Pre-Trained Models for Turkish Speech Recognition Task;2023 7th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT);2023-10-26

3. Ethio-Semitic language identification using convolutional neural networks with data augmentation;Multimedia Tools and Applications;2023-09-26

4. Recognizing Indian Languages Speech Sound using Transfer Learning Approach;2023 4th International Conference on Electronics and Sustainable Communication Systems (ICESC);2023-07-06

5. Cross-corpora spoken language identification with domain diversification and generalization;Computer Speech & Language;2023-06
