Spoken Language Identification Using Deep Learning

Author:

Singh Gundeep (1), Sharma Sahil (1), Kumar Vijay (2), Kaur Manjit (3), Baz Mohammed (4), Masud Mehedi (5)

Affiliation:

1. Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India

2. Computer Science and Engineering Department, National Institute of Technology, Hamirpur, India

3. School of Engineering and Applied Sciences, Bennett University, Greater Noida, India

4. Department of Computer Engineering, College of Computer and Information Technology, Taif University, P.O. Box. 11099, Taif 21994, Saudi Arabia

5. Department of Computer Science, College of Computer and Information Technology, Taif University, P.O. Box. 11099, Taif 21994, Saudi Arabia

Abstract

Spoken language identification (SLID) is the task of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, age, or manner of speaking. The main challenge is to find features that distinguish between languages clearly and efficiently. The proposed model converts audio files into spectrogram images and applies a convolutional neural network (CNN) to extract the main features used for classification. The objective is to identify the language among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. Experiments were conducted on audio files from the Kaggle dataset named "spoken language identification". The audio files consist of utterances, each with a fixed duration of 10 seconds. The dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%; extensive testing shows an overall accuracy of 88%.
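The first step described in the abstract, turning a fixed-length audio clip into a spectrogram image, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 16 kHz sample rate, the 25 ms Hann-windowed frames, the 10 ms hop, the `log1p` compression, and the function name `log_spectrogram` are all assumptions chosen for the example, and a synthetic 440 Hz tone stands in for a real utterance.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    frame_len and hop are in samples; the defaults are assumed values
    (25 ms frames and a 10 ms hop at 16 kHz), not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Compress the dynamic range; transpose so rows are frequency bins.
    return np.log1p(magnitude).T


sample_rate = 16000  # assumed sample rate in Hz
duration_s = 10      # clip length stated in the abstract
t = np.arange(sample_rate * duration_s) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for an utterance

spec = log_spectrogram(clip)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated as a single-channel image and passed to a CNN classifier; the network architecture itself is not specified in the abstract and is therefore left out of this sketch.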

Funder

Taif University

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science


Cited by 39 articles.

1. SSTE: Syllable-Specific Temporal Encoding to FORCE-learn audio sequences with an associative memory approach;Neural Networks;2024-09

2. Integrated End-to-End Automatic Speech Recognition for Languages for Agglutinative Languages;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-06-21

3. DFNet: Decoupled Fusion Network for Dialectal Speech Recognition;Mathematics;2024-06-17

4. Speaker-based language identification for Ethio-Semitic languages using CRNN and hybrid features;Network: Computation in Neural Systems;2024-06-04

5. Utilizing Deep Learning Techniques for the Classification of Spoken Languages in India;International Journal of Scientific Research in Computer Science, Engineering and Information Technology;2024-03-11
