Improving Deep Learning based Automatic Speech Recognition for Gujarati

Authors:

Raval Deepang¹, Pathak Vyom¹, Patel Muktan¹, Bhatt Brijesh¹

Affiliation:

1. Dharmsinh Desai University, Nadiad, Gujarat, India

Abstract

We present a novel approach for improving the performance of an end-to-end speech recognition system for the Gujarati language. We follow a deep learning-based approach that combines Convolutional Neural Network (CNN) layers, Bi-directional Long Short-Term Memory (BiLSTM) layers, and Dense layers, with Connectionist Temporal Classification (CTC) as the loss function. To improve the performance of the system given the limited size of the dataset, we present a prefix decoding technique based on a combined language model (a word-level language model and a character-level language model) and a post-processing technique based on Bidirectional Encoder Representations from Transformers (BERT). To gain key insights from our Automatic Speech Recognition (ASR) system, we propose different analysis methods that use the inferences from the system. These insights help us understand and improve the ASR system, and they provide intuition into the language used for the ASR system. We trained the model on the Microsoft Speech Corpus and observe a 5.87% decrease in Word Error Rate (WER) with respect to the base-model WER.
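The reported metric, Word Error Rate, is conventionally the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names `edit_distance` and `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace, which is a
    simplification chosen for this sketch.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Corpus-level WER is usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.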

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 8 articles.
