Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2

Authors:

Hassan Muhammad Ahmed (1), Rehmat Asim (2), Ghani Khan Muhammad Usman (3), Yousaf Muhammad Haroon (4)

Affiliations:

1. Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan

2. Department of Computer Engineering, University of Engineering and Technology, Lahore, Pakistan

3. Department of Computer Science, University of Engineering and Technology, Lahore, Pakistan

4. Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan

Abstract

Automatic speech recognition (ASR) provides a convenient and fast mode of communication between humans and computers, and its accuracy has improved over time. However, most ASR systems are trained on native English accents. While they serve native English speakers well, their accuracy drops drastically for non-native English accents. Our proposed model addresses this limitation for non-native English accents. We fine-tuned the DeepSpeech2 model, pretrained on LibriSpeech, a dataset of native English speech. We then retrained the model on a subset of the Common Voice dataset containing only South Asian accents, using the proposed novel loss function. We experimented with three different layer configurations of the model to learn the best features for South Asian accents. Three evaluation metrics were used: word error rate (WER), match error rate (MER), and word information loss (WIL). The results show that DeepSpeech2 performs significantly better for South Asian accents when the weights of the initial convolutional layers are retained while the weights of the deeper layers (i.e., the RNN and fully connected layers) are updated. Our model achieved a WER of 18.08%, which is the lowest error achieved for non-native English accents in comparison with the original model.
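The abstract reports WER, MER and WIL but does not spell out their formulas. The sketch below is a minimal illustration, not the authors' evaluation code, assuming the definitions commonly used by tools such as jiwer. A word-level Levenshtein alignment yields hits H, substitutions S, deletions D and insertions I; with N1 = H + S + D reference words and N2 = H + S + I hypothesis words, WER = (S + D + I) / N1, MER = (S + D + I) / (H + S + D + I), and WIL = 1 − (H / N1)(H / N2). Whitespace tokenization and the tie-breaking rule (among minimum-cost alignments, prefer the one with the most hits) are assumptions of this sketch, and the function names are hypothetical.

```python
from typing import Dict, NamedTuple


class Counts(NamedTuple):
    hits: int
    substitutions: int
    deletions: int
    insertions: int


def align_counts(reference: str, hypothesis: str) -> Counts:
    """Word-level Levenshtein alignment (hypothetical helper).

    Each DP cell holds (edit_cost, -hits, hits, subs, dels, ins); tuple
    ordering minimises edit cost first, then maximises hits (assumed tie-break).
    """
    ref, hyp = reference.split(), hypothesis.split()
    rows, cols = len(ref) + 1, len(hyp) + 1
    dp = [[(0, 0, 0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):  # first column: delete every reference word
        c = dp[i - 1][0]
        dp[i][0] = (c[0] + 1, c[1], c[2], c[3], c[4] + 1, c[5])
    for j in range(1, cols):  # first row: insert every hypothesis word
        c = dp[0][j - 1]
        dp[0][j] = (c[0] + 1, c[1], c[2], c[3], c[4], c[5] + 1)
    for i in range(1, rows):
        for j in range(1, cols):
            d = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (d[0], d[1] - 1, d[2] + 1, d[3], d[4], d[5])  # hit
            else:
                diag = (d[0] + 1, d[1], d[2], d[3] + 1, d[4], d[5])  # substitution
            up, left = dp[i - 1][j], dp[i][j - 1]
            dp[i][j] = min(
                diag,
                (up[0] + 1, up[1], up[2], up[3], up[4] + 1, up[5]),  # deletion
                (left[0] + 1, left[1], left[2], left[3], left[4], left[5] + 1),  # insertion
            )
    _, _, h, s, d, ins = dp[-1][-1]
    return Counts(h, s, d, ins)


def asr_metrics(reference: str, hypothesis: str) -> Dict[str, float]:
    """Compute WER, MER and WIL for one reference/hypothesis pair."""
    h, s, d, ins = align_counts(reference, hypothesis)
    n_ref = h + s + d  # number of reference words (N1)
    n_hyp = h + s + ins  # number of hypothesis words (N2)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    errors = s + d + ins
    wip = (h / n_ref) * (h / n_hyp) if n_hyp else 0.0  # word information preserved
    return {
        "wer": errors / n_ref,
        "mer": errors / (h + s + d + ins),
        "wil": 1.0 - wip,
    }
```

For example, `asr_metrics("the cat sat on the mat", "the cat sit on mat")` aligns 4 hits, 1 substitution and 1 deletion, giving WER = MER = 2/6 and WIL = 1 − (4/6)(4/5). Corpus-level scores are usually computed by summing H, S, D and I over all utterances before dividing, rather than averaging per-utterance ratios.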

Publisher

Hindawi Limited

Subject

General Engineering, General Mathematics


Cited by 9 articles.
