Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU

Authors:

Shewalkar Apeksha¹, Nyavanandi Deepika¹, Ludwig Simone A.¹

Affiliation:

1. Department of Computer Science, North Dakota State University, Fargo, ND, USA

Abstract

Deep Neural Networks (DNN) are neural networks with many hidden layers. DNNs have become popular in automatic speech recognition tasks, which combine a good acoustic model with a language model. Standard feedforward neural networks cannot handle speech data well since they have no way to feed information from a later layer back to an earlier layer. Recurrent Neural Networks (RNNs) were therefore introduced to take temporal dependencies into account. However, the shortcoming of RNNs is that they cannot handle long-term dependencies due to the vanishing/exploding gradient problem. Long Short-Term Memory (LSTM) networks, a special case of RNNs, were introduced to capture long-term dependencies in speech in addition to short-term ones. Similarly, Gated Recurrent Unit (GRU) networks are a simplified variant of LSTM networks that also take long-term dependencies into consideration. In this paper, we evaluate RNN, LSTM, and GRU networks and compare their performance on a reduced TED-LIUM speech data set. The results show that LSTM achieves the best word error rates; however, GRU optimization is faster while achieving word error rates close to those of LSTM.
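
The abstract contrasts three recurrent architectures (RNN, LSTM, GRU) and scores them by word error rate. As a minimal sketch of how the three cells differ as drop-in layers, assuming PyTorch (the paper does not state its framework, and the feature/hidden sizes below are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumed, not taken from the paper).
INPUT_DIM = 40    # e.g., 40 acoustic features per frame
HIDDEN_DIM = 128  # recurrent state size
BATCH, SEQ_LEN = 8, 100

# The three recurrent cells compared in the paper, as stock layers.
layers = {
    "RNN":  nn.RNN(INPUT_DIM, HIDDEN_DIM, batch_first=True),
    "LSTM": nn.LSTM(INPUT_DIM, HIDDEN_DIM, batch_first=True),
    "GRU":  nn.GRU(INPUT_DIM, HIDDEN_DIM, batch_first=True),
}

x = torch.randn(BATCH, SEQ_LEN, INPUT_DIM)  # a batch of feature sequences

for name, layer in layers.items():
    out, _ = layer(x)  # out has shape (BATCH, SEQ_LEN, HIDDEN_DIM)
    n_params = sum(p.numel() for p in layer.parameters())
    print(f"{name:4s} -> output {tuple(out.shape)}, parameters: {n_params}")
```

The parameter counts printed above make the speed claim plausible: per layer, an LSTM holds four weight sets (input, forget, and output gates plus the cell candidate) against the GRU's three (reset and update gates plus the candidate), so GRU layers train faster at similar capacity. The word error rate used for evaluation is the standard word-level Levenshtein distance normalized by reference length; a self-contained sketch follows (this helper is hypothetical, not the paper's scoring tool):

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """WER = (substitutions + deletions + insertions) / #reference words."""
    r, h = ref.split(), hyp.split()
    if not r:
        return float(len(h) > 0)
    # Dynamic-programming table for word-level edit distance.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

print(word_error_rate("the cat sat down", "the cat sit down"))  # 0.25
```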

Publisher

Walter de Gruyter GmbH

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Hardware and Architecture, Modeling and Simulation, Information Systems

References (41 articles)

[1] G. E. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527-1554, 2006. DOI: 10.1162/neco.2006.18.7.1527

[2] A. Rousseau, P. Deléglise, Y. Estève, Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC), 3935-3939, May 2014.

[3] Y. Gaur, F. Metze, J. P. Bigham, Manipulating Word Lattices to Incorporate Human Corrections, Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 2016. DOI: 10.21437/Interspeech.2016-660

[4] E. Busseti, I. Osband, S. Wong, Deep Learning for Time Series Modeling, Seminar on Collaborative Intelligence at TU Kaiserslautern, Germany, June 2012.

[5] Deep Learning for Sequential Data - Part V: Handling Long Term Temporal Dependencies, https://prateekvjoshi.com/2016/05/31/deep-learning-for-sequential-data-part-v-handling-long-term-temporal-dependencies/, last retrieved July 2017.

Cited by 251 articles.
