End-to-end Multi-modal Low-resourced Speech Keywords Recognition Using Sequential Conv2D Nets

Authors:

Pooja Gambhir¹, Amita Dev¹, Poonam Bansal¹, Deepak Kumar Sharma¹

Affiliation:

1. Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Kashmere Gate, New Delhi, India

Abstract

Advanced neural networks are widely used to automatically recognize multi-modal conversational speech, with significant improvements in accuracy. In particular, Convolutional Neural Networks (CNNs) have recently achieved state-of-the-art performance in Automatic Speech Recognition (ASR), most notably for English; the Hindi language, however, has not been explored or examined as thoroughly in ASR systems. This article presents a three-layered two-dimensional Sequential Convolutional neural architecture. The Sequential Conv2D model is an end-to-end system that can simultaneously exploit the spectral and temporal structure of the speech signal. The network was trained and tested on different cepstral features: Mel-frequency cepstral features, Gammatone Filter Cepstral Coefficients, Bark-Frequency Cepstral Coefficients, and spectrogram features of the speech signal. The experiments were performed on two low-resourced speech command datasets: Hindi, with 27,145 speech keywords developed by TIFR, and 23,664 one-second utterances from the Google TensorFlow and AIY English Speech Commands dataset. The results show that the convolutional layers trained on spectrograms perform best for English speech, reaching 91.60% accuracy compared to that achieved with the other cepstral feature sets. For Hindi audio words, however, the model achieved an accuracy of 69.65%, with Bark-frequency cepstral coefficient features outperforming spectrogram features.
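The abstract describes stacking three Conv2D layers over a 2-D time-frequency input so that one network jointly captures spectral and temporal structure. The paper's exact kernel sizes, channel counts, and pooling scheme are not given here, so the following is only a minimal sketch of the idea in plain NumPy: a toy "spectrogram" (frequency bands × frames) passed through three stacked 3×3 convolution + ReLU stages. All dimensions are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation of a single-channel input with one kernel."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Toy spectrogram: 40 frequency bands x 98 frames
# (roughly a 1-second utterance at a 10 ms hop, as in the speech-command data)
rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 98))

# Three stacked conv + ReLU stages, mirroring the three-layer Sequential Conv2D idea:
# each 3x3 kernel slides over BOTH axes, so frequency (spectral) and time (temporal)
# patterns are learned jointly rather than from a flattened 1-D feature vector.
x = spec
for _ in range(3):
    kernel = rng.standard_normal((3, 3)) * 0.1  # random stand-in for learned weights
    x = np.maximum(conv2d(x, kernel), 0.0)      # ReLU non-linearity

print(x.shape)  # (34, 92): each valid 3x3 conv trims 2 from both axes
```

In a real implementation these stages would be learned layers (e.g. a Keras `Sequential` model of `Conv2D` layers) followed by pooling and a softmax over the keyword classes; the sketch only shows why a 2-D convolution sees spectral and temporal context at once.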

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

References (44 articles; first 5 shown):

1. P. Gambhir. 2019. Review of Chatbot design and trends. In Proceedings of the Conference on Artificial Intelligence and Speech Technology.

2. M. Chellapriyadharshini, A. Toffy, and V. Ramasubramanian. 2018. Semi-supervised and active-learning scenarios: Efficient acoustic model refinement for a low resource Indian language. arXiv:1810.06635.

3. M. Shamsfard. 2019. Challenges and opportunities in processing low resource languages: A study on Persian. In International Conference Language Technologies for All (LT4All).

4. Acoustic Modeling in Speech Recognition: A Systematic Review

5. Poonam Bansal et al. 2015. The State-of-the-art of feature extraction techniques: An overview. In Proceedings of the Computer Society of India (CSI’15), Speech and Language Processing for Human-Machine Communications, Advances in Intelligent Systems and Computing. Springer, 195–207.

Cited by 1 article.
