Affiliation:
1. Basic Teaching Department, Inner Mongolia Vocational and Technical College of Communication, Chifeng, 024000, China
Abstract
Speech recognition now supports a variety of functions, such as executing voice commands, speech processing, spoken language translation and facilitating communication, so research on speech recognition technology is of high value. However, current speech recognition techniques focus on clearly articulated speech, and colloquial or dialect pronunciation remains a major challenge. Some scholars have built speech recognition systems with a model combining time-delay neural networks and long short-term memory (LSTM) networks, but its acoustic recognition performance is poor. Therefore, by analyzing deep neural networks, the study proposes a composite English speech recognition model that combines a convolutional neural network (CNN), a time-delay neural network (TDNN) and an output-gate projected gated recurrent unit (OPGRU). Introducing the CNN optimizes the acoustic model, allowing the composite model to recognize pronunciation features accurately and to cover a wider recognition range. The proposed composite model is compared with the TDNN-OPGRU model on word error rate (WER) and running time using the Mozilla Common Voice dataset. The WER of the composite model is 23.42% and its running time is 1418 s. The WER of the TDNN-OPGRU model is 24.61% and its running time is 1385 s. Compared with the TDNN-OPGRU model, the WER of the composite model decreases by 1.19 percentage points, while the running time increases by 33 s. The accuracy of the composite model is therefore higher than that of the TDNN-OPGRU model. On balance, accuracy takes priority over running time for a speech recognition model, so the composite model proposed in the study performs better overall.
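The abstract reports word error rate without defining it. As a minimal sketch, assuming the standard definition (word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words), WER can be computed as below. The function name `word_error_rate` and the sample sentences are illustrative assumptions and are not taken from the paper, whose own scoring tool is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; not the paper's scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of three reference words.
example_wer = word_error_rate("the cat sat", "the cat sit")

# Absolute WER gap between the two models, using the figures in the abstract.
wer_gap_points = round(24.61 - 23.42, 2)
```

Under this definition, the 1.19 figure in the abstract is an absolute difference in percentage points (24.61% minus 23.42%), not a relative reduction.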
Publisher
Association for Computing Machinery (ACM)
Cited by
3 articles.
1. Hybrid architecture CNN-BLSTM for automatic speech recognition;2024 3rd International Conference on Artificial Intelligence For Internet of Things (AIIoT);2024-05-03
2. An Approach to Recognize Speech Using Convolutional Neural Network for the Multilingual Language;2023 Global Conference on Information Technologies and Communications (GCITC);2023-12-01
3. English Speech Recognition Model Based on Improved Neural Network;2023 International Conference on Network, Multimedia and Information Technology (NMITCON);2023-09-01