Deep Neural Network-based Mixed Speech Recognition Technology for Chinese and English

Author:

Han Lei 1

Affiliation:

1. Basic Teaching Department, Inner Mongolia Vocational and Technical College of Communication, Chifeng, 024000, China

Abstract

In human-computer interaction, today's most advanced speech recognition systems each handle only a single language, and new deep learning techniques are urgently needed to improve them. Against this background, this study uses deep neural networks (DNNs) to investigate mixed speech recognition for Chinese and English. A DNN-based single-language speech recognition algorithm is investigated first, and a new hybrid Chinese-English speech recognition model is then constructed by fusing the attention mechanism with the CTC loss function. The hybrid model uses an end-to-end design and the Transformer framework, and it exploits the monotonic alignment property of the CTC loss function, which allows complex sound units to be transformed into characters for easy extraction and recognition. The constructed models were tested on a Chinese speech dataset, an English speech dataset, and a mixed Chinese-English speech dataset to determine their recognition accuracy and speed. The results show that the proposed model achieves 81.2% recognition accuracy and a recognition speed of 100 per minute on the mixed Chinese-English speech dataset, much better than the other three models compared. The study thus addresses the need for improved speech recognition systems by introducing a novel hybrid model for mixed Chinese-English speech recognition. The model shows promise for enhancing human-computer interaction and supporting efficient communication between Chinese and English speakers.
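
The abstract does not say how the attention mechanism and the CTC loss are fused, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It shows two ideas: how CTC's monotonic alignment collapses a frame-level label path into characters (merge consecutive repeats, then drop blanks), and one common way to combine two losses by weighted interpolation. The names `ctc_collapse`, `joint_loss`, `BLANK`, and the `ctc_weight` hyperparameter are hypothetical and chosen for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real systems reserve a token id for it


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a frame-level label path to its output character sequence.

    CTC rule: merge consecutive repeated labels first, then drop blanks.
    Because the mapping never reorders labels, the alignment is monotonic.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def joint_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Weighted interpolation of a CTC loss and an attention-decoder loss.

    ctc_weight is an assumed hyperparameter; the source does not state
    how (or with what weight) the two objectives are combined.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# One frame-level path over a mixed Chinese-English utterance.
path = ["_", "你", "你", "_", "好", "_", "h", "h", "i", "i", "_"]
print("".join(ctc_collapse(path)))

# A blank between two identical labels keeps them as separate characters.
print("".join(ctc_collapse(["l", "l", "_", "l"])))

# Scalar loss values here are made-up placeholders, not results from the paper.
print(joint_loss(ctc_loss=2.0, attention_loss=1.0, ctc_weight=0.5))
```

Treating Chinese characters and English letters as entries of a single output vocabulary, as in this toy path, is itself an assumption; the abstract only states that sound units are transformed into characters.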

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

