Chinese Speech Enhancement and Adaptive Recognition Technology for Complex Language Environments

Published: 2023-07-12
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Gao Ziqi¹

Affiliation:

1. School of Chinese Language and Literature, Soochow University, Suzhou, 215000, China

Abstract

The development of intelligent technology has driven rapid progress in speech-related fields. To broaden the application scenarios of speech recognition systems, this study improves a traditional speech enhancement algorithm, the Ideal Binary Mask (IBM) algorithm, and combines the improved version with the unimproved IBM algorithm to propose an adaptive IBM algorithm. Based on this algorithm, the study builds a new speech recognition system. The system uses an FIR filter for pre-emphasis and Berouti spectral subtraction to preprocess speech, and its speech enhancement model is built with a deep learning network. The results showed that the IBM algorithm achieved the highest Perceptual Evaluation of Speech Quality (PESQ) score, 3.5596, followed by the Ideal Ratio Mask (IRM) algorithm at 3.3429. The improvement to the IBM algorithm was feasible when the noise intensity coefficient was greater than 0.008. When the noise intensity coefficient was greater than 0.08, the average score of the improved IBM algorithm was 2.1079, while that of the unimproved IBM algorithm was 1.9418. The proposed adaptive IBM algorithm performs better in complex speech environments than the original system.
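
The abstract names FIR pre-emphasis, the Ideal Binary Mask, and the Ideal Ratio Mask without giving formulas. As a rough illustration only, the sketch below uses common textbook definitions of these three operations; it is not the paper's improved or adaptive IBM algorithm. The function names, the pre-emphasis coefficient `alpha=0.97`, the 0 dB local criterion `lc_db`, and the exponent `beta=0.5` are all assumptions chosen for the example, not values taken from the paper.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order FIR high-pass: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    x = np.asarray(x, dtype=float)
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """1 where the local SNR in dB exceeds the criterion lc_db, else 0."""
    eps = 1e-12  # avoids division by zero and log of zero
    speech_mag = np.asarray(speech_mag, dtype=float)
    noise_mag = np.asarray(noise_mag, dtype=float)
    snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (snr_db > lc_db).astype(float)

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Soft mask (S^2 / (S^2 + N^2))^beta, with values in [0, 1]."""
    eps = 1e-12
    s2 = np.asarray(speech_mag, dtype=float) ** 2
    n2 = np.asarray(noise_mag, dtype=float) ** 2
    return (s2 / (s2 + n2 + eps)) ** beta
```

Both masks are computed per time-frequency bin from magnitude spectrograms of the clean speech and the noise, which is why they are called "ideal": they need the separated signals and so serve as training targets or upper bounds rather than as something available at inference time.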

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3608950

References (20 articles)

1. A. M. Raghavan, N. Lipschitz, J. T. Breen, R. N. Samy & G. D. Kohlberg. 2020. Visual Speech Recognition: Improving Speech Perception in Noise through Artificial Intelligence. Otolaryngology-Head and Neck Surgery 163(4), 771-777. https://doi.org/10.1177/0194599820924331

2. Algorithm research of spoken English assessment based on fuzzy measure and speech recognition technology

3. Jiang D. 2021. An improved unsupervised single-channel speech separation algorithm for processing speech sensor signals. Wireless Communications and Mobile Computing.

4. Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling