Affiliation:
1. School of Artificial Intelligence, Jianghan University, Wuhan 430050, China
Abstract
In mobile communication, the speech output by the device is usually clear in itself, but in a noisy environment the listener still struggles to understand what the speaker is saying. Speech intelligibility enhancement (IENH) is a technique that addresses this problem by enhancing speech intelligibility at the reception stage. Previous research has approached IENH by converting normal speech into Lombard speech of different levels, inspired by a well-known acoustic mechanism called the Lombard effect. However, these methods often introduce speech distortion and impair overall speech quality. To address this quality degradation, we propose an improved StarGAN-based IENH framework that combines a StarGAN network with the dual-discriminator idea to construct the conversion framework. This approach offers two main advantages: (1) a speech metric discriminator added on top of StarGAN optimizes multiple intelligibility- and quality-related metrics simultaneously; (2) the framework adapts to different distal and proximal noise levels and to different noise types. Objective experiments and subjective preference tests show that our approach outperforms the baseline approach, which allows IENH to be used more widely.
Funder
National Natural Science Foundation of China
Application Foundation Frontier Special Project of Wuhan Science and Technology Plan Project
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering