Affiliation:
1. Department of Computer Science, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Coyoacán 04510, Mexico
Abstract
The Demucs-Denoiser model has recently been shown to achieve a high level of performance for online speech enhancement, but it assumes that only one speech source is present in the input mixture. In real-life scenarios with multiple speech sources, it is not certain which speech source will be enhanced. To correct this issue, two target selection strategies for the Demucs-Denoiser model are proposed and evaluated: (1) an embedding-based strategy, using a codified sample of the target speech, and (2) a location-based strategy, using a beamforming-based prefilter to select the target that is in front of a two-microphone array. In this work, it is shown that while both strategies improve the performance of the Demucs-Denoiser model when one or more speech interferences are present, each has its pros and cons. Specifically, the beamforming-based strategy achieves better overall performance (increasing the output SIR by between 5 and 10 dB) than the embedding-based strategy (which increases the output SIR by only 2 dB, and only in low-input-SIR scenarios). However, the beamforming-based strategy is sensitive to variations in the location of the target speech source (the output SIR decreases by 10 dB if the target speech source is located only 0.1 m from its expected position), a weakness that the embedding-based strategy does not suffer from.
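To make the SIR figures above concrete, the following is a minimal sketch of how a signal-to-interference ratio in dB can be computed and how a front-facing prefilter on a two-microphone array can raise it. It assumes a plain delay-and-sum beamformer, swapped in purely for illustration, since the abstract does not say which beamformer is used. The names `sir_db` and `front_delay_and_sum`, the white-noise signals, and the 5-sample inter-microphone delay are all hypothetical.

```python
import numpy as np

def sir_db(target, interference):
    """Signal-to-interference ratio in dB, given the two separated components."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

def front_delay_and_sum(mic1, mic2):
    """Steer a two-microphone array to the front: zero relative delay, so average."""
    return 0.5 * (mic1 + mic2)

rng = np.random.default_rng(0)
n = 16000
target = rng.standard_normal(n)   # stand-in for the target speech (assumption)
interf = rng.standard_normal(n)   # stand-in for an interfering source (assumption)
delay = 5                         # hypothetical off-axis delay, in samples

# A source in front of the array reaches both microphones at the same time;
# an off-axis interference reaches the second microphone with a delay.
target_mic1, target_mic2 = target, target
interf_mic1, interf_mic2 = interf, np.roll(interf, delay)

sir_in = sir_db(target_mic1, interf_mic1)

# The beamformer is linear, so it can be applied to each component separately
# to measure the target and interference power at its output.
sir_out = sir_db(
    front_delay_and_sum(target_mic1, target_mic2),
    front_delay_and_sum(interf_mic1, interf_mic2),
)
sir_gain = sir_out - sir_in
print(f"input SIR: {sir_in:.2f} dB, output SIR: {sir_out:.2f} dB, gain: {sir_gain:.2f} dB")
```

With white-noise stand-ins the gain of this toy filter is roughly 3 dB, which is not comparable to the 5 to 10 dB reported above for the full pipeline. The sketch also hints at the location sensitivity noted in the abstract: if the target were off-axis, its two microphone copies would no longer be aligned and averaging them would attenuate the target as well.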
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
1 article.