Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models

Published: 2022-01-04 Volume: 22 Issue: 1 Page: 374
ISSN: 1424-8220
Container-title: Sensors
Language: en

Authors:

Ali Mohamed Nabih, Falavigna Daniele, Brutti Alessio

Abstract

Robustness against background noise and reverberation is essential for many real-world speech-based applications. One way to achieve this robustness is to employ a speech enhancement front-end that, independently of the back-end, removes the environmental perturbations from the target speech signal. However, although the enhancement front-end typically improves speech quality in terms of intelligibility, it tends to introduce distortions that degrade the performance of subsequent processing modules. In this paper, we investigate strategies for jointly training neural models for both speech enhancement and the back-end by optimizing a combined loss function. In this way, the back-end guides the enhancement front-end toward more effective enhancement. Unlike typical state-of-the-art approaches, which employ spectral features or neural embeddings, we operate in the time domain, processing raw waveforms in both components. As an application scenario, we consider intent classification in noisy environments. In particular, the front-end speech enhancement module is based on Wave-U-Net, while the intent classifier is implemented as a temporal convolutional network. Exhaustive experiments are reported on versions of the Fluent Speech Commands corpus contaminated with noises from the Microsoft Scalable Noisy Speech Dataset, shedding light on and providing insight into the most promising training approaches.
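To make the idea of a combined loss concrete, here is a minimal sketch of how an enhancement term and a classification term might be merged into a single training objective. The abstract does not specify the exact losses or weighting, so the choices below are assumptions for illustration only: a time-domain mean squared error between the enhanced and clean waveforms, a cross-entropy on the intent logits, and a hypothetical mixing weight `weight`. The function names are likewise hypothetical. In an actual joint training setup, the gradient of this scalar would flow through both the classifier and the enhancement front-end; this pure-Python version only computes the value.

```python
import math
from typing import Sequence


def enhancement_loss(enhanced: Sequence[float], clean: Sequence[float]) -> float:
    """Assumed front-end loss: mean squared error on raw waveform samples."""
    if not clean or len(enhanced) != len(clean):
        raise ValueError("waveforms must be non-empty and of equal length")
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)


def intent_loss(logits: Sequence[float], target: int) -> float:
    """Assumed back-end loss: cross-entropy of intent logits vs. the target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    logits: Sequence[float],
    target: int,
    weight: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> float:
    """Convex combination: weight * L_enh + (1 - weight) * L_cls (an assumption)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * enhancement_loss(enhanced, clean) + (1.0 - weight) * intent_loss(
        logits, target
    )


if __name__ == "__main__":
    # Toy 3-sample waveforms and 3-class intent logits.
    print(joint_loss([0.1, -0.2, 0.3], [0.0, -0.2, 0.4], [2.0, 0.5, -1.0], target=0))
```

Setting `weight=1.0` reduces the objective to pure enhancement training (the back-end has no influence), while `weight=0.0` trains only for classification; intermediate values are one simple way a back-end can steer the front-end.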

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/1/374/pdf

Cited by 4 articles

1. A Blind Source Separation Method Based on Bounded Component Analysis Optimized by the Improved Beetle Antennae Search;Sensors;2023-10-08

2. Direct enhancement of pre-trained speech embeddings for speech processing in noisy conditions;Computer Speech & Language;2023-06

3. Enhancing Embeddings for Speech Classification in Noisy Conditions;Interspeech 2022;2022-09-18

4. Blind Source Separation Based on Double-Mutant Butterfly Optimization Algorithm;Sensors;2022-05-24