Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions

Published:2021-08-18 Issue:16 Volume:11 Page:7564
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Li Lujun, Wudamu, Kürzinger Ludwig, Watzel Tobias, Rigoll Gerhard

Abstract

Generative adversarial networks (GANs) have recently garnered significant attention for their use in speech enhancement tasks, in which they generally process and reconstruct speech waveforms directly. Existing GANs for speech enhancement rely solely on the convolution operation, which may not accurately characterize the local information of speech signals, particularly high-frequency components. Sinc convolution has been proposed to allow the GAN to learn more meaningful filters in the input layer, and has achieved remarkable success in several speech signal processing tasks. Nevertheless, Sinc convolution for speech enhancement remains an under-explored research direction. This paper proposes Sinc–SEGAN, a novel generative adversarial architecture for speech enhancement that combines two powerful paradigms: Sinc convolution and the speech enhancement GAN (SEGAN). The proposed system has two highlights. First, it works in an end-to-end manner, overcoming the distortion caused by imperfect phase estimation. Second, it derives a customized filter bank that is compactly and efficiently tuned for the desired application. We empirically study the influence of different configurations of Sinc convolution, including the placement of the Sinc convolution layer, the length of the input signals, the number of Sinc filters, and the kernel size of the Sinc convolution. Moreover, we employ a set of data augmentation techniques in the time domain, which further improve the system's performance and its generalization abilities. Compared to competitive baseline systems, Sinc–SEGAN outperforms all of them with drastically fewer parameters, demonstrating its effectiveness for practical usage, e.g., hearing aid design and cochlear implants. Additionally, data augmentation methods further boost Sinc–SEGAN performance across classic objective evaluation criteria for speech enhancement.
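
The abstract does not spell out how a Sinc convolution layer builds its filters. As a rough illustration only, the sketch below uses a common parameterization in which each filter is a windowed band-pass kernel defined by just two cutoff frequencies, formed as the difference of two sinc low-pass filters. The function names, the Hamming window, the fixed (rather than learned) cutoffs, and the example band edges are all assumptions made for this sketch; they are not taken from the paper, and in a trained network the cutoffs would be learnable parameters.

```python
import numpy as np


def sinc_bandpass_kernel(f_low_hz, f_high_hz, kernel_size, sample_rate):
    """Windowed band-pass FIR kernel defined by two cutoff frequencies.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    # Time index centred on zero, in samples.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cutoffs normalised to cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f sinc(2 f n) is an ideal
    # low-pass with cutoff f; subtracting two low-passes gives a band-pass.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Assumed Hamming window to smooth the truncation of the ideal filter.
    return kernel * np.hamming(kernel_size)


def sinc_filter_bank(cutoffs_hz, kernel_size, sample_rate):
    """Stack one kernel per (low, high) pair: shape (num_filters, kernel_size)."""
    return np.stack([
        sinc_bandpass_kernel(lo, hi, kernel_size, sample_rate)
        for lo, hi in cutoffs_hz
    ])


def apply_filter_bank(waveform, bank):
    """Filter a 1-D waveform with every kernel: shape (num_filters, len(waveform))."""
    return np.stack([np.convolve(waveform, k, mode="same") for k in bank])


if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone, 1 second
    # Two illustrative bands; only the first contains the tone.
    bank = sinc_filter_bank([(500.0, 1500.0), (3000.0, 4000.0)], 101, sample_rate)
    out = apply_filter_bank(tone, bank)
    rms = np.sqrt(np.mean(out[:, 200:-200] ** 2, axis=1))
    print("RMS per band (edges trimmed):", rms)
```

Each filter here is described by two numbers instead of one weight per tap, which gives an intuition for how a Sinc input layer can yield a compact, application-specific filter bank with far fewer parameters than an unconstrained convolution of the same kernel size.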

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/16/7564/pdf

Reference: 50 articles.

1. Speech Enhancement: Theory and Practice;Loizou,2017

2. Convolutional Neural Networks to Enhance Coded Speech

3. An Individualized Super-Gaussian Single Microphone Speech Enhancement for Hearing Aid Users With Smartphone as an Assistive Device

4. Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users

Cited by 6 articles.

1. Time domain speech enhancement with CNN and time-attention transformer;Digital Signal Processing;2024-04

2. Exploring Multi-Stage GAN with Self-Attention for Speech Enhancement;Applied Sciences;2023-08-14

3. Shuffle Attention U-Net for Speech Enhancement in Time Domain;International Journal of Image and Graphics;2023-03-31

4. Two-Branch Network with Selective Kernel Convolution for Time-Domain Speech Enhancement;2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP);2022-12-11

5. ResectNet: An Efficient Architecture for Voice Activity Detection on Mobile Devices;Interspeech 2022;2022-09-18