Performance analysis of speech enhancement using spectral gating with U-Net

Published:2023-10-01 Issue:5 Volume:74 Page:365-373
ISSN:1339-309X
Container-title:Journal of Electrical Engineering
language:en

Author:

Agrawal Jharna¹, Gupta Manish¹, Garg Hitendra¹

Affiliation:

1. GLA University, Mathura, India

Abstract

Speech enhancement is a crucial front end for many speech processing systems. Single-channel speech enhancement faces a number of technical challenges. With the advent of cloud-based technology and the use of deep learning in big data, deep neural networks in particular have recently come to be seen as a powerful tool for complex classification and regression. In this work, a spectral gating noise filter is combined with the U-Net deep neural network to improve the performance of a speech enhancement network. For performance analysis, three distinct objective functions, namely Mean Square Error, Huber Loss, and Mean Absolute Error, are considered as loss functions. In addition, a comparison of three optimizers, Adam, Adagrad, and Stochastic Gradient Descent, is presented. The proposed system is tested and evaluated on the LibriSpeech and NOIZEUS datasets and compared with other state-of-the-art systems. The proposed network outperforms these models, with PESQ scores of 2.737420 for training and 2.67857 for testing, along with better generalization ability.
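
For readers unfamiliar with the three objective functions named in the abstract, the following is a minimal sketch of their textbook definitions in plain Python. It is not the authors' implementation: the function names, the list-based inputs, and the Huber threshold `delta=1.0` are assumptions made for illustration, and in practice these losses would be computed on spectrogram tensors inside a deep learning framework.

```python
from typing import Sequence


def mse(target: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean Square Error: average of squared residuals."""
    return sum((t - e) ** 2 for t, e in zip(target, estimate)) / len(target)


def mae(target: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - e) for t, e in zip(target, estimate)) / len(target)


def huber(target: Sequence[float], estimate: Sequence[float],
          delta: float = 1.0) -> float:
    """Huber loss: quadratic for small residuals, linear for large ones.

    delta is the switch-over point; 1.0 is an assumed value for
    illustration, not one reported in the abstract.
    """
    total = 0.0
    for t, e in zip(target, estimate):
        r = abs(t - e)
        if r <= delta:
            total += 0.5 * r * r               # quadratic region
        else:
            total += delta * (r - 0.5 * delta)  # linear region
    return total / len(target)


if __name__ == "__main__":
    # Hypothetical clean and enhanced values; the last one is an outlier.
    clean = [0.0, 1.0, 2.0, 3.0]
    enhanced = [0.0, 1.5, 2.0, 6.0]
    print(mse(clean, enhanced), mae(clean, enhanced), huber(clean, enhanced))
```

In this toy input the outlier residual of 3 contributes 9 to the squared-error sum but only 2.5 to the Huber sum, which illustrates why Huber loss is less sensitive to large errors than Mean Square Error.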

Publisher

Walter de Gruyter GmbH

Link

https://www.sciendo.com/pdf/10.2478/jee-2023-0044

References: 28 articles.

1. Y. Masuyama, M. Togami and T. Komatsu, “Consistency-aware multi-channel speech enhancement using deep neural networks”, Proceedings 2020 IEEE International Acoustics, Speech and Signal Processing Conference (ICASSP), pp. 821-825, 2020. DOI: 10.1109/ICASSP40776.2020.9053501

2. P. C. Loizou, Speech enhancement: theory and practice, 1st ed. Boca Raton: CRC press, pp. 1-10, 2007.

3. S. Gannot, E. Vincent, S. Markovich-Golan and A. Ozerov, “A consolidated perspective on multi microphone speech enhancement and source separation”, IEEE/ACM Trans. on Audio, Speech, and Language Processing, vol. 25, no. 4, pp. 692-730, 2017. DOI: 10.1109/TASLP.2016.2647702

4. C. Rascon, “Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications”, Sensors, vol. 23, no. 9, p. 4394, 2023. DOI: 10.3390/s23094394

5. H. Garg, B. Sharma, S. Shekhar and R. Agarwal, “Spoofing detection system for e-health digital twin using Efficient Net Convolution Neural Network”, Multimedia Tools and Applications, vol. 81, no. 16, pp. 26873-26888, 2022. DOI: 10.1007/s11042-021-11578-5

Cited by 1 article.

1. Synthesizing Lithuanian voice replacement for laryngeal cancer patients with Pareto-optimized flow-based generative synthesis network; Applied Acoustics; 2024-09