Speech preprocessing and enhancement based on joint time domain and time-frequency domain analysis

Published: 2024-06-01 Issue: 6 Volume: 155 Page: 3580-3588
ISSN:0001-4966
Container-title:The Journal of the Acoustical Society of America
language:en
Author:

Zhang Wenbo¹, Xie Xuefeng¹, Du Yanling¹, Huang Dongmei²

Affiliation:

1. College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China

2. Shanghai University of Electric Power, Shanghai, 201306, China

Abstract

Speech enhancement aims to make noisy speech signals clearer. Traditional time-frequency domain methods struggle to differentiate between speech and noise, leading to a risk of speech distortion. This paper introduces an approach that combines the time domain and the time-frequency domain, using a W-net module to suppress noise at the front end. The module, called TTF-W-Net, is an improved version of Wave-U-Net. We conducted experiments using the TIMIT speech and NOISEX-92 noise datasets to evaluate the enhancement performance achieved by integrating preprocessing networks, specifically Wave-U-Net and our TTF-W-Net, into the baseline methods: Phase, FullSubNet+, and DB-AIAT. Experimental results show that TTF-W-Net outperforms the baseline Wave-U-Net by 15.7% on the PESQ metric, and that the performance of the baseline networks improves when our preprocessing method is applied. Consequently, the TTF-W-Net preprocessing network offers effective speech enhancement.
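The two signal views named in the abstract can be illustrated with a short sketch. This is not the authors' TTF-W-Net; it only shows a time-domain waveform and a time-frequency (magnitude spectrogram) representation of the same signal. The helper `stft_magnitude`, the 512-sample frame, the 128-sample hop, the 16 kHz sampling rate, and the synthetic tone-plus-noise input are all assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def stft_magnitude(x, frame_len=512, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (illustrative only)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the waveform into overlapping frames and apply the window.
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the non-negative frequency bins: frame_len // 2 + 1 per frame.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for a noisy utterance: a 440 Hz tone plus white noise.
sr = 16000                      # assumed sampling rate in Hz
t = np.arange(sr) / sr          # one second of samples
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)

# Time-domain view: the raw samples. Time-frequency view: frames x bins.
spec = stft_magnitude(noisy)
print(noisy.shape, spec.shape)
```

A time-domain model such as Wave-U-Net operates on an array like `noisy` directly, while time-frequency methods operate on a representation like `spec` (often together with phase or the complex spectrum).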

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Shanghai Sailing Program

The Open Project of Shanghai Key Laboratory of Trustworthy Computing

Startup Foundation for Young Teachers of Shanghai Ocean University

Publisher

Acoustical Society of America (ASA)

Link

https://pubs.aip.org/asa/jasa/article-pdf/155/6/3580/19973238/3580_1_10.0026219.pdf

Reference: 27 articles.

1. Chan, W., Jaitly, N., Le, Q. V., and Vinyals, O. (2015). “Listen, attend and spell,” arXiv:1508.01211.

2. Fullsubnet+: Channel attention fullsubnet with complex spectrograms for speech enhancement, 2022

3. Phase-aware speech enhancement with deep complex U-Net, 2018

4. Defossez, A., Synnaeve, G., and Adi, Y. (2020). “Real time speech enhancement in the waveform domain,” arXiv:2006.12847.

5. Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., and Pallett, D. S. (1993). “DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM NIST speech disc 1-1.1,” NASA STI/Recon Technical Report No. 93, 27403.