Abstract
In recent years, speaker-independent, single-channel speech separation has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each speaker of interest from an environment that contains competing speakers, background noise, and room reverberation remains challenging. To address this problem, a speech separation method for noisy, reverberant environments is proposed. First, a time-domain, end-to-end deep encoder/decoder dual-path neural network is introduced for speech separation. Second, to keep the model from falling into a local optimum during training, a loss function called the stretched optimal scale-invariant signal-to-noise ratio (SOSISNR) is proposed, inspired by the scale-invariant signal-to-noise ratio (SISNR). In addition, to make training better match the human auditory system, the loss is extended to a joint loss function based on short-time objective intelligibility (STOI). Third, an alignment operation is proposed to reduce the influence of the time delay caused by reverberation on separation performance. Combining these methods, subjective and objective evaluation metrics show that the proposed approach achieves better separation performance in complex sound-field environments than the baseline methods.
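For context, the sketch below shows the standard SI-SNR measure that the abstract's SOSISNR loss is derived from; it is only a background illustration, not the paper's SOSISNR stretching or the STOI-based joint loss, and the function names are illustrative.

```python
# Minimal sketch (assumption: PyTorch, time-domain signals of shape (..., samples)).
import torch


def si_snr(estimate: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    # Remove the mean so the measure is invariant to a DC offset.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to get the scale-invariant target component.
    dot = torch.sum(estimate * target, dim=-1, keepdim=True)
    s_target = dot * target / (torch.sum(target ** 2, dim=-1, keepdim=True) + eps)
    e_noise = estimate - s_target
    ratio = torch.sum(s_target ** 2, dim=-1) / (torch.sum(e_noise ** 2, dim=-1) + eps)
    return 10 * torch.log10(ratio + eps)


def neg_si_snr_loss(estimate: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Training typically minimizes the negative SI-SNR averaged over the batch.
    return -si_snr(estimate, target).mean()
```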
Funder
Innovative Research Group Project of the National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Electrical and Electronic Engineering, Acoustics and Ultrasonics
Cited by
1 article.