Abstract
The multichannel variational autoencoder (MVAE) combines a rule-based update of the separation matrix with a deep generative model and has proven to be a competitive speech separation method. However, the output (global) permutation ambiguity remains and is a fundamental problem in applications. In this paper, we address this problem by employing two dedicated encoders. One encodes the speaker identity to guide the output sorting, and the other encodes the linguistic information for the reconstruction of the source signals. Instance normalization (IN) and adaptive instance normalization (adaIN) are applied to the networks to disentangle the speaker representations from the content representations. The separated sources are arranged in a designated order by a symmetric permutation alignment scheme. In the experiments, we test the proposed method with different gender combinations and under various reverberant conditions, and we generalize it to unseen speakers. The results validate its reliable sorting accuracy and good separation performance. The proposed method outperforms the baseline methods and maintains stable performance, achieving over 20 dB SIR improvement even in highly reverberant environments.
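As a rough illustration of the two normalization operations named in the abstract, the sketch below applies IN and adaIN to a feature map in NumPy. It is not the paper's network: the function names, the `(channels, frames)` layout, and the hand-picked `gamma` and `beta` vectors (standing in for parameters that a speaker encoder might produce) are all assumptions made for this example.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of x, shaped (channels, frames), over the frame axis.

    Removing the per-channel mean and scale discards global statistics,
    which is the usual motivation for using IN to suppress speaker traits.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std


def adaptive_instance_norm(x, gamma, beta, eps=1e-5):
    """Apply IN, then rescale and shift each channel with gamma and beta.

    gamma and beta are hypothetical per-channel parameters; in a real model
    they would be predicted from a speaker representation.
    """
    return gamma[:, None] * instance_norm(x, eps) + beta[:, None]


rng = np.random.default_rng(0)
content = rng.normal(loc=3.0, scale=2.0, size=(4, 100))  # toy feature map

normalized = instance_norm(content)

gamma = np.array([0.5, 1.0, 1.5, 2.0])   # assumed speaker-dependent scales
beta = np.array([-1.0, 0.0, 1.0, 2.0])   # assumed speaker-dependent shifts
styled = adaptive_instance_norm(content, gamma, beta)
```

After `instance_norm`, every channel has approximately zero mean and unit variance; after `adaptive_instance_norm`, the per-channel mean and standard deviation follow `beta` and `gamma`, which is how speaker-specific statistics can be reintroduced into content features.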
Funder
National Natural Science Foundation of China
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)