Abstract
AbstractA three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whitened relative transfer functions (wRTFs) across time frames. The global activity functions of each speaker are estimated from a simplex constructed using the eigenvectors of the SCM, while the local coherence functions are computed from the coherence between the wRTFs of a time-frequency bin and the global activity function-weighted RTF of the target speaker. In speaker counting, we use the eigenvalues of the SCM and the maximum similarity of the interframe global activity distributions between two speakers as the input features to the speaker counting network (SCnet). In speaker separation, a global and local activity-driven network (GLADnet) is used to extract each independent speaker signal, which is particularly useful for highly overlapping speech signals. Experimental results obtained from the real meeting recordings show that the proposed system achieves superior speaker counting and speaker separation performance compared to previous publications without the prior knowledge of the array configurations.
Funder
National Science and Technology Council (NSTC), Taiwan
Publisher
Springer Science and Business Media LLC
Subject
Electrical and Electronic Engineering,Acoustics and Ultrasonics
Reference60 articles.
1. E. Vincent, T. Virtanen, S. Gannot, Audio source separation and speech enhancement (Wiley, USA, 2018)
2. M. Kawamoto, K. Matsuoka, N. Ohnishi, A method of blind separation for convolved nonstationary signals. Neurocomputing 22, 157–171 (1998)
3. H. Buchner, R. Aichner, W. Kellermann, A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics. IEEE Trans Audio Speech Lang Process 13(1), 120–134 (2005)
4. Z. Koldovsky, P. Tichavsky, Time-domain blind separation of audio sources on the basis of a complete ICA decomposition of an observation space. IEEE Trans Audio Speech Lang Process 19(2), 406–416 (2011)
5. T. Kim, T. Eltoft, T.W. Lee, Independent vector analysis: an extension of ICA to multivariate components, in International Conference on Independent Component Analysis and Signal Separation. (2006), pp.165–172
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献