Learning-based robust speaker counting and separation with the aid of spatial coherence

Published: 2023-09-20
Volume: 2023
Issue: 1
ISSN: 1687-4722
Journal: EURASIP Journal on Audio, Speech, and Music Processing
Language: English
Journal abbreviation: J AUDIO SPEECH MUSIC PROC.

Authors

Hsu Yicheng, Bai Mingsian R.

Abstract

A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction stage, a spatial coherence matrix (SCM) is computed from whitened relative transfer functions (wRTFs) across time frames. The global activity function of each speaker is estimated from a simplex constructed from the eigenvectors of the SCM, while the local coherence functions are computed from the coherence between the wRTFs of each time-frequency bin and the global activity function-weighted RTF of the target speaker. In the speaker counting stage, the eigenvalues of the SCM and the maximum similarity of the interframe global activity distributions between two speakers serve as input features to the speaker counting network (SCnet). In the speaker separation stage, a global and local activity-driven network (GLADnet) extracts each speaker's signal, which is particularly useful for highly overlapping speech. Experimental results on real meeting recordings show that the proposed system, which requires no prior knowledge of the array configuration, outperforms previously published methods in both speaker counting and speaker separation.
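Why the SCM eigenvalues carry information about the number of speakers can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: the wRTF is simplified to a per-bin ratio against a reference microphone, normalized to unit norm across microphones; the SCM entry for frames l and n is taken as the frequency-averaged magnitude of the inner product of their wRTFs; and the learned SCnet is replaced by a hypothetical rule that counts eigenvalues above a fixed fraction of the largest one. The function names (`whitened_rtf`, `spatial_coherence_matrix`, `count_speakers`, `synth_mixture`), the `rel_threshold` parameter and the synthetic data are illustrative only. The underlying idea: when each frame is dominated by one speaker, frames of the same speaker share a spatial signature, so the SCM is roughly block-structured with one dominant eigenvalue per speaker.

```python
import numpy as np


def whitened_rtf(stft, ref=0, eps=1e-12):
    """Simplified stand-in for a wRTF (assumption, not the paper's exact whitening):
    per-bin ratio to a reference mic, normalized to unit norm across mics.

    stft: complex array of shape (M, L, K) = (mics, frames, frequency bins).
    """
    rtf = stft / (stft[ref] + eps)                     # relative transfer function per bin
    norm = np.linalg.norm(rtf, axis=0, keepdims=True)  # norm across microphones
    return rtf / (norm + eps)                          # unit-norm spatial signature


def spatial_coherence_matrix(stft, ref=0):
    """L x L frame-to-frame coherence, averaged over frequency (assumed definition)."""
    a = whitened_rtf(stft, ref)
    # |a(l,k)^H a(n,k)| for every frame pair (l, n) and every bin k
    inner = np.einsum("mlk,mnk->lnk", a.conj(), a)
    return np.abs(inner).mean(axis=2)


def count_speakers(scm, rel_threshold=0.1):
    """Hypothetical stand-in for SCnet: count eigenvalues above a fraction of the largest."""
    eigvals = np.linalg.eigvalsh(scm)                  # real, ascending (scm is symmetric)
    return int(np.sum(eigvals > rel_threshold * eigvals[-1]))


def synth_mixture(num_speakers, mics=4, frames_per_spk=40, bins=64,
                  noise=0.01, seed=0):
    """Toy data: each frame is dominated by one speaker with a fixed,
    frequency-dependent steering vector, plus a little sensor noise."""
    rng = np.random.default_rng(seed)

    def cgauss(*shape):
        # circular complex Gaussian samples with unit variance
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    steering = cgauss(num_speakers, mics, bins)                   # (S, M, K)
    labels = np.repeat(np.arange(num_speakers), frames_per_spk)   # speaker of each frame
    source = cgauss(labels.size, bins)                            # (L, K) source spectra
    stft = steering[labels].transpose(1, 0, 2) * source[None]     # (M, L, K)
    return stft + noise * cgauss(*stft.shape)
```

For example, `count_speakers(spatial_coherence_matrix(synth_mixture(3)))` estimates the speaker count of a three-speaker toy mixture. On real reverberant recordings with overlapping speech the eigenvalue gap is far less clean and a fixed threshold is unlikely to hold up, which is where a trained counter fed with additional features, such as SCnet with its global activity similarity input, comes in.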

Funder

National Science and Technology Council (NSTC), Taiwan

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics

Link

https://link.springer.com/content/pdf/10.1186/s13636-023-00298-3.pdf

Cited by 1 article.

1. Development of a speech separation system using frequency domain blind source separation technique. Multimedia Tools and Applications (2023-09-23)