A Multi-Source Separation Approach Based on DOA Cue and DNN

Published: 2022-06-19 Issue: 12 Volume: 12 Page: 6224
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Zhang Yu, Jia Maoshen, Jia Xinyu, Pai Tun-Wen

Abstract

Multiple sound source separation in reverberant environments has attracted growing attention in recent years. To improve the quality of the separated signals in such environments, this paper proposes a separation method based on a direction-of-arrival (DOA) cue and a deep neural network (DNN). First, a pre-processing model based on non-negative matrix factorization (NMF) dereverberates the recorded signal, which makes the subsequent source separation more efficient. Then, a multi-source separation algorithm that combines the recovery of sparse and non-sparse component points obtains each sound source signal from the dereverberated signal. For sparse component points, the dominant sound source at each point is determined by a DOA cue. For non-sparse component points, a DNN is used to recover each sound source signal. Finally, the signals separated from the sparse and non-sparse component points are matched by temporal correlation to obtain each sound source signal. Both objective and subjective evaluation results indicate that, compared with the existing method, the proposed approach performs better in high-reverberation environments.
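
The DOA-cue step for sparse component points can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the assignment rule, so the nearest-direction criterion, the `tolerance_deg` threshold, and the names `angular_distance`, `assign_dominant_source`, and `binary_masks` are all assumptions made for illustration. Each time-frequency point is assumed to already carry a DOA estimate in degrees.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def assign_dominant_source(point_doas, source_doas, tolerance_deg=10.0):
    """Label each time-frequency point with the index of the closest source DOA.

    Assumption: a point counts as dominated by one source when its DOA
    estimate lies within tolerance_deg of that source's direction. Points
    that match no source are labeled None, standing in for points that a
    separate recovery stage would have to handle.
    """
    labels = []
    for doa in point_doas:
        distances = [angular_distance(doa, s) for s in source_doas]
        best = min(range(len(source_doas)), key=lambda i: distances[i])
        labels.append(best if distances[best] <= tolerance_deg else None)
    return labels


def binary_masks(labels, num_sources):
    """Turn per-point labels into one 0/1 mask per source."""
    return [[1 if lab == k else 0 for lab in labels] for k in range(num_sources)]


# Hypothetical example: three sources and four time-frequency points.
source_doas = [30.0, 150.0, 355.0]
point_doas = [28.0, 155.0, 2.0, 90.0]
labels = assign_dominant_source(point_doas, source_doas)
masks = binary_masks(labels, len(source_doas))
```

Multiplying each mask with the mixture spectrogram would keep only the points attributed to that source. The wrap-around handling in `angular_distance` matters because a point at 2 degrees is close to a source at 355 degrees.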

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/12/6224/pdf
