Sound Source Separation Mechanisms of Different Deep Networks Explained from the Perspective of Auditory Perception

Published: 2022-01-14 Issue: 2 Volume: 12 Page: 832
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Li Han, Chen Kean, Wang Lei, Liu Jianben, Wan Baoquan, Zhou Bing

Abstract

Thanks to the development of deep learning, various sound source separation networks have been proposed and have made significant progress. However, the study of the underlying separation mechanisms is still in its infancy. In this study, deep networks are explained from the perspective of auditory perception mechanisms. For separating two arbitrary sound sources from monaural recordings, three different networks with different parameters are trained and achieve excellent performance. The networks' outputs obtain an average scale-invariant signal-to-distortion ratio improvement (SI-SDRi) higher than 10 dB, comparable to human performance in separating natural sources. More importantly, the most intuitive principle, proximity, is explored through simultaneous and sequential organization experiments. Results show that, regardless of network structure and parameters, the proximity principle is learned spontaneously by all networks. If components are proximate in frequency or time, the networks do not separate them easily. Moreover, the frequency resolution at low frequencies is better than at high frequencies. These behavioral characteristics of all three networks are highly consistent with those of the human auditory system, which implies that the learned proximity principle is not accidental but the optimal strategy selected by networks and humans when facing the same task. The emergence of auditory-like separation mechanisms makes it possible to develop a universal system that can be adapted to all sources and scenes.
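
The abstract reports results as SI-SDRi without stating the formula. As a rough illustration only, the sketch below follows the commonly used definition of scale-invariant SDR (project the estimate onto the reference, then compare the energy of that projection with the energy of the residual) and takes the improvement as the difference between the SI-SDR of the separated output and that of the unprocessed mixture. The function names `si_sdr` and `si_sdr_improvement` are hypothetical, and the mean removal step is an assumption; the paper's own evaluation code may differ.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (common definition, assumed here)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    # Assumption: remove the mean of each signal before comparing.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Project the estimate onto the reference to get the scaled target.
    scale = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    if residual_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the separated output over the raw mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


if __name__ == "__main__":
    # Toy signals: two sinusoids that are orthogonal over the window.
    n = 1000
    source = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
    interferer = [math.sin(2 * math.pi * 50 * k / n) for k in range(n)]
    mixture = [s + i for s, i in zip(source, interferer)]
    # Hypothetical separator output that leaves 10% of the interferer.
    estimate = [s + 0.1 * i for s, i in zip(source, interferer)]
    print(si_sdr_improvement(estimate, mixture, source))
```

Because the estimate is projected onto the reference first, rescaling the separated output by any non-zero constant leaves the score unchanged, which is the property the "scale-invariant" in the metric's name refers to.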

Funder

Open Fund of State Key Laboratory of Power Grid Environmental Protection

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/2/832/pdf

References: 36 articles.

1. Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition

2. Looking to listen at the cocktail party

3. Speaker separation in realistic noise environments with applications to a cognitively-controlled hearing aid

4. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

5. Machine Learning Inspired Sound-Based Amateur Drone Detection for Public Safety Applications

Cited by 7 articles.

1. Multilingual Meeting Management with NLP: Automated Minutes, Transcription, and Translation; Lecture Notes in Networks and Systems; 2024

2. Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis; Artificial Intelligence Review; 2023-10-25

3. Monaural speech separation using WT-Conv-TasNet for hearing aids; International Journal of Speech Technology; 2023-09

4. On Neural Architectures for Deep Learning-Based Source Separation of Co-Channel OFDM Signals; ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2023-06-04

5. Trends of Microwave Devices Design Based on Artificial Neural Networks: A Review; Electronics; 2022-07-28