Polyphonic sound event localization and detection based on Multiple Attention Fusion ResNet
-
Published:2024
Issue:2
Volume:21
Page:2004-2023
-
ISSN:1551-0018
-
Container-title:Mathematical Biosciences and Engineering
-
Short-container-title:MBE
Author:
Zhang Shouming (1), Zhang Yaling (1, 2), Liao Yixiao (2), Pang Kunkun (2), Wan Zhiyong (2), Zhou Songbin (2)
Affiliation:
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
2. Institute of Intelligent Manufacturing, Guangdong Academy of Science, Guangdong Key Laboratory of Modern Control Technology, Guangzhou 510030, China
Abstract
<abstract>
<p>Sound event localization and detection has been applied in various fields. Polyphony (overlapping sound events) and noise interference make it challenging to accurately predict sound events and their locations. To address this problem, we propose a Multiple Attention Fusion ResNet, which uses ResNet34 as the base network. Because sound durations are not fixed and recordings can contain multiple overlapping events and noise, we introduce the Gated Channel Transform to enhance the residual basic block. This enables the model to capture contextual information, evaluate channel weights, and reduce the interference caused by polyphony and noise. Furthermore, Split Attention is introduced to capture cross-channel information, which enhances the model's ability to distinguish overlapping events. Finally, Coordinate Attention is introduced so that the model can focus on both the channel information and the spatial location information of sound events. Experiments were conducted on two datasets, TAU-NIGENS Spatial Sound Events 2020 and TAU-NIGENS Spatial Sound Events 2021. The results demonstrate that the proposed model significantly outperforms state-of-the-art methods in environments with multiple overlapping events and directional noise interference, and that it achieves competitive performance in a single polyphonic environment.</p>
</abstract>
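The abstract names the Gated Channel Transform as the module that evaluates channel weights inside the residual block. The paper's own implementation is not reproduced in this record, so the following is only a minimal NumPy sketch of the commonly published formulation of that gate (per-channel l2-norm embedding, cross-channel normalization, tanh gating). The function name, the parameter names `alpha`, `gamma`, `beta`, the `eps` value, and the `(C, H, W)` layout are assumptions made for illustration, not details taken from the article.

```python
import numpy as np


def gated_channel_transform(x, alpha, gamma, beta, eps=1e-5):
    """Apply a channel gate to a feature map (illustrative sketch).

    x:     feature map of shape (C, H, W), e.g. channels x time x frequency
    alpha: (C,) embedding weights (assumed learnable)
    gamma: (C,) gating weights   (assumed learnable)
    beta:  (C,) gating biases    (assumed learnable)
    """
    # Global context embedding: scaled l2 norm of each channel over H and W.
    embedding = alpha * np.sqrt((x ** 2).sum(axis=(1, 2)) + eps)
    # Channel normalization: divide by the RMS of the embeddings across channels,
    # so channels compete with each other for a larger gate value.
    normalized = embedding / np.sqrt((embedding ** 2).mean() + eps)
    # Gate in (0, 2); with gamma = beta = 0 the gate is exactly 1 (identity).
    gate = 1.0 + np.tanh(gamma * normalized + beta)
    return x * gate[:, None, None]


# Hypothetical usage: a random 4-channel feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 6))
gated = gated_channel_transform(
    features, alpha=np.ones(4), gamma=np.full(4, 0.5), beta=np.zeros(4)
)
```

Initializing `gamma` and `beta` to zero makes the gate an identity mapping, which is one reason a module of this kind can be added to an existing residual block without disturbing it at the start of training.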
Publisher
American Institute of Mathematical Sciences (AIMS)
Subject
Applied Mathematics,Computational Mathematics,General Agricultural and Biological Sciences,Modeling and Simulation,General Medicine