Affiliation:
1. Ocean University of China
Abstract
Sound event localization is a critical aspect of two-dimensional direction-of-arrival (2D-DOA) estimation. For each active sound event, the azimuth and elevation angles are predicted as 3D Cartesian coordinates through multi-label regression. Conventional methods such as the multiple signal classification (MUSIC) algorithm and the baseline convolutional recurrent neural network (BCRNN) suffer from reduced precision and high computational cost, particularly in low signal-to-noise ratio (SNR) environments (SNR < -5 dB). We introduce the effective residual self-attention recurrent neural network (ESRNN), which mitigates the distortion the MUSIC algorithm exhibits under low SNR conditions and improves 2D-DOA prediction accuracy across a range of SNR and reverberation scenarios. We propose two filter structures, ESRNN-L and ESRNN-G, tailored for SNRs above 0 dB and below -5 dB, respectively. In evaluations on the TAU Spatial Sound Events 2019 datasets with synthetic SNRs from -10 dB to 30 dB, ESRNN-L achieves a 21% lower 2D-DOA error than BCRNN at SNRs below -5 dB, and ESRNN-G achieves a 15% lower error with 10% fewer parameters at SNRs above 0 dB. An ablation study comparing ESRNN with other principal attention methods further demonstrates the model's efficiency and robustness.
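The abstract relies on two quantities: a DOA expressed as a 3D Cartesian vector, and test mixtures synthesized at a target SNR. The sketch below illustrates both under stated assumptions. It is not the authors' code. The function names (`sph_to_cart`, `doa_error_deg`, `mix_at_snr`) are hypothetical. The angle convention (azimuth measured from the x-axis toward the y-axis, elevation measured from the xy-plane) is an assumption. The DOA error is taken to be the angle between the predicted and reference unit vectors, which is a common convention for this task.

```python
import math


def sph_to_cart(azi_deg, ele_deg):
    """Map azimuth/elevation in degrees to a unit vector (x, y, z).

    Assumed convention: azimuth from +x toward +y, elevation from the xy-plane.
    """
    azi, ele = math.radians(azi_deg), math.radians(ele_deg)
    return (math.cos(ele) * math.cos(azi),
            math.cos(ele) * math.sin(azi),
            math.sin(ele))


def doa_error_deg(pred, ref):
    """Angular distance in degrees between two direction vectors.

    Both vectors are normalized, so a regression output need not be unit length.
    """
    dot = sum(p * r for p, r in zip(pred, ref))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(r * r for r in ref))
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def mix_at_snr(signal, noise, snr_db):
    """Add noise scaled so that 10*log10(P_signal / P_noise) equals snr_db.

    Hypothetical helper showing one way to synthesize SNRs such as -10..30 dB.
    """
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Scaled noise power becomes p_sig / 10**(snr_db / 10).
    gain = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


if __name__ == "__main__":
    ref = sph_to_cart(30.0, 10.0)
    pred = sph_to_cart(40.0, 10.0)
    print(f"DOA error: {doa_error_deg(pred, ref):.2f} deg")
```

A 10-degree azimuth offset at 10 degrees elevation produces an angular error slightly below 10 degrees. Measuring error on the sphere, rather than as a raw azimuth difference, accounts for this shrinkage away from the horizon.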
Publisher
Research Square Platform LLC