CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge

Published:2023-04-19 Issue:3 Volume:22 Page:1-27
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Yin Jun¹, Verhelst Marian¹

Affiliation:

1. ESAT-MICAS KU Leuven, Leuven, Belgium

Abstract

Robust sound source localization in noisy and reverberant environments increasingly exploits deep neural networks fed with various acoustic features. Yet state-of-the-art research mainly focuses on optimizing algorithmic accuracy, resulting in huge models that prevent edge-device deployment. The edge, however, calls for real-time, low-footprint acoustic reasoning for applications such as hearing aids and robot interactions. Hence, we start from a robust CNN-based model using SRP-PHAT features, Cross3D [16], to pursue an efficient yet compact model architecture for the extreme edge. For the SRP feature representation and the neural network, we propose our scalable LC-SRP-Edge and Cross3D-Edge algorithms, respectively, both optimized for lower hardware overhead. LC-SRP-Edge halves the complexity and on-chip memory overhead of the sinc interpolation compared to the original LC-SRP [19]. Across multiple SRP resolution cases, Cross3D-Edge saves 10.32% to 73.71% of the computational complexity and 59.77% to 94.66% of the neural network weights relative to the Cross3D baseline. In terms of the accuracy-efficiency tradeoff, the most balanced version (EM) requires only 127.1 MFLOPS of computation, 3.71 MByte/s of bandwidth, and 0.821 MByte of on-chip memory in total, while remaining competitive in state-of-the-art accuracy comparisons. It achieves 8.59 ms/frame end-to-end latency on a Raspberry Pi 4B, which is 7.26× faster than the corresponding baseline.

Funder

European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3586996

Reference: 93 articles.

1. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

2. Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2018. Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network. In Proceedings of the 2018 26th European Signal Processing Conference. IEEE, 1462–1466.

3. Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2019. Localization, detection and tracking of multiple moving sound sources with a convolutional recurrent neural network. In Workshop on Detection and Classification of Acoustic Scenes and Events.

4. Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2019. A multi-room reverberant dataset for sound event localization and detection. In Workshop on Detection and Classification of Acoustic Scenes and Events.

5. Image method for efficiently simulating small‐room acoustics