Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks

Journal: Frontiers in Signal Processing (Front. Signal Process.)
Published: 2022-03-17
Volume: 2
ISSN: 2673-8198

Authors:

Mansali Mariem Bouafif, Zarazaga Pablo Pérez, Bäckström Tom, Lachiri Zied

Abstract

Speech source localization (SSL) offers great possibilities for the design of local speaker-positioning systems based on wireless acoustic sensor networks (WASNs). Recent works have shown that data-driven front-ends can outperform traditional SSL algorithms when trained for a specific domain, characterized by factors such as reverberation and noise levels. However, such models perform localization directly on raw sensor observations and do not account for transmission losses in WASNs. When sensors reside in separate real-life devices, sensor data must be quantized, encoded, and transmitted, which degrades localization performance, especially at low transmission bitrates. In this work, we investigate the effect of low-bitrate transmission on a Direction of Arrival (DoA) estimator. We analyze the performance of a deep neural network (DNN) based framework as a function of the audio encoding bitrate, compressing the signals with recent communication codecs: PyAWNeS, Opus, EVS, and Lyra. Experimental results show that training the DNN on input encoded with the PyAWNeS codec at 16.4 kbit/s can improve accuracy significantly, and that up to 50% of the accuracy degradation at low bitrates can be recovered for almost all codecs. Our results further show that when one of the two channels can be encoded at a bitrate higher than 32 kbit/s, the trained model is most accurate when the second channel is transmitted as raw data; at lower bitrates, it is preferable to encode both channels similarly. More importantly, for practical applications, a more generalized model trained with a randomly selected codec for each channel shows a large accuracy gain when at least one of the two channels is encoded with PyAWNeS.
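The core mechanism in the abstract is that each channel is quantized and encoded independently before transmission, and the DoA estimator then works on the degraded signals. The sketch below illustrates that pipeline under heavily simplified assumptions; it is not the paper's method. It uses classical GCC-PHAT instead of the DNN estimator studied in the paper, a white-noise source instead of speech, and a plain uniform quantizer as a crude stand-in for codecs such as PyAWNeS, Opus, EVS, or Lyra. The two-microphone geometry (0.2 m spacing, 16 kHz sampling, 343 m/s speed of sound, 20 dB sensor SNR) and the helper names `gcc_phat_delay`, `doa_from_delay`, `uniform_quantize`, and `simulate` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def gcc_phat_delay(x1, x2, fs, max_delay):
    """Estimate how much x2 lags x1 (in seconds) with GCC-PHAT.

    Returns the delay and the height of the correlation peak (at most 1).
    """
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(x2, n=n) * np.conj(np.fft.rfft(x1, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    r = np.fft.irfft(cross, n=n)
    max_shift = max(1, int(np.ceil(max_delay * fs)))
    # Rearrange so index 0 is lag -max_shift and the last index is lag +max_shift
    r = np.concatenate((r[-max_shift:], r[: max_shift + 1]))
    lag = int(np.argmax(r)) - max_shift
    return lag / fs, float(r.max())


def doa_from_delay(tau, mic_distance):
    """Map a delay to a broadside angle in degrees (far field, two mics)."""
    s = np.clip(SPEED_OF_SOUND * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


def uniform_quantize(x, bits):
    """Hypothetical codec stand-in: mid-rise uniform quantizer with 2**bits levels."""
    peak = np.max(np.abs(x))
    step = 2 * peak / 2 ** bits
    q = np.floor(x / step) * step + step / 2
    return np.clip(q, -peak + step / 2, peak - step / 2)


def simulate(bits_per_sample, true_delay=6, fs=16000, mic_distance=0.2,
             n=16000, snr_db=20.0, seed=0):
    """Delay a source between two mics, optionally quantize each channel, estimate DoA."""
    rng = np.random.default_rng(seed)  # same seed -> same source and noise for every bitrate
    s = rng.standard_normal(n + true_delay)  # white-noise stand-in for speech
    x1 = s[true_delay:]  # microphone 1
    x2 = s[:n]           # microphone 2: x2[t] = x1[t - true_delay]
    noise_std = 10 ** (-snr_db / 20)
    x1 = x1 + noise_std * rng.standard_normal(n)
    x2 = x2 + noise_std * rng.standard_normal(n)
    if bits_per_sample is not None:  # each sensor "encodes" its channel independently
        x1 = uniform_quantize(x1, bits_per_sample)
        x2 = uniform_quantize(x2, bits_per_sample)
    tau, peak = gcc_phat_delay(x1, x2, fs, mic_distance / SPEED_OF_SOUND)
    return doa_from_delay(tau, mic_distance), peak


if __name__ == "__main__":
    true_angle = doa_from_delay(6 / 16000, 0.2)
    for bits in (None, 16, 8, 4, 2, 1):
        angle, peak = simulate(bits)
        # At 16 kHz, b bits per sample costs 16 * b kbit/s per channel
        label = "raw" if bits is None else f"{bits:2d} bit/sample ({bits * 16} kbit/s)"
        print(f"{label}: DoA {angle:6.2f} deg (true {true_angle:.2f}), peak {peak:.3f}")
```

With a long frame and moderate noise, the estimated angle may stay correct even at very coarse quantization while the correlation peak becomes less pronounced; shortening `n` or lowering `snr_db` makes the breakdown easier to see. Uniform PCM spends its bits far less efficiently than a real speech codec at the same bitrate, so this sketch only illustrates the trend, not the paper's numbers.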

Publisher

Frontiers Media SA

References (30 articles)

1. PyAWNeS-Codec: Speech and Audio Codec for Ad-Hoc Acoustic Wireless Sensor Networks; Bäckström, 2021

2. Greedy Layerwise Training of Deep Networks; Bengio; Adv. Neural Inf. Process. Syst., 2007

3. Representation Learning: A Review and New Perspectives; Bengio; IEEE Trans. Pattern Anal. Mach. Intell., 2013

4. Codec for Enhanced Voice Services (EVS)-The New 3GPP Codec for Communication; Bruhn, 2012