Author:
Tang Duowei, Taseska Maja, van Waterschoot Toon
Abstract
Recent deep neural network based methods provide accurate binaural source localization performance. These data-driven models map measured binaural cues directly to source locations, so their performance depends heavily on the training data distribution. In this paper, we propose a parametric embedding that maps the binaural cues to a low-dimensional space where localization can be done with a nearest-neighbor regression. We implement the embedding using a neural network, optimized to map points that are close to each other in the latent space (the space of source azimuths or elevations) to nearby points in the embedding space. As a result, the Euclidean distances between the embeddings reflect the proximity of their sources, and the embeddings form a manifold, which makes them interpretable. We show that the proposed embedding generalizes well to various acoustic conditions (with reverberation) different from those encountered during training, and provides better performance than unsupervised embeddings previously used for binaural localization. In addition, the proposed method performs better than or as well as a feed-forward neural network based model that directly estimates the source locations from the binaural cues, and it outperforms the feed-forward model when only a small amount of training data is used. Finally, we compare the proposed embedding under both supervised and weakly supervised learning, and show that the resulting embeddings perform similarly well in both conditions, but the weakly supervised embedding allows source azimuth and elevation to be estimated simultaneously.
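The nearest-neighbor regression step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_regress`, the toy 2-D embeddings, the choice of k, and the plain (non-circular) averaging of angles are all assumptions made for illustration. It only shows how a source angle could be read off once embeddings are arranged so that Euclidean distance tracks source proximity.

```python
import math

def knn_regress(train_embeddings, train_angles, query, k=3):
    """Estimate a source angle as the mean angle of the k nearest training embeddings.

    Hypothetical helper: plain averaging ignores angular wrap-around,
    which is acceptable only for a limited angular range.
    """
    # Euclidean distance from the query embedding to every training embedding
    dists = [math.dist(e, query) for e in train_embeddings]
    # Indices of the k smallest distances
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    return sum(train_angles[i] for i in nearest) / len(nearest)

# Toy embeddings placed along a line so that distance follows azimuth difference
# (made-up data, not taken from the paper)
train_embeddings = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
train_azimuths = [-40.0, -20.0, 0.0, 20.0, 40.0]  # degrees

estimate = knn_regress(train_embeddings, train_azimuths, query=(2.1, 0.1), k=3)
print(estimate)
```

In this toy setup the three nearest neighbors of the query carry azimuths of -20, 0 and 20 degrees, so the estimate is their mean. In practice the embeddings would come from the trained network rather than being set by hand.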
Funder
Fonds Wetenschappelijk Onderzoek
KU Leuven
HORIZON EUROPE European Research Council
Subject
Computer Science Applications, Biomedical Engineering, Neuroscience (miscellaneous)