Author:
Francesc Lluís, Vasileios Chatziioannou, Alex Hofmann
Abstract
For immersive applications, the generation of binaural sound that matches its visual counterpart is crucial to bring meaningful experiences to people in a virtual environment. Recent studies have shown the possibility of using neural networks for synthesizing binaural audio from mono audio by using 2D visual information as guidance. Extending this approach by guiding the audio with 3D visual information and operating in the waveform domain may allow for a more accurate auralization of a virtual audio scene. We propose Points2Sound, a multi-modal deep learning model which generates a binaural version from mono audio using 3D point cloud scenes. Specifically, Points2Sound consists of a vision network and an audio network. The vision network uses 3D sparse convolutions to extract a visual feature from the point cloud scene. Then, the visual feature conditions the audio network, which operates in the waveform domain, to synthesize the binaural version. Results show that 3D visual information can successfully guide multi-modal deep learning models for the task of binaural synthesis. We also investigate how 3D point cloud attributes, learning objectives, different reverberant conditions, and several types of mono mixture signals affect the binaural audio synthesis performance of Points2Sound for the different numbers of sound sources present in the scene.
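The abstract describes a two-stage pipeline: a vision network condenses a 3D point cloud into a visual feature, which then conditions a waveform-domain audio network that maps mono input to two channels. The following is a toy NumPy sketch of that data flow only, not the actual Points2Sound model (which uses learned 3D sparse convolutions and a neural waveform network); the function names, feature dimension, and the simple gain-modulation conditioning are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_visual_feature(points, dim=8):
    # Hypothetical stand-in for the sparse-conv vision network:
    # project per-point attributes and pool into one global descriptor.
    W = rng.standard_normal((points.shape[1], dim))
    return np.tanh(points @ W).mean(axis=0)          # shape (dim,)

def binauralize(mono, vis_feat):
    # Hypothetical stand-in for the conditioned audio network:
    # the visual feature modulates a per-channel gain (FiLM-like idea),
    # producing a left/right pair from the mono waveform.
    gain_l = 1.0 + 0.1 * vis_feat[0]
    gain_r = 1.0 + 0.1 * vis_feat[1]
    return np.stack([gain_l * mono, gain_r * mono])  # shape (2, T)

points = rng.standard_normal((1024, 6))  # xyz + rgb attributes per point
mono = rng.standard_normal(16000)        # 1 s of mono audio at 16 kHz
binaural = binauralize(mono, extract_visual_feature(points))
print(binaural.shape)                    # (2, 16000)
```

The key structural point the sketch preserves is that the audio path never sees the point cloud directly: all 3D information reaches it through the single conditioning feature.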
Publisher
Springer Science and Business Media LLC
Subject
Electrical and Electronic Engineering, Acoustics and Ultrasonics
References: 53 articles.
Cited by: 2 articles.