Abstract
Recognizing sounds is a key aspect of computational audio scene analysis and machine perception. In this paper, we argue that sound recognition is inherently a multi-modal audiovisual task, in that sounds are easier to differentiate using both the audio and visual modalities than using either one alone. We present an audiovisual fusion model that learns to recognize sounds from weakly labeled video recordings. The proposed fusion model uses an attention mechanism to dynamically combine the outputs of the individual audio and visual models. Experiments on the large-scale sound event dataset AudioSet demonstrate the efficacy of the proposed model, which outperforms single-modal models as well as state-of-the-art fusion and multi-modal models. We achieve a mean Average Precision (mAP) of 46.16 on AudioSet, outperforming the prior state of the art by approximately +4.35 mAP (a relative improvement of 10.4%).
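The abstract describes attention-weighted fusion of per-modality predictions but does not spell out the architecture. Below is a minimal sketch of one way such a mechanism can be implemented, assuming PyTorch; the class name, gating networks, and dimensions are illustrative assumptions and not the authors' implementation.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Sketch: attention-based fusion of audio and visual class predictions.

    Each modality provides an embedding and class logits. A small gating
    network maps each embedding to a scalar score; a softmax over the two
    scores yields per-example modality weights used to mix the predictions.
    All names and dimensions here are illustrative, not from the paper.
    """

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # One scalar attention score per modality, from its embedding.
        self.audio_gate = nn.Linear(embed_dim, 1)
        self.visual_gate = nn.Linear(embed_dim, 1)

    def forward(self, audio_emb, visual_emb, audio_logits, visual_logits):
        # Attention over the two modalities: softmax across the modality axis.
        scores = torch.cat(
            [self.audio_gate(audio_emb), self.visual_gate(visual_emb)], dim=-1
        )  # shape (B, 2)
        weights = torch.softmax(scores, dim=-1)  # shape (B, 2)

        # AudioSet is multi-label, so sigmoid (not softmax) over classes.
        probs_a = torch.sigmoid(audio_logits)
        probs_v = torch.sigmoid(visual_logits)

        # Convex combination of the two modality predictions.
        fused = weights[:, :1] * probs_a + weights[:, 1:] * probs_v
        return fused, weights


if __name__ == "__main__":
    # Usage: batch of 4, 512-d embeddings, 527 AudioSet classes.
    B, D, C = 4, 512, 527
    fusion = AttentionFusion(D)
    fused, w = fusion(
        torch.randn(B, D), torch.randn(B, D),
        torch.randn(B, C), torch.randn(B, C),
    )
    print(fused.shape, w.shape)  # torch.Size([4, 527]) torch.Size([4, 2])
```

Because the weights are computed per example, the model can lean on the audio stream when the visual frame is uninformative and vice versa, which is the "dynamic" behavior the abstract refers to.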
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
10 articles.
1. A Multimodal Benchmark and Improved Architecture for Zero Shot Learning;2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2024-01-03
2. Text-to-Feature Diffusion for Audio-Visual Few-Shot Learning;Lecture Notes in Computer Science;2024
3. Audiovisual Masked Autoencoders;2023 IEEE/CVF International Conference on Computer Vision (ICCV);2023-10-01
4. RNA trafficking and subcellular localization—a review of mechanisms, experimental and predictive methodologies;Briefings in Bioinformatics;2023-07-18
5. A Dataset for Audio-Visual Sound Event Detection in Movies;ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2023-06-04