
Multimodal Network with Cross-Modal Attention for Audio-Visual Event Localization

Published: 2022-10-10
Container title: Proceedings of the 3rd International Workshop on Human-Centric Multimedia Analysis

Author:

Tan Qianchao¹

Affiliation:

1. University of Electronic Science and Technology of China, Chengdu, China

Publisher:

ACM

Link:

https://dl.acm.org/doi/pdf/10.1145/3552458.3556450

References: 42 articles

1. Look, Listen and Learn

2. Objects that Sound

3. Yusuf Aytar, Carl Vondrick, and Antonio Torralba. 2016. SoundNet: Learning sound representations from unlabeled video. Advances in Neural Information Processing Systems, Vol. 29 (2016).

4. David A. Bulkin and Jennifer M. Groh. 2006. Seeing sounds: visual and auditory interactions in the brain. Current Opinion in Neurobiology, Vol. 16, 4 (2006), 415–419.

5. Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning

Cited by 1 article.

1. A Dual-Branch Audio-Visual Event Localization Network Based on the Scatter Loss of Audio-Visual Similarity. 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), 2023-08-18.