Non-Autoregressive Cross-Modal Coherence Modelling

Published: 2022-10-10
Container title: Proceedings of the 30th ACM International Conference on Multimedia

Authors:

Bin Yi¹, Shi Wenhao¹, Zhang Jipeng², Ding Yujuan³, Yang Yang⁴, Shen Heng Tao⁵

Affiliations:

1. University of Electronic Science and Technology of China, Chengdu, China

2. The Hong Kong University of Science and Technology, Hong Kong SAR, China

3. The Hong Kong Polytechnic University, Hong Kong SAR, China

4. University of Electronic Science and Technology of China & Institute of Electronic and Information Engineering of UESTC in Guangdong, Chengdu, Dongguan, China

5. University of Electronic Science and Technology of China & Peng Cheng Laboratory, Chengdu, Shenzhen, China

Funders

China Postdoctoral Science Foundation

Dongguan Songshan Lake Introduction Program of Leading Innovative and Entrepreneurial Talents

National Natural Science Foundation of China

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3503161.3548184

References (55 articles)

1. Samuel Albanie, Arsha Nagrani, Andrea Vedaldi, and Andrew Zisserman. 2018. Emotion recognition in speech using cross-modal transfer in the wild. In ACM Multimedia. 292–301.

2. Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, and Du Tran. 2020. Self-supervised learning by cross-modal audio-video clustering. In NeurIPS. 9758–9770.

3. Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual question answering. In ICCV. 2425–2433.

4. Cross-Modal Scene Networks

5. Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.

Cited by (3 articles)

1. Non-autoregressive personalized bundle generation;Information Processing & Management;2024-09

2. More Modalities Mean Better: Vessel Target Recognition and Localization Through Symbiotic Transformer and Multiview Regression;IEEE Transactions on Geoscience and Remote Sensing;2024

3. Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26