Fast-Slow Transformer for Visually Grounding Speech

Published: 2022-05-23
Container-title: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Author:

Peng Puyuan¹, Harwath David¹

Affiliation:

1. The University of Texas at Austin, Department of Computer Science, Austin, Texas, USA, 78712

Publisher

IEEE

References: 43 articles.

2. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;NAACL, 2019

4. Microsoft COCO: Common objects in context;Lin;ECCV, 2014

5. Learning deep features for scene recognition using Places database;Zhou;NeurIPS, 2014

Cited by 13 articles.

1. SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech Via Clip and Speech-Image Data;2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW);2024-04-14

2. Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model;2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW);2024-04-14

3. Speech Guided Masked Image Modeling for Visually Grounded Speech;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

4. Perceptual Synchronization Scoring of Dubbed Content using Phoneme-Viseme Agreement;2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW);2024-01-01

5. Visually Grounded Speech Models Have a Mutual Exclusivity Bias;Transactions of the Association for Computational Linguistics;2024