Authors:
Rouditchenko Andrew, Boggust Angie, Harwath David, Chen Brian, Joshi Dhiraj, Thomas Samuel, Audhkhasi Kartik, Kuehne Hilde, Panda Rameswar, Feris Rogerio, Kingsbury Brian, Picheny Michael, Torralba Antonio, Glass James
Cited by 32 articles.
1. Multi‐modal video search by examples—A video quality impact analysis;IET Computer Vision;2024-07-27
2. Speech Guided Masked Image Modeling for Visually Grounded Speech;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
3. Fine-Grained Features Alignment and Fusion for Text-Video Cross-Modal Retrieval;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
4. FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild;International Journal of Computer Vision;2024-02-23
5. Cross-Modal Learning for CTC-Based ASR: Leveraging CTC-Bertscore and Sequence-Level Training;2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU);2023-12-16