Author:
David Harwath, James Glass
Cited by 59 articles.
1. Recent Advances in Synthesis and Interaction of Speech, Text, and Vision;Electronics;2024-04-30
2. ViLaS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
3. Speech Guided Masked Image Modeling for Visually Grounded Speech;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
4. Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
5. Automatic Speech Recognition Based on Improved Deep Learning;Automatic Speech Recognition and Translation for Low Resource Languages;2024-03-29