M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

Published: 2023-06-04
Container-title: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Author:

Berry Layne¹, Shih Yi-Jen², Wang Hsuan-Fu², Chang Heng-Jui³, Lee Hung-Yi², Harwath David¹

Affiliation:

1. University of Texas at Austin

2. National Taiwan University

3. MIT CSAIL

Funder

Johns Hopkins University

Publisher

IEEE

Link

http://xplorestaging.ieee.org/ielx7/10094559/10094560/10096882.pdf?arnumber=10096882

References: 25 articles.

1. Learning deep features for scene recognition using places database; Zhou; Advances in Neural Information Processing Systems, 2014

2. Deep multimodal semantic embeddings for speech and images

3. Vision as an interlingua: Learning multilingual semantic embeddings of untranscribed speech; Harwath; IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

4. Unsupervised learning of spoken language with visual context; Harwath; Advances in Neural Information Processing Systems, 2016

5. Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

Cited by 4 articles.

1. SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech Via Clip and Speech-Image Data; 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW); 2024-04-14

2. Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model; 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW); 2024-04-14

3. Visually Grounded Few-Shot Word Learning in Low-Resource Settings; IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2024

4. Visually Grounded Speech Models Have a Mutual Exclusivity Bias; Transactions of the Association for Computational Linguistics; 2024