Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Published: 2023-12-16
Container-title: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Author:

Lian Jiachen¹, Baevski Alexei², Hsu Wei-Ning³, Auli Michael³

Affiliation:

1. UC Berkeley

2. Character.AI

3. FAIR, Meta

Publisher

IEEE

References: 46 articles.

3. Learning audio-visual speech representation by masked multimodal cluster prediction; Shi

Cited by 4 articles.

2. Enhancing GAN-based Vocoders with Contrastive Learning Under Data-Limited Condition; 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW); 2024-04-14

3. BRAVEn: Improving Self-supervised pre-training for Visual and Auditory Speech Recognition; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14

4. Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14