1. Deep Audio-visual Speech Recognition
2. Lrs3-ted: a large-scale dataset for visual speech recognition;Afouras,2018
3. ASR is All You Need: Cross-Modal Distillation for Lip Reading
4. vqwav2vec: Self-supervised learning of discrete speech representations;Baevski,2019
5. wav2vec 2.0: A framework for self-supervised learning of speech representations;Baevski;Advances in neural information processing systems,2020