1. LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading;qu;IEEE Transactions on Neural Networks and Learning Systems,2021
2. Deep Audio-Visual Speech Recognition;afouras;IEEE Transactions on Pattern Analysis and Machine Intelligence,2018
3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding;devlin;arXiv,2019