1. Afouras, 2018. Deep audio-visual speech recognition.
2. Belousov, 2021. MobileStyleGAN: A lightweight convolutional neural network for high-fidelity image synthesis.
3. Chen, 2020. Talking-head generation with rhythmic head motion.
4. Chen, 2020. A simple framework for contrastive learning of visual representations.
5. Chen, L., Maddox, R.K., Duan, Z., Xu, C., 2019. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR, pp. 7832–7841.