1. Triantafyllos Afouras, Joon Son Chung, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. 2018. LRS3-TED: A large-scale dataset for visual speech recognition. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP). 66--71.
2. Alexei Baevski, Steffen Schneider, and Michael Auli. 2019. Vq-wav2vec: self-supervised learning of discrete speech representations. arXiv preprint arXiv:1910.05453 (2019).
3. Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging
4. GPU accelerated t-distributed stochastic neighbor embedding
5. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss