1. Triantafyllos Afouras, Joon Son Chung, and Andrew Zisserman. 2018. LRS3-TED: a large-scale dataset for visual speech recognition. arXiv preprint arXiv:1809.00496 (2018).
2. Yannis M Assael, Brendan Shillingford, Shimon Whiteson, and Nando De Freitas. 2016. Lipnet: End-to-end sentence-level lipreading. arXiv preprint arXiv:1611.01599 (2016).
3. Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems 33 (2020), 12449--12460.
4. Linchao Bao, Haoxian Zhang, Yue Qian, Tangli Xue, Changhai Chen, Xuefei Zhe, and Di Kang. 2023. Learning Audio-Driven Viseme Dynamics for 3D Face Animation. arXiv preprint arXiv:2301.06059 (2023).
5. Chen Cao et al. 2022. Authentic volumetric avatars from a phone scan. ACM Transactions on Graphics (TOG) (2022).