1. Taylor, S.L., Mahler, M., Theobald, B.J., et al. (2012) Dynamic Units of Visual Speech. Proceedings of the 11th ACM SIGGRAPH/Eurographics Conference on Computer Animation, Lausanne, 29-31 July 2012, 275-284.
2. A Practical and Configurable Lip Sync Method for Games
3. Chen, N., Zhang, Y., Zen, H., et al. (2020) WaveGrad: Estimating Gradients for Waveform Generation. arXiv: 2009.00713.
4. AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
5. EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation