Author:
Wang Yuxin,Song Linsen,Wu Wayne,Qian Chen,He Ran,Loy Chen Change
Abstract
AbstractTalking face generation aims at synthesizing coherent and realistic face sequences given an input speech. The task enjoys a wide spectrum of downstream applications, such as teleconferencing, movie dubbing, and virtual assistant. The emergence of deep learning and cross-modality research has led to many interesting works that address talking face generation. Despite great research efforts in talking face generation, the problem remains challenging due to the need for fine-grained control of face components and the generalization to arbitrary sentences. In this chapter, we first discuss the definition and underlying challenges of the problem. Then, we present an overview of recent progress in talking face generation. In addition, we introduce some widely used datasets and performance metrics. Finally, we discuss open questions, potential future directions, and ethical considerations in this task.
Publisher
Springer International Publishing
Reference120 articles.
1. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Proceedings of the advances in neural information processing systems, vol 27
2. Radford A, Metz L, Chintala S (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proceedings of the international conference on learning representations
3. Mirza M, Osindero S (2014) Conditional generative adversarial nets. CoRR arXiv:abs/1411.1784
4. Chen L, Li Z, Maddox RK, Duan Z, Xu C (2018) Lip movements generation at a glance. In: Proceedings of the European conference on computer vision, pp 520–535
5. Zhou H, Liu Y, Liu Z, Luo P, Wang X (2019) Talking face generation by adversarially disentangled audio-visual representation. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, no 1, pp 9299–9306
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Audio-Driven Facial Landmark Generation in Violin Performance using 3DCNN Network with Self Attention Model;ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2023-06-04
2. Talking Head Generation for Media Interaction System with Feature Disentanglement;2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS);2023-01