Abstract
Thanks to the rapid development of artificial intelligence, virtual humans are widely employed in various industries, including personal assistance, intelligent customer service, and online education. An anthropomorphic digital human can quickly connect with people and enhance the user experience in human–computer interaction. Hence, we design a human–computer interaction system framework that comprises speech recognition, text-to-speech, dialogue systems, and virtual human generation. Next, we classify talking-head video generation models according to the virtual human deep generation framework. We also systematically review the technological advancements and trends in talking-head video generation over the past five years, highlight the critical works, and summarize the datasets.
Funder
National Key Research and Development Program of China
Fundamental Research Funds for the Central Universities
State Key Laboratory of Media Convergence Production Technology and Systems
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
16 articles.