Human-Computer Interaction System: A Survey of Talking-Head Generation

Published:2023-01-01 Issue:1 Volume:12 Page:218
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Zhen Rui, Song Wenchao, He Qiang, Cao Juan, Shi Lei, Luo Jia

Abstract

Thanks to the rapid development of artificial intelligence, virtual humans are widely employed in various industries, including personal assistance, intelligent customer service, and online education. An anthropomorphic digital human can quickly engage with people and enhance the user experience in human–computer interaction. Hence, we design a human–computer interaction system framework, which includes speech recognition, text-to-speech, dialogue systems, and virtual human generation. Next, we classify talking-head video generation models according to the virtual human deep generation framework. Meanwhile, we systematically review the technological advancements and trends in talking-head video generation over the past five years, highlight the critical works, and summarize the datasets.

Funder

National Key Research and Development Program of China

Fundamental Research Funds for the Central Universities

State Key Laboratory of Media Convergence Production Technology and Systems

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/1/218/pdf

References: 83 articles.

1. Vdub: Modifying face video of actors for plausible visual alignment to a dubbed audio track;Garrido;Computer Graphics Forum,2015

2. Garrido, P., Valgaerts, L., Rehmsen, O., Thormahlen, T., Perez, P., and Theobalt, C. (2014, January 23–28). Automatic face reenactment. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA.

3. Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., and Nießner, M. (2016, June 26–July 1). Face2face: Real-time face capture and reenactment of rgb videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.

4. Bregler, C., Covell, M., and Slaney, M. (1997, January 3–8). Video rewrite: Driving visual speech with audio. Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA.

5. Realistic mouth-synching for speech-driven talking face using articulatory modelling;Xie;IEEE Trans. Multimed.,2007

Cited by 16 articles.

1. Human–object interaction detection algorithm based on graph structure and improved cascade pyramid network;Computer Vision and Image Understanding;2024-12

2. A Survey of Cross-Modal Visual Content Generation;IEEE Transactions on Circuits and Systems for Video Technology;2024-08

3. Research on Color Segmentation Algorithms in Visual Communication Technology for Human-Computer Interaction Systems;2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE);2024-05-10

4. Practical Approach Towards Integrating Face Perception and Voice Representation;2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE);2024-05-09

5. Toward Photo-Realistic Facial Animation Generation Based on Keypoint Features;Proceedings of the 2024 16th International Conference on Machine Learning and Computing;2024-02-02