Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion

Published: 2021-08
Container-title: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

Author:

Wang Suzhen¹, Li Lincheng¹, Ding Yu¹, Fan Changjie¹, Yu Xin²

Affiliation:

1. Virtual Human Group, Netease Fuxi AI Lab, China

2. University of Technology Sydney

Abstract

We propose an audio-driven talking-head method to generate photo-realistic talking-head videos from a single reference image. In this work, we tackle two key challenges: (i) producing natural head motions that match speech prosody, and (ii) maintaining the appearance of a speaker in a large head motion while stabilizing the non-face regions. We first design a head pose predictor by modeling rigid 6D head movements with a motion-aware recurrent neural network (RNN). In this way, the predicted head poses act as the low-frequency holistic movements of a talking head, thus allowing our latter network to focus on detailed facial movement generation. To depict the entire image motions arising from audio, we exploit a keypoint-based dense motion field representation. Then, we develop a motion field generator to produce the dense motion fields from input audio, head poses, and a reference image. As this keypoint-based representation models the motions of facial regions, head, and backgrounds integrally, our method can better constrain the spatial and temporal consistency of the generated videos. Finally, an image generation network is employed to render photo-realistic talking-head videos from the estimated keypoint-based motion fields and the input reference image. Extensive experiments demonstrate that our method produces videos with plausible head motions, synchronized facial expressions, and stable backgrounds, and that it outperforms the state-of-the-art.
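
To make the idea of a keypoint-based dense motion field more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes each keypoint contributes only a translation, that keypoint influence falls off as a Gaussian around its position in the generated frame, and that a small constant weight stands in for a static background. The function name `dense_motion_field` and the parameters `sigma` and `bg_weight` are hypothetical.

```python
import math


def dense_motion_field(ref_kps, drv_kps, height, width, sigma=0.1, bg_weight=1e-3):
    """Build a backward flow field from sparse keypoints (illustrative only).

    ref_kps / drv_kps: lists of (x, y) keypoints in the reference image and
    in the frame to generate, in normalized coordinates in [-1, 1].
    Returns a height x width grid; each entry is the (x, y) location in the
    reference image from which that output pixel would be sampled.
    Requires height >= 2 and width >= 2.
    """

    def coord(i, n):
        # Map a pixel index to a normalized coordinate in [-1, 1].
        return -1.0 + 2.0 * i / (n - 1)

    field = []
    for r in range(height):
        y = coord(r, height)
        row = []
        for c in range(width):
            x = coord(c, width)
            # First candidate: static background (no shift).
            weights = [bg_weight]
            shifts = [(0.0, 0.0)]
            for (rx, ry), (dx, dy) in zip(ref_kps, drv_kps):
                # Pixels near a driven keypoint follow that keypoint's motion.
                dist2 = (x - dx) ** 2 + (y - dy) ** 2
                weights.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
                shifts.append((rx - dx, ry - dy))
            total = sum(weights)
            sx = sum(w * s[0] for w, s in zip(weights, shifts)) / total
            sy = sum(w * s[1] for w, s in zip(weights, shifts)) / total
            row.append((x + sx, y + sy))
        field.append(row)
    return field


# One keypoint that moved from the image center to the right.
field = dense_motion_field([(0.0, 0.0)], [(0.5, 0.0)], height=5, width=5)
near_keypoint = field[2][3]   # output pixel located at (0.5, 0.0)
far_corner = field[0][0]      # output pixel located at (-1.0, -1.0)
```

In this sketch, the pixel at the moved keypoint samples from close to the keypoint's original position, while a distant corner pixel stays essentially where it is, which is one way a single field can describe face, head, and background motion together. A grid of normalized sampling locations like this is the kind of input that warping routines such as PyTorch's `torch.nn.functional.grid_sample` accept.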

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 45 articles.

1. Deep Learning for Visual Speech Analysis: A Survey;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-09

2. A survey on deep learning based reenactment methods for deepfake applications;IET Image Processing;2024-08-19

3. Generative artificial intelligence: a systematic review and applications;Multimedia Tools and Applications;2024-08-14

4. OSM-Net: One-to-Many One-Shot Talking Head Generation With Spontaneous Head Motions;IEEE Transactions on Circuits and Systems for Video Technology;2024-08

5. Talking Head Generation Based on 3D Morphable Facial Model;2024 Picture Coding Symposium (PCS);2024-06-12