StyleTalk: One-Shot Talking Head Generation with Controllable Speaking Styles

Published:2023-06-26 Issue:2 Volume:37 Page:1896-1904
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Ma Yifeng, Wang Suzhen, Hu Zhipeng, Fan Changjie, Lv Tangjie, Ding Yu, Deng Zhidong, Yu Xin

Abstract

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
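The abstract says only that the style code adjusts the weights of the feed-forward layers; it does not give the exact form of that adjustment. The sketch below illustrates one plausible reading, assuming several parallel weight sets that are blended by softmax weights computed from the style code. The class name `StyleAdaptiveFFN`, the `gate` projection, the number of weight sets, and all dimensions are hypothetical and are not taken from the paper or its released code.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def blend(mats, weights):
    # Weighted sum of equally shaped matrices.
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(a * M[i][j] for a, M in zip(weights, mats)) for j in range(cols)]
        for i in range(rows)
    ]


class StyleAdaptiveFFN:
    """Feed-forward layer whose effective weights depend on a style code.

    Assumption (not from the abstract): the layer holds `num_sets` parallel
    weight sets and mixes them with softmax weights derived from the style code.
    """

    def __init__(self, d_model, d_hidden, d_style, num_sets=4, seed=0):
        rng = random.Random(seed)

        def rand(r, c):
            return [[rng.gauss(0.0, 0.1) for _ in range(c)] for _ in range(r)]

        self.gate = rand(num_sets, d_style)  # style code -> mixing logits
        self.W1 = [rand(d_hidden, d_model) for _ in range(num_sets)]
        self.W2 = [rand(d_model, d_hidden) for _ in range(num_sets)]

    def mixing_weights(self, style_code):
        # One non-negative weight per parallel weight set, summing to 1.
        return softmax(matvec(self.gate, style_code))

    def __call__(self, x, style_code):
        alpha = self.mixing_weights(style_code)
        W1 = blend(self.W1, alpha)  # style-dependent first projection
        W2 = blend(self.W2, alpha)  # style-dependent second projection
        hidden = [max(0.0, h) for h in matvec(W1, x)]  # ReLU
        return matvec(W2, hidden)


# Same content feature, two different (made-up) style codes.
ffn = StyleAdaptiveFFN(d_model=4, d_hidden=8, d_style=3)
x = [0.5, -1.0, 0.25, 2.0]
out_a = ffn(x, [1.0, 0.0, 0.0])
out_b = ffn(x, [0.0, 0.0, 5.0])
```

Because the blended weights change with the style code, the same input feature `x` is transformed differently under each style, which is the behavior the abstract attributes to the style-aware adaptation mechanism.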

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 12 articles.

1. Unmasking deepfakes: A systematic review of deepfake detection and generation techniques using artificial intelligence;Expert Systems with Applications;2024-10

2. OSM-Net: One-to-Many One-Shot Talking Head Generation With Spontaneous Head Motions;IEEE Transactions on Circuits and Systems for Video Technology;2024-08

3. Talking Face Generation via Face Mesh - Controllability without Reference Videos;2024 IEEE Conference on Artificial Intelligence (CAI);2024-06-25

4. StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-06

5. Face Generation and Editing With StyleGAN: A Survey;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-05