Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

Published: 2020-07
Container-title: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Author:

Zhu Hao¹², Huang Huaibo²³, Li Yi²³, Zheng Aihua¹, He Ran²³

Affiliation:

1. School of Computer Science and Technology, Anhui University, Hefei

2. NLPR&CEBSIT&CRIPAC, Institute of Automation, CAS, Beijing, China

3. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Abstract

Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image. Most existing methods mainly focus on either disentangling the information in a single image or learning temporal information between frames. However, cross-modality coherence between audio and video information has not been well addressed during synthesis. In this paper, we propose a novel arbitrary talking face generation framework by discovering the audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA) block by selectively focusing the lip area of the input image during the training stage, to further enhance lip synchronization. Experimental results on benchmark LRW dataset and GRID dataset transcend the state-of-the-art methods on prevalent metrics with robust high-resolution synthesizing on gender and pose variations.

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 18 articles.

1. Deep Learning for Visual Speech Analysis: A Survey;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-09

2. Adaptive Texture and Spectrum Clue Mining for Generalizable Face Forgery Detection;IEEE Transactions on Information Forensics and Security;2024

3. HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods;Pattern Recognition and Computer Vision;2023-12-28

4. Talking face generation driven by time–frequency domain features of speech audio;Displays;2023-12

5. Context-Aware Talking-Head Video Editing;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26