Improved First-Order Motion Model of Image Animation with Enhanced Dense Motion and Repair Ability
Published: 2023-03-24
Volume: 13, Issue: 7, Page: 4137
ISSN: 2076-3417
Container-title: Applied Sciences
Short-container-title: Applied Sciences
Language: en
Author:
Xu Yu 1, Xu Feng 1, Liu Qiang 1 (ORCID), Chen Jianwen 2
Affiliation:
1. Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
2. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Abstract
Image animation aims to transfer the pose changes of a driving video to the static object in a source image, and has potential applications in various domains such as the film and game industries. The essential part of this task is to generate a video that learns the motion from the driving video while preserving the appearance of the source image; as a result, a new object with the same motion is generated in the animated video. However, this remains a significant challenge when the object pose undergoes large-scale change, and even the most recent methods fail to handle such cases with good visual effects. To solve the problem of poor visual quality in videos with large-scale pose changes, this paper proposes a novel method based on an improved first-order motion model (FOMM) with enhanced dense motion and repair ability. Firstly, when generating the optical flow, we propose an attention mechanism that optimizes the feature representation of the image in both the channel and spatial domains through maximum pooling, enabling the source image to be better warped into the feature domain of the driving image. Secondly, we propose a multi-scale occlusion-restoration module that generates multi-resolution occlusion maps by upsampling the low-resolution occlusion map. The generator then redraws the occluded parts of the reconstruction result across multiple scales using these multi-resolution occlusion maps, achieving more accurate and vivid visual effects. In addition, the proposed model can be trained effectively in an unsupervised manner. We evaluated the proposed model on three benchmark datasets. The experimental results show that our method improves multiple evaluation metrics, and the visual quality of the animated videos clearly outperforms the FOMM.
On the VoxCeleb1 dataset, the pixel error, average keypoint distance, and average Euclidean distance of our proposed method were reduced by 6.5%, 5.1%, and 0.7%, respectively. On the TaiChiHD dataset, the pixel error, average keypoint distance, and missing keypoint rate were reduced by 4.9%, 13.5%, and 25.8%, respectively.
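The two components the abstract describes can be illustrated in miniature. The sketch below is a hedged NumPy illustration, not the paper's actual architecture: the function and variable names are assumptions, the attention step simply gates a feature map with sigmoid-squashed channel-wise and spatial max pools, and the occlusion pyramid is built by nearest-neighbour upsampling of a single low-resolution occlusion map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(feat):
    """Toy max-pooling attention (illustrative only): a channel gate from a
    global spatial max pool, then a spatial gate from a per-pixel max over
    channels. feat has shape (C, H, W)."""
    # Channel attention: max over spatial dims -> one gate per channel.
    channel_gate = sigmoid(feat.max(axis=(1, 2)))        # (C,)
    feat = feat * channel_gate[:, None, None]
    # Spatial attention: max over channels -> one gate per pixel.
    spatial_gate = sigmoid(feat.max(axis=0))             # (H, W)
    return feat * spatial_gate[None, :, :]

def upsample_occlusion(occ, scale):
    """Nearest-neighbour upsampling of a low-resolution occlusion map,
    producing the mask for one decoder resolution."""
    return np.repeat(np.repeat(occ, scale, axis=0), scale, axis=1)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))     # (C, H, W) feature map
att = channel_spatial_attention(feat)       # same shape, gated features

occ = rng.random((4, 4))                    # low-res occlusion map in [0, 1]
pyramid = [upsample_occlusion(occ, s) for s in (1, 2, 4)]
# pyramid holds 4x4, 8x8, and 16x16 masks for multi-scale restoration
```

In the paper's pipeline the multi-resolution masks would tell the generator, at each decoder scale, which regions of the warped source features are occluded and must be redrawn; here the pyramid merely demonstrates the shape relationship between the scales.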
Funder
Beijing Municipal Education Commission, China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science