Affiliation:
1. School of Electrical Engineering and Computing, Department of Computer Science and Engineering, Adama Science and Technology University, Adama, Ethiopia
2. School of Electrical Engineering and Computing, Department of Electronics and Communications Engineering, Adama Science and Technology University, Adama 1888, Ethiopia
Abstract
The fundamental challenge in video generation is not only producing high-quality image sequences but also generating consistent frames without abrupt shifts. With the development of generative adversarial networks (GANs), great progress has been made in image generation tasks, which can be applied to facial expression synthesis. Most previous works focused on synthesizing frontal and near-frontal faces and relied on manual annotation. However, considering only the frontal and near-frontal area is insufficient for many real-world applications, and manual annotation fails when the video is incomplete. AffineGAN, a recent study, uses an affine transformation in latent space to infer the expression intensity value automatically; however, that work requires extracting features of the target ground-truth image, and its generated image sequences are still unsatisfactory. To address these issues, this study proposes inferring the expression intensity value automatically without extracting features of the ground-truth images. A local dataset was prepared with frontal faces and two additional face positions (the left and right sides). Average content distance (ACD) metrics of the proposed solution were measured across different experiments, and the proposed solution shows improvements: it reduces the ACD-I of AffineGAN from 1.606 ± 0.018 to 1.584 ± 0.00, the ACD-C from 1.452 ± 0.008 to 1.430 ± 0.009, and the ACD-G from 1.769 ± 0.007 to 1.744 ± 0.01. This work concludes that integrating self-attention into the generator network improves the quality of the generated image sequences. In addition, evenly distributing intensity values based on the number of frames improves the consistency of the generated image sequences and enables the generator to produce videos with different frame counts while keeping the intensity within the range [0, 1].
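The evenly distributed intensity assignment described in the abstract can be sketched as follows: assuming simple linear spacing, each of the N frames receives an intensity value in [0, 1], with the first frame at 0 (neutral) and the last at 1 (peak expression). The function name below is hypothetical, not from the paper.

```python
import numpy as np

def frame_intensities(num_frames):
    """Hypothetical sketch: assign one evenly spaced expression-intensity
    value in [0, 1] to each frame, regardless of the video's frame count."""
    return np.linspace(0.0, 1.0, num_frames)

# A 5-frame video gets intensities [0.0, 0.25, 0.5, 0.75, 1.0];
# a longer video simply gets a finer spacing over the same range.
print(frame_intensities(5))
```

Because the spacing adapts to `num_frames`, the generator can be conditioned on videos of any length while the conditioning value stays bounded in [0, 1].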
Funder
Adama Science and Technology University
Subject
Electrical and Electronic Engineering,General Computer Science,Signal Processing
References: 34 articles.
1. Generative adversarial networks; I. Goodfellow; Communications of the ACM, 2020
2. Image super-resolution using very deep residual channel attention networks; Y. Zhang
3. Joint deep learning of facial expression synthesis and recognition
4. Self-attention generative adversarial networks; H. Zhang
5. Image-to-image translation with conditional adversarial networks; P. Isola
Cited by: 1 article.
1. Revitalizing Nash Equilibrium in GANs for Human Face Image Generation;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30