Face shape transfer via semantic warping
Published: 2024-09-03
Issue: 1
Volume: 2
ISSN: 2731-9008
Container-title: Visual Intelligence
Language: en
Short-container-title: Vis. Intell.
Author: Li Zonglin, Lv Xiaoqian, Yu Wei, Liu Qinglin, Lin Jingbo, Zhang Shengping
Abstract
Face reshaping aims to adjust the shape of a face in a portrait image to make it aesthetically pleasing, and it has many potential applications. Existing methods 1) operate on pre-defined facial landmarks, leading to artifacts and distortions because the number of landmarks is limited; 2) synthesize new faces from segmentation masks or sketches, producing unsatisfactory results because skin details are lost and hair and background blurring are difficult to handle; or 3) project the positions of deformed feature points from a 3D face model onto the 2D image, yielding unrealistic results because of misalignment between feature points. In this paper, we propose a novel method named face shape transfer (FST) via semantic warping, which can transfer both the overall face shape and individual components (e.g., eyes, nose, and mouth) of a reference image to the source image. To achieve controllability at the component level, we introduce five encoding networks, each designed to learn a feature embedding specific to one face component. To effectively exploit the features obtained from semantic parsing maps at different scales, we directly connect all layers within the global dense network; this direct connection maximizes information flow between layers and efficiently utilizes multi-scale semantic parsing information. To avoid deformation artifacts, we introduce a spatial transformer network, allowing the network to handle different types of semantic warping effectively. To facilitate extensive evaluation, we construct a large-scale high-resolution face dataset containing 14,000 images at a resolution of 1024 × 1024. Qualitative and quantitative experiments on the benchmark dataset demonstrate the superior performance of our method.
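For concreteness, the sketch below illustrates the two generic mechanisms the abstract names: dense direct connections, where every layer receives the concatenation of all preceding feature maps, and a spatial transformer that predicts a warp and resamples the source image. It is a minimal PyTorch illustration under stated assumptions, not the authors' implementation: the module names (DenseBlock, AffineSTN), the affine warp parameterization, and all channel counts and depths are hypothetical, and FST's five component-specific encoders and its actual semantic-warping formulation are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseBlock(nn.Module):
    # Each layer consumes the concatenation of the input and all earlier
    # layer outputs, so information flows directly between every pair of
    # layers (the "direct connection" idea described in the abstract).
    def __init__(self, in_ch, growth=8, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch + i * growth, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(n_layers)
        )

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)


class AffineSTN(nn.Module):
    # A spatial transformer in its simplest form: regress a 2x3 affine
    # matrix from the features, build a sampling grid, and warp the source.
    # (An affine warp stands in here for whatever richer semantic-warping
    # field the paper actually predicts.)
    def __init__(self, feat_ch):
        super().__init__()
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(feat_ch, 6),
        )
        # Start from the identity transform so early training applies no warp.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)
        )

    def forward(self, feat, src_img):
        theta = self.loc(feat).view(-1, 2, 3)
        grid = F.affine_grid(theta, src_img.size(), align_corners=False)
        return F.grid_sample(src_img, grid, align_corners=False)


if __name__ == "__main__":
    x = torch.randn(1, 3, 64, 64)    # toy stand-in for parsing features
    src = torch.randn(1, 3, 64, 64)  # toy source image
    feat = DenseBlock(in_ch=3)(x)    # 3 + 3 * 8 = 27 channels out
    warped = AffineSTN(feat_ch=feat.size(1))(feat, src)
    print(feat.shape, warped.shape)

With the identity initialization, grid_sample initially returns the source unchanged; the network then learns to deviate from the identity only where the reference shape demands it, which is one common way to keep warping artifacts in check.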
Publisher: Springer Science and Business Media LLC
References: 60 articles.