FA-VTON: A Feature Alignment-Based Model for Virtual Try-On
Published: 2024-06-17
Volume: 14
Issue: 12
Page: 5255
ISSN: 2076-3417
Container-title: Applied Sciences
Short-container-title: Applied Sciences
Language: en
Author:
Wan Yan 1, Ding Ning 1, Yao Li 1
Affiliation:
1. School of Computer Science and Technology, Donghua University, 2999 North Renmin Road, Shanghai 201620, China
Abstract
Virtual try-on technology based on 2D images aims to seamlessly transfer a provided garment onto a target person image. Prior methods concentrated mainly on warping garments and generating images, overlooking the influence of feature alignment on the try-on results. In this study, we first analyze the distortions produced by existing methods and elucidate the critical role of feature alignment in the extraction stage. Building on this, we propose a novel feature alignment-based model (FA-VTON). Specifically, FA-VTON aligns the upsampled higher-level features from both the person and garment images to acquire precise boundary information, which serves as guidance for subsequent garment warping. Concurrently, the Efficient Channel Attention (ECA) mechanism is introduced into the try-on generation module to produce the final result; it adaptively adjusts channel feature weights to extract important features and reduce artifact generation. Furthermore, to make the student network focus on the salient regions of each channel, we employ channel-wise distillation (CWD) to minimize the Kullback–Leibler (KL) divergence between the channel probability maps of the student and teacher networks. Experiments show that our model achieves better qualitative and quantitative results than current methods on popular virtual try-on datasets.
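The abstract builds on two published mechanisms: Efficient Channel Attention (ECA, Wang et al., CVPR 2020) and channel-wise distillation (CWD, Shu et al., ICCV 2021). The PyTorch sketches below illustrate those mechanisms in their generic, published form; they are not FA-VTON's actual implementation, and the tensor shapes and hyperparameters (`gamma`, `b`, `tau`) are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: per-channel weights from a 1-D convolution
    over globally pooled channel descriptors, with no dimensionality reduction."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Kernel size adapts to the channel count, as in the ECA paper.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                 # global average pooling -> (B, C)
        y = self.conv(y.unsqueeze(1))          # cross-channel 1-D conv -> (B, 1, C)
        w = torch.sigmoid(y).squeeze(1)        # per-channel weights in (0, 1)
        return x * w[:, :, None, None]         # reweight the feature channels
```

The CWD loss described in the abstract turns each channel's spatial activations into a probability map via a softmax and drives the student's maps toward the teacher's with a KL divergence. A minimal sketch, assuming same-shaped student and teacher feature maps and an illustrative temperature `tau`:

```python
import torch
import torch.nn.functional as F

def cwd_loss(student_feat: torch.Tensor,
             teacher_feat: torch.Tensor,
             tau: float = 4.0) -> torch.Tensor:
    """Channel-wise distillation: KL divergence between the per-channel
    spatial probability maps of a student and a teacher network."""
    b, c, h, w = student_feat.shape
    s = F.log_softmax(student_feat.view(b, c, -1) / tau, dim=-1)  # student log-maps
    t = F.softmax(teacher_feat.view(b, c, -1) / tau, dim=-1)      # teacher maps
    # Sum KL over channels and locations, average over the batch; tau**2
    # keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * (tau ** 2) / c
```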