Affiliation:
1. University of Trento and University of Pisa, Italy
2. ETH Zürich, Switzerland
3. Peng Cheng Laboratory, China
4. University of Oxford, UK
5. University of Trento, Italy
Abstract
2D image-based virtual try-on has attracted increasing interest from the multimedia and computer vision communities due to its enormous commercial value. Nevertheless, most existing image-based virtual try-on approaches directly combine the person-identity representation and the in-shop clothing items without taking their mutual correlations into consideration. Moreover, these methods are commonly built on pure convolutional neural network (CNN) architectures, which struggle to capture long-range correlations among the input pixels. As a result, they generally produce inconsistent try-on results. To alleviate these issues, in this article, we propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task. In the first stage, we design a CIT matching block that aims to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information. Consequently, the warped in-shop clothing items look more natural in appearance. In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among the person representation, the warped clothing item, and the corresponding warped cloth mask. Empirically, exploiting these mutual dependencies makes the final try-on results more realistic. Extensive experiments on a public fashion dataset show that the proposed CIT attains competitive virtual try-on performance.
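To make the idea of long-range correlations between person and cloth information concrete, the following is a minimal sketch of generic scaled dot-product cross-attention between two token sets. It is not the CIT matching or reasoning block itself: the function names, token counts, and feature dimension are assumptions chosen for illustration, and the learned query/key/value projections used in real transformers are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Each query token attends over all context tokens.

    query_feats:   (Nq, d) array, e.g. hypothetical person tokens
    context_feats: (Nc, d) array, e.g. hypothetical cloth tokens
    Returns the attended features (Nq, d) and attention weights (Nq, Nc).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_feats, weights

# Toy inputs: 6 person tokens and 4 cloth tokens with 8 features each
# (sizes are illustrative assumptions, not values from the paper).
rng = np.random.default_rng(0)
person_tokens = rng.standard_normal((6, 8))
cloth_tokens = rng.standard_normal((4, 8))

# Attention in both directions: person -> cloth and cloth -> person.
person_attended, w_pc = cross_attention(person_tokens, cloth_tokens)
cloth_attended, w_cp = cross_attention(cloth_tokens, person_tokens)
```

Because every query token is compared with every context token, each output row can depend on any position in the other input, which is the kind of global dependency that a convolution with a small receptive field does not capture directly.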
Funder
National Ph.D. in Artificial Intelligence for Society Program of Italy, the MUR PNRR project FAIR
NextGenerationEU and the EU H2020 AI4Media Project
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture