Affiliation:
1. Wangxuan Institute of Computer Technology, Peking University, Beijing, China
2. Wangxuan Institute of Computer Technology, Peking University, Beijing, China and Peng Cheng Laboratorym, China
Abstract
Fashion image retrieval with text feedback aims to find the target image according to the reference image and the modification from the user. This is a challenging task, as it requires not only the synergistic understanding of both visual and textual modalities but also the ability to model a wide variety of styles that fashion images contain. Hence, the crucial aspect of addressing this problem lies in exploiting the abundant semantic information inherent in fashion images and correlating it with the textual description of style. Recognizing that style is generally situated at the local level, we explicitly define style as the commonalities and differences between local areas of fashion images. Building upon this, we propose a Style-guided Patch InteRaction approach for fashion Image retrieval with Text feedback (SPIRIT), which focuses on the decisive influence of local details of fashion images on their style. Three corresponding networks are designed pertinently. The Patch-level Style Commonality network is introduced to fully leverage the semantic information among patches and compute their average as the style commonality. Subsequently, the Patch-level Style Difference network employs a graph reasoning network to model the patch-level difference and filter out insignificant patches. By considering the above two networks, mutual information about style is obtained from the interaction between patches. Finally, the Visual Textual Fusion network is utilized to integrate visual features with rich semantic information and textual features. Experimental results on four benchmark datasets demonstrate that our proposed SPIRIT achieves state-of-the-art performance. Source code is available at
https://github.com/PKU-ICST-MIPL/SPIRIT_TOMM2024
.
Funder
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10