Affiliation:
1. School of Computer Science and Information Engineering, Hefei University of Technology, China
2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, China
Abstract
RGB-D cross-modal person re-identification (re-id) aims to retrieve the person of interest across RGB and depth image modalities. To cope with the modal discrepancy, some existing methods generate an auxiliary mode using either the inherent properties of the input modes or extra deep networks. However, these approaches often overlook the useful intermediary role of the generated mode, leading to insufficient exploitation of crucial bridging knowledge. By contrast, in this paper we propose a novel approach that constructs an intermediary mode under the constraints of self-supervised intermediary learning, which requires neither modal prior knowledge nor additional module parameters. We then design a bridge network that fully mines the intermediary role of the generated modality by performing multi-modal integration and decomposition. On the one hand, the network leverages a multi-modal transformer to integrate the information of the three modes, fully exploiting their heterogeneous relations with the intermediary mode as the bridge, and applies an identification consistency constraint to promote cross-modal associations. On the other hand, it employs circle contrastive learning to decompose the cross-modal constraint process into several sub-procedures, providing an intermediate relay while pulling the two original modalities closer. Experiments on two public datasets demonstrate that the proposed method outperforms state-of-the-art approaches. Extensive ablation studies verify the effectiveness of each component, and additional experiments demonstrate the generalization ability of the proposed method.
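The decomposition idea described above can be illustrated with a minimal sketch: instead of enforcing a single RGB-depth contrastive constraint, the pull is relayed through the intermediary mode via RGB-intermediary and intermediary-depth sub-terms. The sketch below assumes an InfoNCE-style objective and hypothetical names (info_nce, relay_contrastive_loss, temperature); it is not the authors' implementation, only an interpretation of the "intermediate relay" described in the abstract.

```python
# Hypothetical sketch of circle-style contrastive decomposition with an
# intermediary modality acting as a relay between RGB and depth features.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE between two batches of embeddings; a[i] and b[i] are positives."""
    a = F.normalize(a, dim=1)
    b = F.normalize(b, dim=1)
    logits = a @ b.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # diagonal = positive pairs
    return F.cross_entropy(logits, targets)


def relay_contrastive_loss(f_rgb, f_mid, f_depth, temperature: float = 0.1):
    """Decompose the RGB <-> depth constraint into RGB <-> intermediary and
    intermediary <-> depth sub-procedures, keeping the direct term as well."""
    l_rgb_mid = info_nce(f_rgb, f_mid, temperature)
    l_mid_depth = info_nce(f_mid, f_depth, temperature)
    l_rgb_depth = info_nce(f_rgb, f_depth, temperature)
    return l_rgb_mid + l_mid_depth + l_rgb_depth


if __name__ == "__main__":
    # Random features standing in for backbone outputs of the three modes.
    B, D = 8, 256
    f_rgb, f_mid, f_depth = (torch.randn(B, D) for _ in range(3))
    print(relay_contrastive_loss(f_rgb, f_mid, f_depth).item())
```

In this reading, the intermediary term shortens each individual modality gap, so the overall alignment between RGB and depth is reached through two easier steps rather than one large one.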
Publisher
Association for Computing Machinery (ACM)