Affiliation:
1. School of Computer Science and Information Engineering, Hefei University of Technology, China
2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, China
Abstract
RGB-D cross-modal person re-identification (re-id) aims to retrieve the person of interest across RGB and depth image modalities. To cope with the modal discrepancy, some existing methods generate an auxiliary mode using either the inherent properties of the input modes or extra deep networks. However, these approaches often overlook the useful intermediary role of the generated mode, leading to insufficient exploitation of crucial bridging knowledge. By contrast, in this paper we propose a novel approach that constructs an intermediary mode under the constraints of self-supervised intermediary learning, which requires neither modal prior knowledge nor additional module parameters. We then design a bridge network that fully mines the intermediary role of the generated modality by performing multi-modal integration and decomposition. On the one hand, the network leverages a multi-modal transformer to integrate the information of the three modes, fully exploiting their heterogeneous relations with the intermediary mode as the bridge, and applies an identification consistency constraint to promote cross-modal associations. On the other hand, it employs circle contrastive learning to decompose the cross-modal constraint process into several sub-procedures, providing an intermediate relay while pulling the two original modalities closer. Experiments on two public datasets demonstrate that the proposed method outperforms state-of-the-art approaches. Extensive ablation studies verify the effectiveness of each component, and additional experiments demonstrate the generalization ability of the proposed method.
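The decomposition idea described above can be illustrated with a minimal sketch: instead of enforcing a single RGB-depth contrastive constraint, the pull is relayed through the intermediary mode via RGB-intermediary and intermediary-depth sub-terms. The sketch below assumes an InfoNCE-style objective and hypothetical names (info_nce, relay_contrastive_loss, temperature); it is not the authors' implementation, only an interpretation of the "intermediate relay" described in the abstract.

```python
# Hypothetical sketch of circle-style contrastive decomposition with an
# intermediary modality acting as a relay between RGB and depth features.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE between two batches of embeddings; a[i] and b[i] are positives."""
    a = F.normalize(a, dim=1)
    b = F.normalize(b, dim=1)
    logits = a @ b.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # diagonal = positive pairs
    return F.cross_entropy(logits, targets)


def relay_contrastive_loss(f_rgb, f_mid, f_depth, temperature: float = 0.1):
    """Decompose the RGB <-> depth constraint into RGB <-> intermediary and
    intermediary <-> depth sub-procedures, keeping the direct term as well."""
    l_rgb_mid = info_nce(f_rgb, f_mid, temperature)
    l_mid_depth = info_nce(f_mid, f_depth, temperature)
    l_rgb_depth = info_nce(f_rgb, f_depth, temperature)
    return l_rgb_mid + l_mid_depth + l_rgb_depth


if __name__ == "__main__":
    # Random features standing in for backbone outputs of the three modes.
    B, D = 8, 256
    f_rgb, f_mid, f_depth = (torch.randn(B, D) for _ in range(3))
    print(relay_contrastive_loss(f_rgb, f_mid, f_depth).item())
```

In this reading, the intermediary term shortens each individual modality gap, so the overall alignment between RGB and depth is reached through two easier steps rather than one large one.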
Publisher
Association for Computing Machinery (ACM)