A Feature Map is Worth a Video Frame: Rethinking Convolutional Features for Visible-Infrared Person Re-identification

Published: 2023-10-18 Issue: 2 Volume: 20 Page: 1-20
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

He Qiaolin¹, Zheng Zhijie¹, Hu Haifeng¹

Affiliation:

1. School of Electronics and Information Technology, Sun Yat-sen University, China

Abstract

Visible-Infrared Person Re-identification (VI-ReID) aims to match the identity of the same person across different spectra. In VI-ReID, the feature maps obtained from the convolutional layers are generally used for loss calculation in the later stages of the model, but their role in the early and middle stages remains unexplored. In this article, we propose a novel Rethinking Convolutional Features (ReCF) approach for VI-ReID. ReCF consists of two modules: Middle Feature Generation (MFG), which uses the feature maps in the early stage to reduce the significant modality gap, and Temporal Feature Aggregation (TFA), which uses the feature maps in the middle stage to aggregate multi-level features and enlarge the receptive field. MFG generates middle-modality features with a learnable convolution layer that acts as a bridge between the RGB and IR modalities. This is more flexible than using fixed-parameter grayscale images and yields a better middle modality that further reduces the modality gap. TFA treats the convolution process as a video sequence, in which the feature map of each convolution layer can be regarded as a video frame. Based on this view, we obtain a multi-level receptive field and a temporal refinement. In addition, we introduce a color-unrelated loss and a modality-unrelated loss that constrain the modality features to share a common feature representation space. Experimental results on challenging VI-ReID datasets demonstrate that the proposed method achieves state-of-the-art performance.
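The contrast the abstract draws between a fixed-parameter grayscale image and a learnable convolution layer can be illustrated with a small sketch. This is not the authors' MFG implementation. It assumes, purely for illustration, that the learnable layer is a 1x1 convolution over the three RGB channels (the abstract does not state the kernel size), so that standard grayscale conversion is the special case with frozen weights. The function names, the toy image, and the target values are all hypothetical.

```python
# Illustrative sketch only; not the authors' MFG module.
# Assumption: the "learnable convolution layer" is modeled as a 1x1 convolution
# mapping 3 RGB channels to 1 channel, i.e. a per-pixel weighted sum.

GRAY_WEIGHTS = (0.299, 0.587, 0.114)  # fixed ITU-R BT.601 luma coefficients


def conv1x1(image, weights, bias=0.0):
    """Apply a 3-to-1 channel 1x1 convolution to an H x W x 3 nested list."""
    return [
        [sum(w * c for w, c in zip(weights, pixel)) + bias for pixel in row]
        for row in image
    ]


def mse(pred, target):
    """Mean squared error between two H x W nested lists."""
    n = sum(len(row) for row in pred)
    return sum(
        (p - t) ** 2
        for row_p, row_t in zip(pred, target)
        for p, t in zip(row_p, row_t)
    ) / n


def sgd_step(image, target, weights, lr=0.1):
    """One gradient-descent step on the MSE with respect to the 3 weights."""
    pred = conv1x1(image, weights)
    n = sum(len(row) for row in image)
    grads = [0.0, 0.0, 0.0]
    for row_img, row_pred, row_tgt in zip(image, pred, target):
        for pixel, p, t in zip(row_img, row_pred, row_tgt):
            for k in range(3):
                grads[k] += 2.0 * (p - t) * pixel[k] / n
    return [w - lr * g for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # Hypothetical 2 x 2 RGB image and a hypothetical target "middle" image.
    image = [
        [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
    ]
    target = [[0.5, 0.5], [0.5, 1.5]]

    weights = list(GRAY_WEIGHTS)  # start from the fixed grayscale mapping
    print("loss with fixed grayscale weights:", mse(conv1x1(image, weights), target))
    for _ in range(50):
        weights = sgd_step(image, target, weights)
    print("loss after training the weights:  ", mse(conv1x1(image, weights), target))
```

With frozen weights the mapping is identical for every dataset, whereas the trainable weights can move toward whatever intermediate representation the training objective favors. In the paper that objective involves the losses named in the abstract, not the toy squared error used here.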

Funder

National Natural Science Foundation of China

National Key Research and Development Program of China

Natural Science Foundation of Guangdong Province

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3617375

References (88 articles)

1. Mahdi Alehdaghi, Arthur Josi, Rafael MO Cruz, and Eric Granger. 2023. Visible-infrared person re-identification using privileged intermediate information. In Proceedings of the Computer Vision–ECCV 2022 Workshops: Tel Aviv. Springer, 720–737.

2. Steve Branson, Oscar Beijbom, and Serge Belongie. 2013. Efficient large-scale structured learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1806–1813.

3. Dapeng Chen, Hongsheng Li, Tong Xiao, Shuai Yi, and Xiaogang Wang. 2018. Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1169–1178.

4. Yanbei Chen, Xiatian Zhu, and Shaogang Gong. 2017. Person re-identification by deep learning multi-scale representations. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 2590–2600.

5. Dahjung Chung, Khalid Tahboub, and Edward J. Delp. 2017. A two stream siamese convolutional neural network for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision. 1983–1991.

Cited by 1 article.

1. A comprehensive survey of visible infrared person re-identification from an application perspective. Multimedia Tools and Applications. 2024-04-24.