Rethinking Person Re-Identification via Semantic-based Pretraining

Published: 2023-12-09 Issue: 3 Volume: 20 Page: 1-17
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Xiang Suncheng¹, Qian Dahong¹, Gao Jingsheng², Zhang Zirui², Liu Ting², Fu Yuzhuo²

Affiliation:

1. School of Biomedical Engineering, Shanghai Jiao Tong University, China

2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China

Abstract

Pretraining is a dominant paradigm in computer vision. Supervised ImageNet pretraining is commonly used to initialize the backbones of person re-identification (Re-ID) models. However, recent works show the surprising result that CNN-based pretraining on ImageNet has limited impact on Re-ID systems due to the large domain gap between ImageNet and person Re-ID data. To seek an alternative to traditional pretraining, we investigate semantic-based pretraining as another way to use additional textual data in place of ImageNet pretraining. Specifically, we manually construct, for the first time for person Re-ID, a diversified caption dataset named FineGPR-C. Based on it, we propose a purely semantic-based pretraining approach named VTBR, which uses dense captions to learn visual representations from fewer images. We train convolutional neural networks from scratch on the captions of the FineGPR-C dataset and then transfer them to downstream Re-ID tasks. Comprehensive experiments on benchmark datasets show that VTBR achieves competitive performance compared with ImageNet pretraining despite using up to 1.4× fewer images, revealing its potential for Re-ID pretraining. Our source code is publicly available at https://github.com/JeremyXSC/VTBR.

Funder

National Natural Science Foundation of China

Startup Fund for Young Faculty at SJTU

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3628452

Reference: 58 articles.

1. Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, and Errui Ding. 2021. Unsupervised multi-source domain adaptation for person re-identification. In IEEE Conference on Computer Vision and Pattern Recognition. 12914–12923.

2. Binghui Chen, Weihong Deng, and Jiani Hu. 2019. Mixed high-order attention network for person re-identification. In IEEE International Conference on Computer Vision. 371–381.

3. ImageNet: A large-scale hierarchical image database

4. Weijian Deng, Liang Zheng, Qixiang Ye, Guoliang Kang, Yi Yang, and Jianbin Jiao. 2018. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In IEEE Conference on Computer Vision and Pattern Recognition. 994–1003.

5. Karan Desai and Justin Johnson. 2021. Virtex: Learning visual representations from textual annotations. In IEEE Conference on Computer Vision and Pattern Recognition. 11162–11173.