TSPT ReID: Triple Streams with Pos-insertion Transformer for Person Re-identification

Published: 2024-04-19

Author:

Li Xujun¹, Rao Liming¹, Zhang Tengze¹, Chang Jia¹, Duan Zhicheng¹

Affiliation:

1. Xiangtan University

Abstract

Person re-identification (ReID) is challenging because datasets differ in camera setups and viewing distances. The inherent limitation of convolutional neural networks (CNNs) in capturing long-range dependencies exacerbates these challenges. To address both issues, this paper proposes the Triple Streams with Pos-insertion Transformer ReID (TSPT ReID), built on the Transformer architecture. Its key component, the Triple Streams Transformer Encoder (TST Encoder), builds multi-head attention layers at three different scales to strengthen the correlation between scales. To improve the recognizability and robustness of valid pedestrian edges, the Shift Pos-insertion Module (SPI Module) adds shift-based position coding to the attention layers without introducing additional parameters. Experimental results demonstrate the model's effectiveness on multiple datasets: it achieves 82.9% mAP and 91.4% Rank-1 accuracy on DukeMTMC, and 83.5% mAP and 85.9% Rank-1 accuracy on CUHK03.
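The two metrics quoted in the abstract, mAP and Rank-1 accuracy, can be illustrated with a minimal sketch. It assumes a precomputed query-to-gallery distance matrix and integer identity labels; the function name `rank1_and_map` and the toy data are hypothetical, and the same-camera filtering used in standard ReID evaluation protocols is omitted for brevity. It is not the authors' evaluation code.

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mAP from a (num_query, num_gallery)
    distance matrix, where a smaller distance means a closer match."""
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    rank1_hits, average_precisions = [], []
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q])                  # gallery sorted by ascending distance
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue                                 # query identity absent from gallery
        rank1_hits.append(float(matches[0]))         # is the top-ranked item correct?
        hit_ranks = np.flatnonzero(matches) + 1      # 1-based ranks of correct matches
        precisions = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        average_precisions.append(precisions.mean()) # average precision for this query
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))

# Toy data (hypothetical): two queries of identity 1, three gallery images.
dist = [[0.1, 0.9, 0.5],
        [0.8, 0.2, 0.3]]
rank1, mean_ap = rank1_and_map(dist, query_ids=[1, 1], gallery_ids=[1, 2, 1])
```

Rank-1 only checks the top-ranked gallery image, while mAP rewards ranking every correct match early, which is why the two numbers can differ noticeably on the same dataset.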

Publisher

Research Square Platform LLC

References (49 articles)

1. Krizhevsky A (2017) ImageNet classification with deep convolutional neural networks. Commun ACM

2. Shafiq M (2022) Deep residual learning for image recognition: A survey. Appl Sciences-Basel

3. Luo W, Li Y, Urtasun R, Zemel R (2016) Understanding the effective receptive field in deep convolutional neural networks. Advances in neural information processing systems 29

4. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). p 3431–3440

5. Zhang Z, Lan C, Zeng W, Jin X, Chen Z (2020) Relation-aware global attention for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). p 3186–3195