Pose‐guided adversarial video prediction for image‐to‐video person re‐identification

Published: 2023-08-29 Issue: 14 Volume: 17 Page: 4000-4013
ISSN: 1751-9659
Container-title: IET Image Processing
Language: en
Short-container-title: IET Image Processing

Author:

He Yunqi¹, Chen Liqiu², Pan Honghu²

Affiliation:

1. School of Information and Computer Engineering, Northeast Forestry University, Harbin, China

2. School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen, China

Abstract

Image‐to‐video (I2V) person re‐identification (Re‐ID) is a cross‐modality pedestrian retrieval task whose crux is reducing the large modality discrepancy between images and videos. To this end, this paper proposes predicting the subsequent video frames from a single image, so that I2V person Re‐ID can be transformed into video‐to‐video (V2V) Re‐ID. Because predicting video frames from a single image is an ill‐posed problem, the paper proposes two strategies to improve the quality of the predicted videos. First, a pose‐guided video prediction pipeline is proposed. The given single image and the pedestrian pose are encoded by an image encoder and a pose encoder, respectively; the image feature and pose feature are then concatenated as the input to the video decoder. The authors minimize the difference between the predicted video and the true video, and simultaneously minimize the difference between the predicted pose and the true pose. Second, a conditional adversarial training strategy is employed to generate high‐quality video frames. Specifically, the discriminator takes the source image as its condition and distinguishes whether the input frames are fake or are the true subsequent frames of the source image. Experimental results demonstrate that pose‐guided adversarial video prediction effectively improves the accuracy of I2V Re‐ID.
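The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: single random linear layers stand in for the image encoder, pose encoder, video decoder, and discriminator; the separate `pose_head`, the loss weights `w_pose` and `w_adv`, the choice of mean squared error, and all dimensions are assumptions made only for illustration. The sketch shows the concatenation of image and pose features, the two reconstruction terms, and a discriminator conditioned on the source image.

```python
import math
import random

# Toy sizes (assumed): flattened image, pose vector, feature width, frame count.
IMG_DIM, POSE_DIM, FEAT_DIM, NUM_FRAMES = 12, 6, 4, 3


def random_weights(rows, cols, rng):
    """Random dense-layer weights standing in for a learned network."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, weights):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_model(rng):
    """Hypothetical stand-ins for the encoders, decoder, and discriminator."""
    return {
        "img_enc": random_weights(FEAT_DIM, IMG_DIM, rng),
        "pose_enc": random_weights(FEAT_DIM, POSE_DIM, rng),
        "video_dec": random_weights(NUM_FRAMES * IMG_DIM, 2 * FEAT_DIM, rng),
        "pose_head": random_weights(NUM_FRAMES * POSE_DIM, 2 * FEAT_DIM, rng),
        "disc": random_weights(1, IMG_DIM + NUM_FRAMES * IMG_DIM, rng),
    }


def predict(model, image, pose):
    """Encode image and pose, concatenate the features, decode frames and poses."""
    feat = linear(image, model["img_enc"]) + linear(pose, model["pose_enc"])
    return linear(feat, model["video_dec"]), linear(feat, model["pose_head"])


def discriminate(model, image, frames):
    """Conditional discriminator: scores the frames given the source image."""
    return sigmoid(linear(image + frames, model["disc"])[0])


def losses(model, image, pose, true_frames, true_poses, w_pose=1.0, w_adv=0.1):
    """Generator and discriminator objectives for one sample."""
    fake_frames, fake_poses = predict(model, image, pose)
    d_real = discriminate(model, image, true_frames)
    d_fake = discriminate(model, image, fake_frames)
    gen_loss = (mse(fake_frames, true_frames)           # video reconstruction
                + w_pose * mse(fake_poses, true_poses)  # pose reconstruction
                - w_adv * math.log(d_fake))             # fool the discriminator
    disc_loss = -math.log(d_real) - math.log(1.0 - d_fake)
    return gen_loss, disc_loss


rng = random.Random(0)
model = make_model(rng)
image = [rng.random() for _ in range(IMG_DIM)]
pose = [rng.random() for _ in range(POSE_DIM)]
true_frames = [rng.random() for _ in range(NUM_FRAMES * IMG_DIM)]
true_poses = [rng.random() for _ in range(NUM_FRAMES * POSE_DIM)]

fake_frames, fake_poses = predict(model, image, pose)
gen_loss, disc_loss = losses(model, image, pose, true_frames, true_poses)
```

In a real system the linear layers would be replaced by convolutional or recurrent networks and the two losses would be minimized alternately; the sketch only fixes the shapes and the way the terms combine.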

Funder

National Natural Science Foundation of China

Publisher

Institution of Engineering and Technology (IET)

Subject

Electrical and Electronic Engineering, Computer Vision and Pattern Recognition, Signal Processing, Software

Reference: 60 articles.

1. Zheng, L., Bie, Z., et al.: Mars: a video benchmark for large‐scale person re‐identification. In: Proceedings of the European Conference on Computer Vision. Lecture Notes in Computer Science, vol. 9910, pp. 868–884. Springer, Cham (2016)

2. Person Re-identification: System Design and Evaluation Overview

3. Wu, S., Chen, Y., et al.: An enhanced deep feature representation for person re‐identification. In: 2016 IEEE Winter Conference on Applications of Computer Vision, pp. 1–8. IEEE, Piscataway, NJ (2016)

4. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re‐identification. arXiv preprint arXiv:1703.07737 (2017)

5. Yan, Y., Qin, J., et al.: Learning multi‐granular hypergraphs for video‐based person re‐identification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2899–2908. IEEE, Piscataway, NJ (2020)