Affiliation:
1. College of Computer Science, Sichuan University, Chengdu, People's Republic of China
2. School of Information Science and Engineering, Yanshan University, Qinhuangdao, People's Republic of China
3. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University, Qinhuangdao, People's Republic of China
Abstract
Unsupervised person re‐identification based on video sequences can be applied to surveillance systems and is attracting growing attention. It aims to identify a specific person in scenes captured by different cameras. This work explores an innovative strategy, namely, learning to cluster unlabeled persons in videos through graph convolutional networks. In this article, we find that the likelihood of inter‐frame linkage can be inferred from context. Therefore, a pose‐guided topology linkage clustering framework is proposed. Our framework consists of three modules: (i) a pose‐guided representation module; (ii) a pose‐guided embedding module; (iii) a link prediction module. First, representation encoding is performed at the level of relational inductive bias, embedding the implicit pose structure information into the image features. Then, taking into account the topological relationships between adjacent frames and across frames, a graph convolutional network is introduced to infer the likelihood of linkage between frame nodes. Experiments show that the proposed method scales well, responds effectively to person clustering in the presence of changes, and does not need the number of clusters as a prior.
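To illustrate why link-based clustering does not need the number of clusters in advance, here is a minimal sketch rather than the proposed framework. It assumes pairwise link likelihoods between frame nodes are already available; cosine similarity is used here only as a stand-in for the learned graph convolutional link predictor. The function names, the toy features, and the 0.9 threshold are all hypothetical. Frames whose link likelihood exceeds the threshold are merged with union-find, so the cluster count emerges from the links.

```python
import numpy as np


def link_scores(features):
    """Stand-in link predictor (assumption): cosine similarity between frame features.

    In the framework described above this role is played by a learned
    graph convolutional network; cosine similarity is only a placeholder.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def cluster_by_links(scores, threshold=0.9):
    """Merge frame nodes whose link likelihood is at least `threshold`.

    Uses union-find, so the number of clusters is an output, not an input.
    """
    n = scores.shape[0]
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if scores[i, j] >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Relabel roots as consecutive cluster ids in order of first appearance.
    label_of_root = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in label_of_root:
            label_of_root[root] = len(label_of_root)
        labels.append(label_of_root[root])
    return labels


# Toy features for four frames: two near one direction, two near another.
toy_features = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]])
toy_labels = cluster_by_links(link_scores(toy_features), threshold=0.9)
print(toy_labels)
```

In this toy case the first two frames and the last two frames end up in separate clusters. The threshold controls how aggressively frames are linked, and a real system would have to tune or learn it.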