Learning the Meta Feature Transformer for Unsupervised Person Re-Identification
Published:2024-06-11
Issue:12
Volume:12
Page:1812
ISSN:2227-7390
Container-title:Mathematics
language:en
Short-container-title:Mathematics
Author:
Li Qing (1), Yan Chuan (2), Peng Xiaojiang (1)
Affiliation:
1. College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China
2. Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
Abstract
Although unsupervised person re-identification (Re-ID) has drawn increasing research attention, it still faces the challenge of learning discriminative features in the absence of pairwise labels across disjoint camera views. To tackle label scarcity, researchers have explored clustering and multilabel learning with memory dictionaries. Although effective in improving unsupervised Re-ID performance, these methods require substantial computational resources and add training complexity. To address this issue, we propose a conceptually simple yet effective learnable module, named the meta feature transformer (MFT). MFT is a streamlined, lightweight network architecture that requires neither complex networks nor a feature memory bank. Within each mini-batch, it uses a transformer mechanism to learn interactions between sample features in small groups, and then generates a new sample feature for each group through a weighted sum. The main benefits of MFT are twofold: (1) it makes numerous new samples available for training, which significantly expands the feature space and enhances the network’s generalization capabilities; (2) the trainable attention weights highlight the importance of individual samples, enabling the network to focus on the more useful or distinguishable ones. We validate our method on two popular large-scale Re-ID benchmarks, where extensive evaluations show that MFT outperforms previous methods and significantly improves Re-ID performance.
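The group-wise fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how groups are formed, how many attention heads or layers are used, or how the fusion weights are parameterized. The sketch assumes contiguous groups within the mini-batch, a single attention head with no learned projections, and a hypothetical `score_vector` standing in for the trainable weights that score each sample. All function names are illustrative.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(group):
    """Scaled dot-product self-attention among the features of one group.

    Assumption: queries, keys, and values are the raw features
    (no learned projection matrices), to keep the sketch short.
    """
    dim = len(group[0])
    attended = []
    for query in group:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in group])
        attended.append(
            [sum(w * value[d] for w, value in zip(weights, group)) for d in range(dim)]
        )
    return attended


def meta_feature(group, score_vector):
    """Fuse one group into a single new feature via a weighted sum.

    `score_vector` is a hypothetical stand-in for trainable parameters.
    Returns the fused feature and the per-sample weights.
    """
    attended = self_attention(group)
    weights = softmax([dot(score_vector, feat) for feat in attended])
    dim = len(group[0])
    fused = [sum(w * feat[d] for w, feat in zip(weights, attended)) for d in range(dim)]
    return fused, weights


def fuse_batch(batch, group_size, score_vector):
    """Split a mini-batch into contiguous groups (an assumption) and fuse each."""
    groups = [batch[i:i + group_size] for i in range(0, len(batch), group_size)]
    return [meta_feature(group, score_vector)[0] for group in groups]


random.seed(0)
batch = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
score_vector = [0.1, 0.1, 0.1, 0.1]  # hypothetical, untrained values
new_features = fuse_batch(batch, group_size=4, score_vector=score_vector)
print(len(new_features), "new features of dimension", len(new_features[0]))
```

Because both the attention step and the fusion step use softmax weights, each new feature is a convex combination of the features in its group. In a trained model the weights would be learned so that more useful samples contribute more; here they are fixed for illustration only.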
Funder
Shenzhen Technology University School-Level Research Project; National Natural Science Foundation of China; Stable Support Projects for Shenzhen Higher Education Institutions; Natural Science Foundation of Top Talent of SZTU