Affiliation:
1. School of Electronic Information, Northwestern Polytechnical University, Xi’an 710129, China
Abstract
Metric-based meta-learning methods have been highly successful in few-shot image classification. However, their performance depends heavily on the choice of metric and on the feature representation of the support classes. Current approaches, which rely predominantly on holistic image features, may discard details that are critical for novel tasks, a phenomenon known as “supervision collapse”. Moreover, visual features alone can be insufficient to characterize support classes, particularly when only a few samples are available. In this paper, we introduce a framework named Patch Matching Metric-based Semantic Interaction Meta-Learning (PatSiML), designed to overcome these challenges. To counteract supervision collapse, we develop a Transformer-based patch matching metric strategy that converts each input image into a set of distinct patch embeddings. A graph convolutional network then dynamically produces task-specific embeddings, which are used to formulate precise matching metrics between the support classes and the query image patches. To enhance the integration of semantic knowledge, we also introduce a label-assisted channel semantic interaction strategy. This strategy merges word embeddings with patch-level visual features along the channel dimension, using a language model to combine semantic understanding with visual information. Experiments on four diverse datasets show that PatSiML improves classification accuracy by 0.65% to 21.15% over existing methods, underscoring its robustness and efficacy.
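The two ideas summarized in the abstract, matching at the patch level and blending label semantics into the visual channels, can be illustrated with a toy sketch. This is not the PatSiML implementation: it assumes patch embeddings and label vectors are already given as plain lists of equal length, replaces the Transformer and graph convolutional network with a fixed cosine similarity, and replaces the semantic interaction module with a simple convex blend. The function names and the `alpha` weight are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def fuse_channels(patches, label_vec, alpha=0.5):
    """Blend a label vector into every patch embedding, channel by channel.

    Assumption: label_vec has already been projected to the patch channel size.
    alpha is a hypothetical mixing weight, not a parameter from the paper.
    """
    return [
        [(1 - alpha) * p + alpha * s for p, s in zip(patch, label_vec)]
        for patch in patches
    ]


def patch_match_score(query_patches, support_patches):
    """Average, over query patches, of the best cosine match among support patches."""
    best = [max(cosine(q, s) for s in support_patches) for q in query_patches]
    return sum(best) / len(best)


def classify(query_patches, support, label_vecs, alpha=0.5):
    """Score each class by patch matching against its label-fused support patches.

    support:    class name -> list of patch embeddings
    label_vecs: class name -> label vector with the same channel size
    """
    scores = {
        name: patch_match_score(
            query_patches, fuse_channels(patches, label_vecs[name], alpha)
        )
        for name, patches in support.items()
    }
    return max(scores, key=scores.get), scores


# Toy 2-way episode with 3-channel patch embeddings (made-up numbers).
support = {
    "cat": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "dog": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
label_vecs = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
query = [[0.8, 0.2, 0.0], [1.0, 0.0, 0.1]]

pred, scores = classify(query, support, label_vecs)
print(pred, scores)
```

Because each query patch is scored against its best-matching support patch, a class can still be recognized when only some local regions agree, which is the intuition behind avoiding a single holistic image feature.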
Funder
Key R&D Program of Shaanxi Province