Affiliation:
1. School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China
Abstract
Few-shot learning (FSL) is a challenging problem. Transfer learning methods offer a straightforward and effective solution to FSL by leveraging pre-trained models and generalizing them to new tasks. However, pre-trained models often fail to emphasize salient features, a gap that attention mechanisms can fill. Existing attention mechanisms, in turn, suffer from high complexity and incomplete attention information. To address these issues, we propose a dimensionally enhanced attention (DEA) module for FSL. The DEA module adds minimal computational overhead while attending to both channel and spatial information. Specifically, the feature map is first decomposed into 1D tensors along different dimensions using strip pooling. Next, a multi-dimensional collaborative learning strategy enables cross-dimensional information interaction through 1D convolutions with adaptive kernel sizes. Finally, attention weights for each dimension are computed with a sigmoid function and used to reweight the original input, which enhances the feature representation. This approach attends to information along every dimension and characterizes the data in multiple directions. Additionally, we find that knowledge distillation significantly improves FSL performance. We therefore implement a logit standardization self-distillation method tailored for FSL. By standardizing the logits, this method avoids the exact logit matching that a shared temperature imposes during self-distillation. Experiments on several benchmark datasets show that the proposed method yields significant performance improvements.
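The point about a shared temperature can be illustrated with a small sketch. It is not the authors' implementation: it assumes that logit standardization means z-scoring each logit vector before the temperature-scaled softmax, and the function names, the temperature of 2.0, and the example logits are all hypothetical. The student logits below are a rescaled and shifted copy of the teacher logits, so both models rank the classes identically.

```python
import math

def standardize(logits, eps=1e-7):
    """Z-score a logit vector (assumed form of logit standardization)."""
    mean = sum(logits) / len(logits)
    var = sum((z - mean) ** 2 for z in logits) / len(logits)
    std = math.sqrt(var)
    return [(z - mean) / (std + eps) for z in logits]

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(teacher_logits, student_logits, temperature=2.0, standardized=True):
    """Distillation loss with a temperature shared by teacher and student.

    With standardized=False this is the plain soft-label KL term; with
    standardized=True both logit vectors are z-scored first.
    """
    if standardized:
        teacher_logits = standardize(teacher_logits)
        student_logits = standardize(student_logits)
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)

# Hypothetical logits: the student has the same class ranking as the teacher,
# but a different scale (x0.5) and offset (+3.0).
teacher = [4.0, 1.0, -2.0, 0.5]
student = [0.5 * z + 3.0 for z in teacher]

plain_loss = distill_loss(teacher, student, standardized=False)
standardized_loss = distill_loss(teacher, student, standardized=True)

print(f"plain KD loss:        {plain_loss:.6f}")
print(f"standardized KD loss: {standardized_loss:.6f}")
```

Under these assumptions the plain loss stays clearly positive, because a shared temperature penalizes any mismatch in logit scale, while the standardized loss is numerically zero: the student is only asked to reproduce the teacher's relative logit pattern, not its exact magnitudes.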
Funder
National Natural Science Foundation of China
Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region