Author:
Li Gang,Wang Kun,Lv Pengfei,He Pan,Zhou Zheng,Xu Chuanyun
Abstract
AbstractGenerally, the recognition performance of lightweight models is often lower than that of large models. Knowledge distillation, by teaching a student model using a teacher model, can further enhance the recognition accuracy of lightweight models. In this paper, we approach knowledge distillation from the perspective of intermediate feature-level knowledge distillation. We combine a cross-stage feature fusion symmetric framework, an attention mechanism to enhance the fused features, and a contrastive loss function for teacher and student models at the same stage to comprehensively implement a multistage feature fusion knowledge distillation method. This approach addresses the problem of significant differences in the intermediate feature distributions between teacher and student models, making it difficult to effectively learn implicit knowledge and thus improving the recognition accuracy of the student model. Compared to existing knowledge distillation methods, our method performs at a superior level. On the CIFAR100 dataset, it boosts the recognition accuracy of ResNet20 from 69.06% to 71.34%, and on the TinyImagenet dataset, it increases the recognition accuracy of ResNet18 from 66.54% to 68.03%, demonstrating the effectiveness and generalizability of our approach. Furthermore, there is room for further optimization of the overall distillation structure and feature extraction methods in this approach, which requires further research and exploration.
Funder
Chongqing University of Technology graduate education high-quality development project
the Chongqing University of Technology undergraduate education and teaching reform research project
the Postgraduate Education and Teaching Reform Research Project in Chongqing
Chongqing Science and Technology Foundation
Publisher
Springer Science and Business Media LLC
Reference45 articles.
1. Liu, Z. et al. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 11976–11986 (2022).
2. Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
3. Liu, Z. et al. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision 10012–10022 (2021).
4. Sun, P. et al. Sparse r-cnn: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 14454–14463 (2021).
5. Ge, Z., Liu, S., Wang, F., Li, Z. & Sun, J. Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021).