Multi-target Knowledge Distillation via Student Self-reflection
-
Published: 2023-04-25
Issue: 7
Volume: 131
Pages: 1857-1874
-
ISSN: 0920-5691
-
Container-title: International Journal of Computer Vision
-
Language: en
-
Short-container-title: Int J Comput Vis
Authors:
Gou Jianping, Xiong Xiangshuo, Yu Baosheng, Du Lan, Zhan Yibing, Tao Dacheng
Abstract
Knowledge distillation is a simple yet effective technique for deep model compression, which aims to transfer the knowledge learned by a large teacher model to a small student model. To mimic how the teacher teaches the student, existing knowledge distillation methods mainly adopt a unidirectional knowledge transfer, where the knowledge extracted from different intermediate layers of the teacher model is used to guide the student model. However, in real-world education, students learn more effectively through multi-stage learning with self-reflection, which is nevertheless ignored by current knowledge distillation methods. Inspired by this, we devise a new knowledge distillation framework, entitled multi-target knowledge distillation via student self-reflection (MTKD-SSR), which can not only enhance the teacher’s ability to unfold the knowledge to be distilled, but also improve the student’s capacity to digest that knowledge. Specifically, the proposed framework consists of three target knowledge distillation mechanisms: stage-wise channel distillation (SCD), stage-wise response distillation (SRD), and cross-stage review distillation (CRD). SCD and SRD transfer feature-based knowledge (i.e., channel features) and response-based knowledge (i.e., logits) at different stages, respectively, while CRD encourages the student model to conduct self-reflective learning after each stage via self-distillation of the response-based knowledge. Experimental results on five popular visual recognition datasets (CIFAR-100, Market-1501, CUB200-2011, ImageNet, and Pascal VOC) demonstrate that the proposed framework significantly outperforms recent state-of-the-art knowledge distillation methods.
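The abstract names the three distillation targets (SCD, SRD, CRD) but not their exact formulations. The following minimal PyTorch sketch shows how such a combined objective could be assembled, assuming the student and teacher expose stage-wise feature maps and auxiliary stage logits; the specific loss forms (MSE over pooled channel descriptors, temperature-scaled KL over logits), the weights alpha/beta/gamma, and all function names are illustrative assumptions, not the paper's definitions.

import torch
import torch.nn.functional as F

def channel_distillation_loss(f_s, f_t):
    # Feature-based (channel) knowledge at one stage: match channel descriptors
    # obtained by global average pooling of the (N, C, H, W) feature maps.
    # A plausible stand-in for SCD; assumes matching channel dimensions
    # (otherwise a learned projection on the student features would be needed).
    return F.mse_loss(f_s.mean(dim=(2, 3)), f_t.mean(dim=(2, 3)))

def response_distillation_loss(z_s, z_t, T=4.0):
    # Response-based knowledge: KL divergence between temperature-softened logits.
    return F.kl_div(F.log_softmax(z_s / T, dim=1),
                    F.softmax(z_t / T, dim=1),
                    reduction="batchmean") * (T * T)

def mtkd_ssr_style_loss(stage_feats_s, stage_feats_t,
                        stage_logits_s, stage_logits_t,
                        final_logits_s, labels,
                        alpha=1.0, beta=1.0, gamma=1.0):
    # Task loss on the student's final prediction.
    loss = F.cross_entropy(final_logits_s, labels)
    # SCD + SRD: the teacher guides the student at each stage.
    for f_s, f_t, z_s, z_t in zip(stage_feats_s, stage_feats_t,
                                  stage_logits_s, stage_logits_t):
        loss = loss + alpha * channel_distillation_loss(f_s, f_t)
        loss = loss + beta * response_distillation_loss(z_s, z_t)
    # CRD-style self-reflection: the student's own final logits act as a teacher
    # for its earlier-stage logits (self-distillation across stages).
    for z_s in stage_logits_s:
        loss = loss + gamma * response_distillation_loss(z_s, final_logits_s.detach())
    return loss

Each list argument holds one tensor per student stage; the two loops make explicit how the teacher-guided stage-wise terms and the student's self-distillation term are simply summed into one training objective in this sketch.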
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
17 articles.