Abstract
Existing knowledge distillation (KD) methods are mainly based on features, logits, or attention: features and logits represent the results of reasoning at different stages of a convolutional neural network, while attention maps symbolize the reasoning process itself. Because the process and its results are temporally continuous, transferring only one of them to the student network leads to unsatisfactory results. We study knowledge transfer between teacher and student networks at different depths, revealing the importance of simultaneously transferring knowledge about both the reasoning process and the reasoning results to the student, and providing a new perspective for the study of KD. On this basis, we propose a knowledge distillation method based on attention and feature transfer (AFT-KD). First, we use transformation structures to convert intermediate features into attention and feature blocks (AFBs) that contain both inference-process and inference-result information, and force the student to learn the knowledge in the AFBs. To save computation during learning, we use block operations to align the teacher and student networks. In addition, to balance the decay ratios of the different losses, we design an adaptive loss function based on the loss optimization rate. Experiments show that AFT-KD achieves state-of-the-art performance on multiple benchmarks.
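The idea of distilling both the reasoning process (attention maps) and the reasoning result (features), with adaptively balanced loss terms, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the spatial-attention definition, the `afb_loss` combination, and the `adaptive_weights` rule based on per-term loss decay rates are all assumptions chosen for clarity.

```python
import numpy as np

def spatial_attention(feat, p=2):
    """Spatial attention map of a (C, H, W) activation: channel-wise
    mean of |F|^p, L2-normalised (a common attention-transfer choice)."""
    att = np.mean(np.abs(feat) ** p, axis=0)            # (H, W)
    return att / (np.linalg.norm(att) + 1e-8)

def afb_loss(f_t, f_s):
    """Loss over a hypothetical attention-and-feature block (AFB):
    MSE between features (reasoning result) and between attention
    maps (reasoning process) of teacher f_t and student f_s."""
    l_feat = np.mean((f_t - f_s) ** 2)
    l_att = np.mean((spatial_attention(f_t) - spatial_attention(f_s)) ** 2)
    return l_feat, l_att

def adaptive_weights(prev_losses, cur_losses):
    """Hypothetical adaptive weighting from loss optimization rates:
    terms that are decaying more slowly receive larger weight, so no
    single term dominates training. Weights are normalised to sum to 1."""
    rates = np.array([(p - c) / (p + 1e-8)
                      for p, c in zip(prev_losses, cur_losses)])
    w = np.exp(-rates)
    return w / w.sum()
```

In a training loop, `afb_loss` would be evaluated at each aligned teacher-student stage, and `adaptive_weights` recomputed each epoch from the previous and current values of the individual loss terms before summing them into the total distillation loss.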
Funder
Education Department of Jiangxi Province
Publisher
Springer Science and Business Media LLC
References (42 articles)
1. Liu, Y. et al. Interaction-enhanced and time-aware graph convolutional network for successive point-of-interest recommendation in traveling enterprises. IEEE Trans. Industr. Inf. 19(1), 635–643 (2022).
2. Grabek, J. & Cyganek, B. An impact of tensor-based data compression methods on deep neural network accuracy. Ann. Comput. Sci. Inf. Syst. 26, 3–11 (2021).
3. Hameed, M. G. A. et al. Convolutional neural network compression through generalized Kronecker product decomposition. Proc. AAAI Conf. Artif. Intell. 36(1), 771–779 (2022).
4. Hua, W. et al. Channel gating neural networks. Adv. Neural Inf. Process. Syst. 32, 1 (2019).
5. Gusak, J. et al. Automated multi-stage compression of neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019).