Hierarchical Multi-Attention Transfer for Knowledge Distillation

Published: 2022-10-20
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Gou Jianping¹, Sun Liyuan², Yu Baosheng³, Wan Shaohua⁴, Tao Dacheng³

Affiliation:

1. College of Computer and Information Science, Southwest University, China and School of Computer Science and Communication Engineering, Jiangsu University, China

2. School of Computer Science and Communication Engineering, Jiangsu University, China

3. School of Computer Science, The University of Sydney, Australia

4. Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, China

Abstract

Knowledge distillation (KD) is a powerful and widely applicable technique for compressing deep learning models. The main idea of knowledge distillation is to transfer knowledge from a large teacher model to a small student model, and the attention mechanism has been intensively explored for this purpose because of its flexibility in handling different teacher-student architectures. However, existing attention-based methods usually transfer similar attention knowledge from the intermediate layers of deep neural networks, leaving the hierarchical structure of deep representation learning poorly investigated for knowledge distillation. In this paper, we propose a hierarchical multi-attention transfer framework (HMAT), in which different types of attention are used to transfer knowledge at different levels of deep representation learning. Specifically, position-based and channel-based attention knowledge characterize the knowledge from low-level and high-level feature representations, respectively, and activation-based attention knowledge characterizes the knowledge from both mid-level and high-level feature representations. Extensive experiments on three popular visual recognition tasks (image classification, image retrieval, and object detection) demonstrate that the proposed HMAT significantly outperforms recent state-of-the-art KD methods.
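
As a rough illustration of the activation-based attention mentioned in the abstract, the sketch below uses a common generic formulation: sum the p-th power of absolute activations over channels, normalize the resulting spatial map, and penalize the squared distance between the student and teacher maps. This is not the HMAT formulation, which the abstract does not specify. The function names, the exponent `p`, the tensor shapes, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def activation_attention(features: np.ndarray, p: int = 2) -> np.ndarray:
    """Collapse a (C, H, W) feature map into a flattened, L2-normalized
    spatial attention map of length H * W (generic formulation, assumed)."""
    # Aggregate over the channel axis so the result no longer depends on C.
    attention = np.sum(np.abs(features) ** p, axis=0)
    flat = attention.reshape(-1)
    # Small epsilon guards against division by zero for all-zero features.
    return flat / (np.linalg.norm(flat) + 1e-12)


def attention_transfer_loss(student: np.ndarray, teacher: np.ndarray, p: int = 2) -> float:
    """Squared L2 distance between normalized student and teacher attention maps."""
    diff = activation_attention(student, p) - activation_attention(teacher, p)
    return float(np.sum(diff ** 2))


# Hypothetical feature maps: channel counts differ, spatial sizes match.
rng = np.random.default_rng(0)
teacher_features = rng.standard_normal((64, 8, 8))
student_features = rng.standard_normal((16, 8, 8))
loss = attention_transfer_loss(student_features, teacher_features)
```

Because the channel axis is summed out before comparison, the teacher and student may have different widths, which is one reason attention maps are convenient across dissimilar architectures. Only the spatial sizes need to agree in this sketch.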

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3568679

References (55 articles)

1. Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. FitNets: Hints for thin deep nets. International Conference on Learning Representations (2015), 1–13.

2. Variational Information Distillation for Knowledge Transfer

3. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

4. Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, and Jiajun Liang. 2022. Decoupled Knowledge Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

5. Cross-Layer Distillation with Semantic Calibration

Cited by 25 articles.

1. Multi-level knowledge distillation via dynamic decision boundaries exploration and exploitation; Information Fusion; 2024-12

2. A Multi-Level Adaptive Lightweight Net for Damaged Road Marking Detection Based on Knowledge Distillation; Remote Sensing; 2024-07-16

3. Multi-receptive Field Distillation Network for seismic velocity model building; Engineering Applications of Artificial Intelligence; 2024-07

4. A progressive distillation network for practical image-based virtual try-on; Expert Systems with Applications; 2024-07

5. Fine-Tuning Optimization of Small Language Models: A Novel Graph-Theoretical Approach for Efficient Prompt Engineering; 2024 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB); 2024-06-19