Matching the Ideal Pruning Method with Knowledge Distillation for Optimal Compression
Published: 2024-06-29
Issue: 4
Volume: 7
Page: 56
ISSN: 2571-5577
Container-title: Applied System Innovation
Language: en
Short-container-title: ASI
Author: Leila Malihi 1, Gunther Heidemann 1
Affiliation:
1. Department of Computer Vision, Institute of Cognitive Science, Osnabrück University, 49074 Osnabrück, Germany
Abstract
In recent years, model compression techniques have gained significant attention as a means to reduce the computational and memory requirements of deep neural networks. Knowledge distillation and pruning are two prominent approaches in this domain, each offering unique advantages in achieving model efficiency. This paper investigates the combined effects of knowledge distillation and two pruning strategies, weight pruning and channel pruning, on compression efficiency and model performance. The study introduces a metric called “Performance Efficiency” to evaluate the impact of these pruning strategies on model compression and performance. Our experiments are conducted on the popular CIFAR-10 and CIFAR-100 datasets, comparing diverse model architectures, including ResNet, DenseNet, EfficientNet, and MobileNet. The results confirm the efficacy of both weight and channel pruning in achieving model compression. However, a clear distinction emerges: weight pruning shows superior performance across all four architecture types. We find that weight pruning adapts better to knowledge distillation than channel pruning. Pruned models show a substantial reduction in parameters without a significant loss in accuracy.
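To make the interplay of the two techniques concrete, the sketch below pairs Hinton-style knowledge distillation with the two pruning regimes discussed above, using PyTorch's `torch.nn.utils.prune` utilities. It is a minimal illustration under stated assumptions, not the authors' implementation: the ResNet-50 teacher / ResNet-18 student pairing, the temperature `T=4.0`, the loss weight `alpha=0.9`, and the pruning amounts are example values, and the closing sparsity printout is only a rough compression check, not the paper's “Performance Efficiency” metric.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune
from torchvision import models

# Illustrative teacher/student pair; the paper compares ResNet, DenseNet,
# EfficientNet, and MobileNet variants, not necessarily this exact pairing.
teacher = models.resnet50(num_classes=10).eval()
student = models.resnet18(num_classes=10)


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD: soften teacher/student logits with temperature T and
    mix the KL term with ordinary cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def apply_weight_pruning(model, amount=0.5):
    """Unstructured (weight) pruning: zero the smallest-magnitude weights of each conv layer."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            prune.l1_unstructured(m, name="weight", amount=amount)


def apply_channel_pruning(model, amount=0.3):
    """Structured (channel) pruning: zero whole output channels ranked by L2 norm."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            prune.ln_structured(m, name="weight", amount=amount, n=2, dim=0)


# One distillation step on a dummy CIFAR-sized batch (32x32 RGB, 10 classes).
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
apply_weight_pruning(student, amount=0.5)  # or apply_channel_pruning(student)
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, y)
loss.backward()

# Rough compression check: fraction of conv weights that are now zero.
zeros = sum(int((m.weight == 0).sum()) for m in student.modules() if isinstance(m, nn.Conv2d))
total = sum(m.weight.numel() for m in student.modules() if isinstance(m, nn.Conv2d))
print(f"conv-weight sparsity: {zeros / total:.2%}")
```

Note on the two regimes: unstructured weight pruning only masks individual weights, so the parameter savings materialize once the sparse tensors are stored or executed with sparse-aware kernels, whereas channel pruning removes whole filters and shrinks the dense architecture directly.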