Affiliation:
1. Department of Automatic Control Engineering, Feng Chia University, Taichung, Taiwan
Abstract
Recently, large-scale artificial intelligence models with billions of parameters have achieved impressive experimental results, but their practical deployment on edge computing platforms is often constrained by their resource requirements. These models require powerful computing platforms with high memory capacity to store and process the numerous parameters and activations, which makes it challenging to deploy such large-scale models directly. Model compression techniques therefore play a crucial role in making these models more practical and accessible. In this article, a progressive channel pruning strategy combining a graph attention network and a transformer, named GAT TransPruning, is proposed. It uses the graph attention network (GAT) together with the transformer attention mechanism to determine channel-to-channel relationships in large networks. This approach ensures that the network maintains its critical functional connections and optimizes the trade-off between model size and performance. In this study, VGG-16, VGG-19, ResNet-18, ResNet-34, and ResNet-50 are used as large-scale network models, with the CIFAR-10 and CIFAR-100 datasets, for verification and quantitative analysis of the proposed progressive channel pruning strategy. The experimental results reveal that accuracy drops by only 6.58% at an 89% channel pruning rate for VGG-19 on CIFAR-100. In addition, the lightweight model's inference is 9.10 times faster than that of the original large model. Compared with traditional channel pruning schemes, the proposed GAT- and transformer-based progressive channel pruning strategy not only removes insignificant weight channels and effectively reduces the model size, but also keeps the performance drop of the resulting lightweight model the smallest, even at high pruning ratios.
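To illustrate the idea described in the abstract, the following is a minimal PyTorch sketch (not the authors' implementation) of attention-driven channel scoring and progressive pruning for a single convolutional layer. The helper names (channel_features, attention_scores, prune_step), the use of flattened filter weights as per-channel features, and the plain dot-product attention standing in for the combined GAT/transformer scoring are all illustrative assumptions.

# Minimal sketch: score output channels with a single self-attention pass
# over per-channel features, then keep only the highest-scoring channels.
import torch
import torch.nn as nn

def channel_features(conv: nn.Conv2d) -> torch.Tensor:
    # One feature vector per output channel: the flattened filter weights.
    return conv.weight.detach().flatten(start_dim=1)        # [C_out, C_in*k*k]

def attention_scores(feats: torch.Tensor) -> torch.Tensor:
    # Dot-product self-attention over channels; the total attention a channel
    # receives serves as its importance score (assumption: stands in for the
    # paper's combined GAT/transformer scoring).
    d = feats.shape[1]
    attn = torch.softmax(feats @ feats.t() / d ** 0.5, dim=-1)  # [C, C]
    return attn.sum(dim=0)                                       # [C]

def prune_step(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    # Build a smaller Conv2d that keeps only the top-scoring output channels.
    scores = attention_scores(channel_features(conv))
    n_keep = max(1, int(round(keep_ratio * conv.out_channels)))
    keep = torch.topk(scores, n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

# Progressive pruning: shrink the layer in several small steps rather than
# all at once (fine-tuning between steps is omitted in this sketch).
conv = nn.Conv2d(3, 64, 3, padding=1)
for ratio in (0.75, 0.5, 0.25):
    conv = prune_step(conv, ratio)
    print(conv.out_channels)

In a full network, the input channels of each subsequent layer would also have to be trimmed to match, and fine-tuning between pruning steps is what makes the progression gradual.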
Funder
National Science and Technology Council, Taiwan, R.O.C.