Class-Center-Based Self-Knowledge Distillation: A Simple Method to Reduce Intra-Class Variance
Published: 2024-08-10
Issue: 16
Volume: 14
Page: 7022
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Zhong Ke 1, Zhang Lei 1, Wang Lituan 1, Shu Xin 1, Wang Zizhou 2
Affiliation:
1. Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China
2. Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
Abstract
Recent inter-sample self-distillation methods, which spread knowledge across samples, further improve the performance of deep models on multiple tasks. However, existing implementations introduce additional sampling and computational overhead. In this work, we propose a simple improved algorithm, center self-distillation, which achieves better results at almost no additional computational cost. Its design follows two steps. First, using a simple visualization, we show that inter-sample self-distillation yields a denser distribution of samples with identical labels in the feature space, and that the key to its effectiveness is reducing the intra-class variance of features through mutual learning between samples. This motivates providing a soft target for each class as the center from which all samples of that class learn. Second, we propose to learn class centers and compute class predictions from them to construct these soft targets. In particular, to prevent the over-fitting that arises from eliminating intra-class variation entirely, the soft target for each sample is customized by fusing the corresponding class prediction with that sample’s own prediction. This helps mitigate overconfident predictions and drives the network to produce more meaningful and consistent predictions. Experimental results on various image classification tasks show that this simple yet effective approach not only reduces intra-class variance but also greatly improves the generalization ability of modern convolutional neural networks.
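The abstract describes the algorithm only at a high level. The following is a minimal PyTorch-style sketch of that idea, not the authors' implementation: the module name `CenterSelfDistillLoss`, the fusion weight `alpha`, the temperature, the MSE center-pulling term, and obtaining class predictions by passing learned centers through the classifier head are all illustrative assumptions made here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CenterSelfDistillLoss(nn.Module):
    """Hypothetical sketch: learnable class centers provide per-class soft
    targets, fused with each sample's own prediction before distillation."""

    def __init__(self, num_classes, feat_dim, alpha=0.5, temperature=4.0):
        super().__init__()
        # One learnable center per class in feature space (assumed parameterization).
        self.centers = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.alpha = alpha              # fusion weight: class prediction vs. sample prediction
        self.temperature = temperature  # softening temperature for distillation

    def forward(self, features, logits, classifier, labels):
        # Class predictions: pass the learned centers through the same classifier head.
        center_logits = classifier(self.centers)   # shape (C, C)
        class_pred = center_logits[labels]         # soft target of each sample's class, (B, C)

        # Customize the soft target by fusing the class prediction with the sample's
        # own (detached) prediction, so intra-class variation is reduced, not erased.
        target = self.alpha * class_pred + (1.0 - self.alpha) * logits.detach()

        # Temperature-scaled KL distillation toward the fused soft target.
        T = self.temperature
        distill = F.kl_div(
            F.log_softmax(logits / T, dim=1),
            F.softmax(target / T, dim=1),
            reduction="batchmean",
        ) * (T * T)

        # Pull features toward their class centers to shrink intra-class variance.
        center = F.mse_loss(features, self.centers[labels])

        # Standard cross-entropy on the hard labels.
        ce = F.cross_entropy(logits, labels)
        return ce + distill + center
```

A typical usage under these assumptions would be `feats = backbone(images)`, `logits = classifier(feats)`, then `loss = criterion(feats, logits, classifier, labels)`. Leaving the fused target partly dependent on the learnable centers lets gradients update the centers, which is what allows the class-level soft targets to be learned rather than fixed.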
Funder
National Natural Science Foundation for Distinguished Young Scholar of China