MSSD: multi-scale self-distillation for object detection

Published:2024-03-21 Issue:1 Volume:2
ISSN:2731-9008
Container-title:Visual Intelligence
language:en
Short-container-title:Vis. Intell.

Author:

Jia Zihao, Sun Shengkun, Liu Guangcan, Liu Bo

Abstract

Knowledge distillation is widely used in deep learning. It typically transfers useful information from a neural network with many parameters and a high learning capacity (the teacher model) to a neural network with few parameters and a low learning capacity (the student model). However, this transfer is inefficient: the student model does not fully learn the knowledge of the teacher model. We therefore perform knowledge distillation across the layers of a single model, i.e., self-distillation. We apply self-distillation to the object detection task and propose a multi-scale self-distillation approach, arguing that distilling the information contained in feature maps at different scales helps the model detect small targets. In addition, we propose a Gaussian mask based on the target region as an auxiliary detection method to improve the accuracy of target position detection during distillation. We validate our approach on the KITTI dataset using the single-stage detector YOLO. The results show a 2.8% improvement in accuracy over the baseline model, without the use of a teacher model.
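
The abstract does not give the exact form of the Gaussian mask or the distillation loss, so the following is only a minimal NumPy sketch of the general idea: a mask that peaks at each target's center, used to weight a feature-matching loss between a deeper and a shallower feature map of the same network. The function names, the `sigma_scale` parameter, and the choice of a mean squared error are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def gaussian_box_mask(height, width, boxes, sigma_scale=0.5):
    """Build a (height, width) mask that peaks at 1.0 at each box center.

    boxes: iterable of (x1, y1, x2, y2) in feature-map coordinates.
    sigma_scale: hypothetical knob; sigma = sigma_scale * box half-size.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    mask = np.zeros((height, width), dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Guard against degenerate (zero-size) boxes.
        sx = max(sigma_scale * (x2 - x1) / 2.0, 1e-6)
        sy = max(sigma_scale * (y2 - y1) / 2.0, 1e-6)
        g = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                     + (ys - cy) ** 2 / (2 * sy ** 2)))
        # Overlapping targets keep the larger weight at each pixel.
        mask = np.maximum(mask, g)
    return mask


def masked_feature_loss(shallow_feat, deep_feat, mask):
    """Mask-weighted mean squared error between two (C, H, W) feature maps.

    Assumed stand-in for a self-distillation loss: the deeper feature map
    plays the teacher role and the shallower one the student role.
    """
    diff = (shallow_feat - deep_feat) ** 2      # shape (C, H, W)
    weighted = diff * mask[None, :, :]          # broadcast mask over channels
    return weighted.sum() / (mask.sum() * shallow_feat.shape[0] + 1e-12)
```

In a multi-scale setting, one such mask would presumably be built per feature-map resolution, with box coordinates rescaled to that resolution; how the scales are matched and combined is not specified in the abstract.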

Funder

National Natural Science Joint Fund Key Program

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s44267-024-00040-3.pdf
