Affiliation:
1. State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
2. China Unicom Smart City Research Institute, Beijing 100048, China
Abstract
Object detection based on knowledge distillation can enhance the capabilities and performance of 5G and 6G networks in various domains, such as autonomous vehicles, smart surveillance, and augmented reality. The integration of object detection with knowledge distillation techniques is expected to play a pivotal role in realizing the full potential of these networks. This study presents Shared Knowledge Distillation (Shared-KD) as a solution to the optimization challenges caused by disparities in cross-layer features between teacher and student networks. Large gaps in intermediate-level features between teachers and students are a considerable obstacle to effective distillation. To tackle this issue, we draw inspiration from collaborative learning in real-world education, where teachers work together to prepare lessons and students engage in peer learning. Building on this concept, our contributions to model construction are as follows: (1) A teacher knowledge augmentation module, which combines lower-level teacher features to facilitate knowledge transfer from the teacher to the student. (2) A student mutual learning module, which enables students to learn from each other, mimicking peer learning in collaborative education. (3) A teacher share module, which fuses the lower-level teacher features into a shared representation that serves as the transfer target, narrowing the feature gap before distillation. (4) A multi-step transfer process, which decomposes knowledge transfer into multiple steps; because the feature gap within each step is minimal, each step can be easily optimized.
Shared-KD uses simple feature losses without additional learnable transformation weights, resulting in an efficient distillation process that can be easily combined with other methods for further improvement. The effectiveness of our approach is validated through experiments on popular tasks such as object detection and instance segmentation.
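The two core ideas of the abstract, fusing lower-level teacher features into a shared transfer target and letting peer students regularize each other, can be illustrated with a minimal NumPy sketch. The function names (`augment_teacher_features`, `feature_loss`, `mutual_learning_loss`) and the weighted-average fusion are illustrative assumptions, not the paper's actual implementation; the feature loss is a plain mean-squared error, matching the abstract's claim of simple losses with no extra transformation weights.

```python
import numpy as np

def feature_loss(student_feat, teacher_feat):
    # Simple mean-squared feature-matching loss (no learnable transform weights).
    return float(np.mean((student_feat - teacher_feat) ** 2))

def augment_teacher_features(lower_feats, weights=None):
    # Hypothetical teacher knowledge augmentation: fuse lower-level teacher
    # features into one shared representation by (weighted) averaging.
    if weights is None:
        weights = [1.0 / len(lower_feats)] * len(lower_feats)
    return sum(w * f for w, f in zip(weights, lower_feats))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mutual_learning_loss(logits_a, logits_b):
    # Symmetric KL divergence between two students' predictions (peer learning).
    p, q = softmax(logits_a), softmax(logits_b)
    kl = lambda a, b: np.sum(a * (np.log(a + 1e-12) - np.log(b + 1e-12)), axis=-1)
    return float(np.mean(kl(p, q) + kl(q, p)))

# Toy example: two lower-level teacher feature maps are fused into a shared
# target; a student feature close to that target yields a small per-step loss,
# reflecting the "minimal gap per transfer step" idea.
rng = np.random.default_rng(0)
t_low1 = rng.normal(size=(8, 16))
t_low2 = rng.normal(size=(8, 16))
shared = augment_teacher_features([t_low1, t_low2])
s_feat = shared + 0.1 * rng.normal(size=(8, 16))
print(feature_loss(s_feat, shared))
```

In a real training loop these losses would be summed with the detection loss and backpropagated through the student; the sketch only shows how the per-step targets keep each individual gap small.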
Funder
National Key Research and Development Program of China