GrOD: Deep Learning with Gradients Orthogonal Decomposition for Knowledge Transfer, Distillation, and Adversarial Training

Authors:

Xiong Haoyi (1), Wan Ruosi (2), Zhao Jian (3), Chen Zeyu (1), Li Xingjian (1), Zhu Zhanxing (4), Huan Jun (1)

Affiliation:

1. Baidu, Inc., Beijing, China

2. Peking University, and Baidu, Inc., Beijing, China

3. Institute of North Electronic Equipment, Beijing, China

4. Peking University, Beijing, China

Abstract

Regularization, which takes a linear combination of the empirical loss and explicit regularization terms as the loss function, is frequently used in many machine learning tasks, with the explicit regularization term designed differently depending on the application. While regularized learning often boosts performance with higher accuracy and faster convergence, the regularization term can sometimes hurt empirical loss minimization and lead to poor performance. To address this issue, we propose a novel strategy, namely Gradients Orthogonal Decomposition (GrOD), that improves the training procedure of regularized deep learning. Instead of linearly combining the gradients of the two terms, GrOD re-estimates, through orthogonal decomposition, a new update direction that does not hurt empirical loss minimization while preserving the regularization effects. We have performed extensive experiments using GrOD to improve the commonly used algorithms for transfer learning [2], knowledge distillation [3], and adversarial learning [4]. Experimental results on large datasets, including Caltech 256 [5], MIT Indoor 67 [6], CIFAR-10 [7], and ImageNet [8], show significant improvements made by GrOD for all three algorithms in all cases.
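To make the idea concrete, below is a minimal NumPy sketch of one plausible reading of the abstract: the regularizer's gradient is split into components parallel and orthogonal to the empirical-loss gradient, and the parallel component is dropped whenever it opposes descent on the empirical loss. The names grod_direction, g_emp, g_reg, and lam, as well as the fallback behavior, are illustrative assumptions, not the authors' implementation; the paper's exact decomposition rule may differ.

```python
# Minimal sketch of gradient orthogonal decomposition as described in the
# abstract. All names (grod_direction, g_emp, g_reg, lam) are illustrative;
# this is not the authors' implementation.
import numpy as np

def grod_direction(g_emp: np.ndarray, g_reg: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Combine the empirical-loss gradient with the regularizer gradient
    without letting the regularizer undo empirical loss minimization.

    g_reg is decomposed into a component parallel to g_emp and a component
    orthogonal to it. The orthogonal component always survives (it carries
    the regularization effect); the parallel component survives only when
    it points the same way as g_emp, i.e. when it also decreases the
    empirical loss.
    """
    denom = g_emp @ g_emp
    if denom == 0.0:                  # no empirical signal: assumed fallback
        return lam * g_reg
    coef = (g_reg @ g_emp) / denom    # projection coefficient onto g_emp
    g_par = coef * g_emp              # component of g_reg along g_emp
    g_orth = g_reg - g_par            # component of g_reg orthogonal to g_emp
    if coef < 0.0:                    # parallel part conflicts with descent
        g_par = np.zeros_like(g_par)  # drop the conflicting component
    return g_emp + lam * (g_par + g_orth)

# Example: a regularizer gradient that partially opposes the loss gradient.
g_emp = np.array([1.0, 0.0])
g_reg = np.array([-0.5, 1.0])
print(grod_direction(g_emp, g_reg))   # conflicting part removed -> [1. 1.]
```

In a real training loop the per-parameter gradients of the empirical loss and the regularization term would each be flattened into a single vector before the decomposition, and the returned direction would be handed to the optimizer in place of the usual linear combination g_emp + lam * g_reg.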

Funder

National Key R&D Program of China

National Natural Science Foundation of China

Young Elite Scientist Sponsorship Program of China Association for Science and Technology

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

References (73 articles; first 5 shown):

1. Ruosi Wan, Haoyi Xiong, Xingjian Li, Zhanxing Zhu, and Jun Huan. 2019. Towards making deep transfer learning never hurt. In Proceedings of the IEEE International Conference on Data Mining (ICDM).

2. Xuhong Li, Yves Grandvalet, and Franck Davoine. 2018. Explicit inductive bias for transfer learning with convolutional networks. In Proceedings of the 35th International Conference on Machine Learning.

3. Junho Yim, Donggyu Joo, Jihoon Bae, and Junmo Kim. 2017. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

4. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=rJzIBfZAb.

5. Gregory Griffin, Alex Holub, and Pietro Perona. 2007. Caltech-256 object category dataset. Retrieved July 28, 2022 from https://authors.library.caltech.edu/7694/1/CNS-TR-2007-001.pdf.

Cited by 9 articles (first 5 shown):

1. Uncertainty graph convolution recurrent neural network for air quality forecasting. Advanced Engineering Informatics, October 2024.

2. An Optimal Edge-weighted Graph Semantic Correlation Framework for Multi-view Feature Representation Learning. ACM Transactions on Multimedia Computing, Communications, and Applications, April 25, 2024.

3. Causal inference for out-of-distribution recognition via sample balancing. CAAI Transactions on Intelligence Technology, April 2, 2024.

4. A Canonical Data Transformation for Achieving Inter- and Within-Group Fairness. IEEE Transactions on Information Forensics and Security, 2024.

5. Rethinking the Person Localization for Single-Stage Multi-Person Pose Estimation. IEEE Transactions on Multimedia, 2024.
