Learning rate burst for superior SGDM and AdamW integration

Authors:

Lin Zhiwei¹, Zhang Songchuan¹, Zhou Yiwei¹, Wang Haoyu¹, Wang Shilei²

Affiliations:

1. School of Automation, Beijing Institute of Technology, Beijing, China

2. Infrastructure Inspection Research Institute, China Academy of Railway Sciences, Beijing, China

Abstract

Current mainstream deep learning optimization algorithms fall into two categories: non-adaptive algorithms, such as Stochastic Gradient Descent with Momentum (SGDM), and adaptive algorithms, such as Adaptive Moment Estimation with Weight Decay (AdamW). For many deep neural network models, adaptive algorithms typically enable faster initial training, whereas non-adaptive algorithms often yield better final convergence. Our proposed Adaptive Learning Rate Burst (Adaburst) algorithm seeks to combine the strengths of both categories. Its update mechanism incorporates elements of AdamW and SGDM and ensures a seamless transition between the two. Adaburst adjusts the learning rate of the SGDM update according to a cosine learning rate schedule, and when the algorithm encounters an update bottleneck this adjustment, called a learning rate burst, helps the model escape the current local optimum more effectively. Experiments show that Adaburst outperforms alternative approaches on image classification and generation tasks, converging faster and reaching higher accuracy. On the MNIST, CIFAR-10, and CIFAR-100 datasets, Adaburst matched or exceeded the accuracy achieved by SGDM. When training diffusion models on the DeepFashion dataset, it converged in fewer epochs than a carefully tuned AdamW optimizer while avoiding abrupt blurring and other training instabilities. Adaburst improved final training-set accuracy on MNIST, CIFAR-10, and CIFAR-100 by 0.02%, 0.41%, and 4.18%, respectively, and the generative model trained on DeepFashion improved by 4.62 points in Fréchet Inception Distance (FID), a metric of generative model quality. These results indicate that Adaburst, by updating with AdamW and SGDM simultaneously and incorporating a learning rate burst mechanism, significantly improves both the training speed and the convergence accuracy of deep neural networks.
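
To illustrate the mechanism described in the abstract, the following is a minimal PyTorch-style sketch. The abstract does not specify how the update bottleneck is detected, how the AdamW and SGDM updates are combined, or any hyperparameters, so the plateau test, the phase switch, the cosine restart, and all numeric values below are illustrative assumptions rather than the authors' published algorithm.

```python
import math
import torch

def cosine_lr(base_lr, t, period):
    """Cosine decay from base_lr toward 0 over `period` steps."""
    t = min(t, period)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / period))

model = torch.nn.Linear(10, 2)                                    # placeholder model
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
sgdm = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)

best_loss, stall, cycle_start, period = float("inf"), 0, 0, 1000  # assumed values
for step in range(10_000):
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))        # dummy batch
    loss = torch.nn.functional.cross_entropy(model(x), y)
    adamw.zero_grad()
    loss.backward()

    # Assumed "seamless transition": rely on AdamW early, then switch to
    # SGDM whose learning rate follows a cosine schedule.
    if step < 3000:
        adamw.step()
    else:
        for group in sgdm.param_groups:
            group["lr"] = cosine_lr(1e-1, step - cycle_start, period)
        sgdm.step()

    # Assumed "learning rate burst": if the loss stalls (an update
    # bottleneck), restart the cosine cycle so the SGDM learning rate
    # jumps back toward its base value and the model can escape the
    # current local optimum.
    if loss.item() < best_loss - 1e-4:
        best_loss, stall = loss.item(), 0
    else:
        stall += 1
        if stall > 200:
            cycle_start, stall = step, 0
```

For simplicity, this sketch switches from AdamW to SGDM at a fixed step; the abstract instead describes a simultaneous update of both optimizers with a seamless transition, whose exact form is given in the paper itself.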

Publisher

IOS Press
