Abstract
Many adaptive gradient methods, such as Adagrad, Adadelta, RMSprop and Adam, have been successfully applied to train deep neural networks. These methods perform local optimization with an element-wise learning rate scaled by past gradients. Although they can achieve a favorable training loss, researchers have pointed out that their generalization capability tends to be poorer than that of stochastic gradient descent (SGD) in many applications. They make rapid initial training progress but fail to converge to an optimal solution due to unstable and extreme learning rates. In this paper, we investigate adaptive gradient methods and gain insight into the factors that may lead to the poor performance of Adam. To overcome these issues, we propose a bounded scheduling algorithm for Adam that not only improves generalization capability but also ensures convergence. To validate our claims, we carry out a series of experiments on image classification and language modeling tasks, using standard architectures such as ResNet, DenseNet, SENet and LSTM on typical datasets such as CIFAR-10, CIFAR-100 and Penn Treebank. Experimental results show that our method eliminates the generalization gap between Adam and SGD while maintaining a relatively high convergence rate during training.
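The abstract does not spell out the bound schedule, so the following is a minimal NumPy sketch of the general idea, assuming the "bounded" variant clips Adam's element-wise step size alpha / (sqrt(v_hat) + eps) into an interval [lower, upper]. The function name bounded_adam_step, the fixed placeholder bounds, and the hyperparameter defaults are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def bounded_adam_step(param, grad, m, v, t,
                      alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                      lower=1e-4, upper=1e-1):
    """One hypothetical 'bounded' Adam step (sketch only).

    The element-wise step size alpha / (sqrt(v_hat) + eps) is clipped into
    [lower, upper] before the update, so extreme per-parameter learning
    rates cannot occur. The paper's actual bound schedule is not
    reproduced here; fixed bounds are used as placeholders.
    """
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected mean
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected variance
    step_size = alpha / (np.sqrt(v_hat) + eps)    # element-wise learning rate
    step_size = np.clip(step_size, lower, upper)  # bound extreme rates
    param = param - step_size * m_hat             # parameter update
    return param, m, v
```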
Funder
The National Key Research and Development Program of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
6 articles.