An Optimization Strategy Based on Hybrid Algorithm of Adam and SGD


Wang Yijun, Zhou Pengyu, Zhong Wenya


Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). Keskar et al. (2017) therefore proposed a hybrid strategy that starts training with Adam and switches to SGD at the right time. In learning tasks with a large output space, Adam has been observed to fail to converge to an optimal solution (or to a critical point in non-convex settings) [1]. This paper therefore proposes a new variant of the Adam algorithm, AMSGrad, which not only solves the convergence problem but also improves empirical performance.
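To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the commonly described AMSGrad update (Adam's moment estimates, with the step normalized by a running maximum of the second moment, and bias correction omitted for brevity) followed by a switch to plain SGD. The function name `amsgrad_then_sgd`, the fixed switch point, the learning rates, and the toy quadratic objective are all illustrative assumptions; the abstract does not state how the "right time" to switch is chosen.

```python
import numpy as np

def amsgrad_then_sgd(grad_fn, x0, adam_steps=200, sgd_steps=200,
                     adam_lr=0.05, sgd_lr=0.05,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Run AMSGrad-style updates, then switch to plain SGD.

    The fixed switch after `adam_steps` is a placeholder assumption,
    not the switching criterion of the paper or of Keskar et al.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)      # first-moment (mean) estimate
    v = np.zeros_like(x)      # second-moment estimate
    v_hat = np.zeros_like(x)  # running maximum of v (the AMSGrad change)

    # Phase 1: adaptive updates (bias correction omitted for brevity).
    for _ in range(adam_steps):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)  # denominator never decreases
        x -= adam_lr * m / (np.sqrt(v_hat) + eps)

    # Phase 2: plain gradient steps with a fixed learning rate.
    for _ in range(sgd_steps):
        x -= sgd_lr * grad_fn(x)
    return x

# Toy problem (assumed for illustration): minimize 0.5 * ||x - target||^2,
# whose gradient is simply x - target.
target = np.array([1.0, -2.0])
result = amsgrad_then_sgd(lambda x: x - target, x0=np.zeros(2))
```

In this sketch the gradient is deterministic; in an actual training loop `grad_fn` would return a mini-batch gradient, and the switch point would be chosen by whatever criterion the hybrid strategy prescribes.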


EDP Sciences


General Medicine

References (9 articles)

1. Robbins H. and Monro S. A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400–407, 1951.

2. Kingma D. and Ba J. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR 2015), 2015.

3. Tieleman T. and Hinton G. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4, 2012.

Cited by 12 articles.