Abstract
Gradient-based methods are widely used in training neural networks and can be broadly categorized into first-order and second-order methods. Second-order methods have been shown to converge faster than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training, and recent work has shown that its convergence can be accelerated using Nesterov's accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used in training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. This paper therefore investigates accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's accelerated gradient for training neural networks, and briefly discusses its convergence. The performance of the proposed method is evaluated on a function approximation problem and an image classification problem.
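For background, the sketch below illustrates the two ingredients named in the abstract: the standard SR1 rank-one update of a Hessian approximation, combined with a gradient evaluated at a Nesterov look-ahead point. This is a minimal toy example under assumed notation (the names nesterov_sr1_step, grad_fn, mu, lr, and skip_tol are placeholders), not the exact algorithm proposed in the paper.

```python
# Illustrative sketch (not the paper's exact algorithm): one parameter update that
# combines a Nesterov look-ahead gradient with a standard SR1 rank-one update of
# the Hessian approximation B. Names such as grad_fn, mu, lr, skip_tol are assumed.
import numpy as np

def nesterov_sr1_step(w, v, B, grad_fn, mu=0.9, lr=0.01, skip_tol=1e-8):
    """One step: parameters w, velocity v, SR1 Hessian approximation B."""
    z = w + mu * v                      # Nesterov look-ahead point
    g = grad_fn(z)                      # gradient at the look-ahead point
    d = -np.linalg.solve(B, g)          # quasi-Newton search direction
    w_new = z + lr * d                  # step taken from the look-ahead point
    v_new = w_new - w                   # updated velocity (momentum term)

    # Standard SR1 update of B from the secant pair (s, y).
    s = w_new - z
    y = grad_fn(w_new) - g
    r = y - B @ s
    denom = r @ s
    # Usual SR1 safeguard: skip the update when the denominator is nearly zero.
    if abs(denom) > skip_tol * np.linalg.norm(r) * np.linalg.norm(s):
        B = B + np.outer(r, r) / denom
    return w_new, v_new, B

# Toy usage: minimize the convex quadratic f(w) = 0.5 w^T A w - b^T w.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
grad = lambda w: A @ w - b

w, v, B = np.zeros(2), np.zeros(2), np.eye(2)
for _ in range(300):
    w, v, B = nesterov_sr1_step(w, v, B, grad)
print(w, np.linalg.solve(A, b))         # w should be close to the true minimizer
```

The trust-region safeguard that the abstract mentions as the usual companion of SR1 is omitted here for brevity, so this sketch relies on the denominator test alone to skip ill-conditioned rank-one updates.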
Subject
Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science