Abstract
Optimization problems that include regularization functions in their objectives are regularly solved in many applications. When one seeks second-order methods for such problems, it may be desirable to exploit specific properties of some of these regularization functions when accounting for curvature information in the solution steps, in order to speed up convergence. In this paper, we propose the SCORE (self-concordant regularization) framework for unconstrained minimization problems, which incorporates second-order information in the Newton-decrement framework for convex optimization. We propose the generalized Gauss–Newton with Self-Concordant Regularization (GGN-SCORE) algorithm, which updates the minimization variables each time it receives a new input batch. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead. GGN-SCORE demonstrates how to speed up convergence while also improving model generalization for problems that involve regularized minimization under the proposed SCORE framework. Numerical experiments show the efficiency of our method and its fast convergence, which compare favorably against baseline first-order and quasi-Newton methods. Additional experiments involving non-convex (overparameterized) neural network training problems show that the proposed method is promising for non-convex optimization.
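The abstract describes GGN-SCORE only at a high level, so the sketch below illustrates the general idea of a damped Gauss–Newton step in which the regularizer's own curvature is folded into the approximate Hessian. It is a minimal illustration under assumed choices, not the paper's GGN-SCORE update: the pseudo-Huber regularizer and the helper names `pseudo_huber` and `ggn_step` are assumptions made for this example.

```python
# Illustrative sketch only: a generic damped Gauss--Newton step on a
# regularized least-squares objective 0.5*||r(theta)||^2 + lam*g(theta),
# where the regularizer's gradient and Hessian enter the step directly.
# This is NOT the paper's exact GGN-SCORE algorithm.
import numpy as np

def pseudo_huber(theta, eps=1.0):
    """Smooth regularizer g(theta) (pseudo-Huber, chosen here as an assumed
    example) together with its gradient and diagonal Hessian."""
    r = np.sqrt(1.0 + (theta / eps) ** 2)
    value = (eps ** 2 * (r - 1.0)).sum()
    grad = theta / r
    hess_diag = 1.0 / r ** 3
    return value, grad, hess_diag

def ggn_step(theta, jac, residual, lam=1e-2, damping=1e-4):
    """One damped Gauss--Newton step.

    jac:      Jacobian of the residual at theta, shape (n_samples, n_params)
    residual: residual vector at theta, shape (n_samples,)
    """
    _, g_grad, g_hess = pseudo_huber(theta)
    # Gauss--Newton curvature from the data-fit term plus the regularizer's
    # Hessian and a small damping term for numerical stability.
    H = jac.T @ jac + lam * np.diag(g_hess) + damping * np.eye(theta.size)
    grad = jac.T @ residual + lam * g_grad
    return theta - np.linalg.solve(H, grad)

# Toy usage on a linear residual r(theta) = A @ theta - b.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
theta = np.zeros(5)
for _ in range(10):
    theta = ggn_step(theta, A, A @ theta - b)
```

In the paper's setting the regularizer is additionally required to be self-concordant, which is what lets the Newton-decrement machinery control the step length; the plain damped linear solve above is only a generic stand-in for that mechanism.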
Funder
Scuola IMT Alti Studi Lucca
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computational Mathematics, Control and Optimization
Cited by
1 article.