Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions

Published: 2022-06-22
Volume: 17
Issue: 3
Pages: 657-673
ISSN: 1862-4472
Container title: Optimization Letters
Language: en
Short container title: Optim Lett

Author:

Burkhart, Michael C.

Abstract

To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective’s gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak’s heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
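The baseline stochastic Newton iteration that the abstract starts from can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Bayesian-filtering algorithm: the per-term objective is a hypothetical stand-in (L2-regularized logistic loss, chosen only because its gradient and Hessian have closed forms, not because it matches the paper's log-convex setting), and the names `logistic_loss_grad_hess`, `stochastic_newton`, and `make_synthetic_data` are illustrative. Each step draws a minibatch, forms the subsampled gradient g_k and Hessian H_k, and sets x_{k+1} = x_k - H_k^{-1} g_k. The optional `beta` argument bolts on Polyak's heavy ball term beta (x_k - x_{k-1}) explicitly; the paper instead derives a history-dependent update from filtering and relates it to heavy ball momentum only by analogy.

```python
import numpy as np


def logistic_loss_grad_hess(x, A, y, lam):
    """Average regularized logistic loss, gradient, and Hessian over the rows of A.

    Hypothetical stand-in objective:
    f_i(x) = log(1 + exp(-y_i * a_i^T x)) + (lam / 2) * ||x||^2, with y_i in {-1, +1}.
    """
    margins = y * (A @ x)
    loss = np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * x @ x
    # s = sigma(-m) = 1 / (1 + exp(m)), computed stably via tanh
    s = 0.5 * (1.0 - np.tanh(0.5 * margins))
    grad = -(A.T @ (y * s)) / len(y) + lam * x
    # Curvature weight sigma(m) * sigma(-m); y_i^2 = 1 drops out of the Hessian
    w = s * (1.0 - s)
    hess = (A.T * w) @ A / len(y) + lam * np.eye(A.shape[1])
    return loss, grad, hess


def stochastic_newton(A, y, lam=1e-2, batch=128, iters=30, beta=0.0, seed=0):
    """Subsampled Newton steps, with an optional explicit heavy ball term (beta)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_prev = x.copy()
    for _ in range(iters):
        # Subsample the objective: gradient and Hessian from one minibatch only
        idx = rng.choice(n, size=batch, replace=False)
        _, g, H = logistic_loss_grad_hess(x, A[idx], y[idx], lam)
        # Newton direction H^{-1} g via a linear solve (H is positive definite for lam > 0)
        step = np.linalg.solve(H, g)
        x_next = x - step + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


def make_synthetic_data(n=1000, seed=1):
    """Hypothetical synthetic data: Gaussian features, labels drawn from a logistic model."""
    rng = np.random.default_rng(seed)
    x_true = np.array([1.0, -2.0, 0.5])
    A = rng.standard_normal((n, x_true.size))
    p = 1.0 / (1.0 + np.exp(-(A @ x_true)))
    y = np.where(rng.random(n) < p, 1.0, -1.0)
    return A, y


if __name__ == "__main__":
    A, y = make_synthetic_data()
    x_hat = stochastic_newton(A, y)
    loss, _, _ = logistic_loss_grad_hess(x_hat, A, y, 1e-2)
    print("estimate:", x_hat, "full-data loss:", loss)
```

With `beta=0` this reduces to plain subsampled Newton, where each update uses only the current minibatch; the filtering approach described in the abstract differs in that it draws on the entire history of gradients and Hessians.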

Publisher

Springer Science and Business Media LLC

Subject

Control and Optimization; Business, Management and Accounting (miscellaneous)

Link

https://link.springer.com/content/pdf/10.1007/s11590-022-01895-5.pdf

References (78 articles)

1. Abdullah, A., Kumar, R., McGregor, A., Vassilvitskii, S., Venkatasubramanian, S.: Sketching, embedding, and dimensionality reduction for information spaces. In: Int. Conf. Artif. Intell. Stat. (2016)

2. Agarwal, N., Bullins, B., Hazan, E.: Second-order stochastic optimization for machine learning in linear time. J. Mach. Learn. Res. 18, 4148–4187 (2017)

3. Akyıldız, Ö.D., Chouzenoux, É., Elvira, V., Míguez, J.: A probabilistic incremental proximal gradient method. IEEE Signal Process. Lett. 26(8), 1257–1261 (2019)

4. Amari, S.: Natural gradient works efficiently in learning. Neural Comput. 10(2), 251–276 (1998)

5. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1), 1–3 (1966)