Convergence of Langevin-simulated annealing algorithms with multiplicative noise

Published:2024-03-15 Issue:348 Volume:93 Page:1761-1803
ISSN:0025-5718
Container-title:Mathematics of Computation
language:en
Short-container-title:Math. Comp.

Author:

Pierre Bras, Gilles Pagès

Abstract

We study the convergence of Langevin-Simulated Annealing type algorithms with multiplicative noise, i.e. for $V : \mathbb{R}^d \to \mathbb{R}$ a potential function to minimize, we consider the stochastic differential equation $dY_t = -\sigma\sigma^\top \nabla V(Y_t)\,dt + a(t)\sigma(Y_t)\,dW_t + a(t)^2\Upsilon(Y_t)\,dt$, where $(W_t)$ is a Brownian motion, where $\sigma : \mathbb{R}^d \to \mathcal{M}_d(\mathbb{R})$ is an adaptive (multiplicative) noise, where $a : \mathbb{R}^+ \to \mathbb{R}^+$ is a function decreasing to $0$ and where $\Upsilon$ is a correction term. This setting can be applied to optimization problems arising in Machine Learning; allowing $\sigma$ to depend on the position brings faster convergence in comparison with the classical Langevin equation $dY_t = -\nabla V(Y_t)\,dt + \sigma\,dW_t$. The case where $\sigma$ is a constant matrix has been extensively studied; however little attention has been paid to the general case. We prove the convergence for the $L^1$-Wasserstein distance of $Y_t$ and of the associated Euler scheme $\bar{Y}_t$ to some measure $\nu^\star$ which is supported by $\operatorname{argmin}(V)$ and give rates of convergence to the instantaneous Gibbs measure $\nu_{a(t)}$ of density $\propto \exp(-2V(x)/a(t)^2)$. To do so, we first consider the case where $a$ is a piecewise constant function. We find again the classical schedule $a(t) = A\log^{-1/2}(t)$. We then prove the convergence for the general case by giving bounds for the Wasserstein distance to the stepwise constant case using ergodicity properties.
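To make the dynamics in the abstract concrete, here is a minimal one-dimensional sketch of an Euler discretization of the SDE with the schedule $a(t) = A\log^{-1/2}(t)$. It is an illustration under stated assumptions, not the authors' implementation. The potential `grad_V`, the noise coefficient `sigma`, the constant `A`, the constant step size `h` and the start time `t0` are all hypothetical choices made for this example. The abstract does not define $\Upsilon$; the sketch assumes the one-dimensional form $\Upsilon = \tfrac{1}{2}(\sigma^2)' = \sigma\sigma'$, which is the choice that leaves the density $\propto \exp(-2V/a^2)$ invariant when $a$ is held fixed.

```python
import math
import random


def grad_V(y):
    # Hypothetical potential V(y) = y^2/2 - cos(3y): global minimum at y = 0,
    # local minima nearby, and a Lipschitz gradient.
    return y + 3.0 * math.sin(3.0 * y)


def sigma(y):
    # Hypothetical bounded, position-dependent (multiplicative) noise coefficient.
    return 1.0 + 0.5 / (1.0 + y * y)


def sigma_prime(y):
    # Derivative of sigma above.
    return -y / (1.0 + y * y) ** 2


def upsilon(y):
    # ASSUMPTION: 1-D correction term (1/2) d/dy sigma(y)^2 = sigma(y) * sigma'(y).
    return sigma(y) * sigma_prime(y)


def schedule(t, A=1.5):
    # Schedule a(t) = A * log(t)^(-1/2); requires t > 1. A is a hypothetical constant.
    return A / math.sqrt(math.log(t))


def euler_annealing(y0, n_steps, h=0.01, t0=math.e, seed=0):
    # Euler scheme for dY = -sigma^2 V'(Y) dt + a(t) sigma(Y) dW + a(t)^2 Upsilon(Y) dt.
    # A constant step h is used here purely for simplicity.
    rng = random.Random(seed)
    y, t = y0, t0
    for _ in range(n_steps):
        a = schedule(t)
        s = sigma(y)
        dW = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over one step
        y = y - s * s * grad_V(y) * h + a * s * dW + a * a * upsilon(y) * h
        t += h
    return y


if __name__ == "__main__":
    print(euler_annealing(y0=2.0, n_steps=20000))
```

Setting `sigma` to a constant function makes `upsilon` vanish and reduces the update to the classical Langevin case mentioned in the abstract.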

Funder

Sorbonne Université

Publisher

American Mathematical Society (AMS)

Link

https://www.ams.org/mcom/2024-93-348/S0025-5718-2024-03899-1/mcom3899_AM.pdf

References (39 articles)

1. Bally, V. The law of the Euler scheme for stochastic differential equations. I. Convergence rate of the distribution function. Probab. Theory Related Fields, 1996.

2. Bras, Pierre. Convergence rates of Gibbs measures with degenerate minimum. Bernoulli, 2022.

3. Bubeck, S., Eldan, R., and Lehec, J. Finite-time analysis of projected Langevin Monte Carlo. Advances in Neural Information Processing Systems (C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, eds.), vol. 28, Curran Associates, Inc., 2015.

4. Chiang, Tzuu-Shuh. Diffusion for global optimization in $\mathbb{R}^n$. SIAM J. Control Optim., 1987.

5. Dalalyan, Arnak S. Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol., 2017.