A control-theoretic perspective on optimal high-order optimization

Published: 2021-10-22
ISSN: 0025-5610
Container-title: Mathematical Programming
Language: en
Short-container-title: Math. Program.

Author:

Lin Tianyi, Jordan Michael I.

Abstract

We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a function $\varPhi : {\mathbb {R}}^d \rightarrow {\mathbb {R}}$ that is convex and twice continuously differentiable, we study a closed-loop control system that is governed by the operators $\nabla \varPhi$ and $\nabla ^2 \varPhi$ together with a feedback control law $\lambda (\cdot )$ satisfying the algebraic equation $(\lambda (t))^p\Vert \nabla \varPhi (x(t))\Vert ^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories. The rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of objective function gap and $O(1/t^{3p})$ in terms of squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework of Monteiro and Svaiter (SIAM J Optim 23(2):1092–1125, 2013) and the other of which leads to a new optimal p-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of Monteiro and Svaiter (2013), it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that the feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design. A highlight of our analysis is that we show that all of the p-th order optimal tensor algorithms that we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analysis in Gasnikov et al. (in: COLT, PMLR, pp 1374–1391, 2019), Jiang et al. (in: COLT, PMLR, pp 1799–1801, 2019) and Bubeck et al. (in: COLT, PMLR, pp 492–507, 2019).
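For illustration only: when the gradient is nonzero, the algebraic equation for the feedback control law stated in the abstract can be solved for λ in closed form, giving λ = (θ / ‖∇Φ(x)‖^(p-1))^(1/p). The Python sketch below evaluates that expression for a given gradient vector. The function names `feedback_lambda` and `control_residual` are hypothetical and do not come from the paper; the sketch covers only the algebraic relation, not the closed-loop dynamics or either discretized algorithm.

```python
import math


def feedback_lambda(grad, p, theta):
    """Solve lam**p * ||grad||**(p-1) = theta for lam > 0.

    grad  : sequence of floats, the gradient of Phi at the current point
    p     : integer order of the method (p >= 1)
    theta : constant in the open interval (0, 1)
    """
    if p < 1:
        raise ValueError("p must be an integer >= 1")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    grad_norm = math.sqrt(sum(g * g for g in grad))
    if p > 1 and grad_norm == 0.0:
        # For p > 1 the equation has no finite solution at a stationary point.
        raise ValueError("gradient is zero; lambda is unbounded")
    # Closed-form solution of the algebraic equation.
    return (theta / grad_norm ** (p - 1)) ** (1.0 / p)


def control_residual(lam, grad, p, theta):
    """Return lam**p * ||grad||**(p-1) - theta (zero when the law holds)."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return lam ** p * grad_norm ** (p - 1) - theta


if __name__ == "__main__":
    # Assumed toy example: Phi(x) = 0.5 * ||x||^2, so grad Phi(x) = x.
    x = [3.0, 4.0]
    lam = feedback_lambda(x, p=2, theta=0.5)
    print(lam, control_residual(lam, x, p=2, theta=0.5))
```

In this form λ grows as the gradient norm shrinks whenever p > 1, which is why the sketch rejects a zero gradient instead of returning a value.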

Funder

U.S. Naval Research Laboratory

Publisher

Springer Science and Business Media LLC

Subject

General Mathematics, Software

Link

https://link.springer.com/content/pdf/10.1007/s10107-021-01721-3.pdf
