A control-theoretic perspective on optimal high-order optimization

Author:

Lin Tianyi, Jordan Michael I.

Abstract

We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a function $\varPhi : \mathbb{R}^d \rightarrow \mathbb{R}$ that is convex and twice continuously differentiable, we study a closed-loop control system that is governed by the operators $\nabla \varPhi$ and $\nabla^2 \varPhi$ together with a feedback control law $\lambda(\cdot)$ satisfying the algebraic equation $(\lambda(t))^p \Vert \nabla \varPhi(x(t)) \Vert^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories. The rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of objective function gap and $O(1/t^{3p})$ in terms of squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework of Monteiro and Svaiter (SIAM J Optim 23(2):1092–1125, 2013) and the other of which leads to a new optimal $p$-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of Monteiro and Svaiter (2013), it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that the feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design. A highlight of our analysis is that we show that all of the $p$-th order optimal tensor algorithms that we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analysis in Gasnikov et al. (in: COLT, PMLR, pp 1374–1391, 2019), Jiang et al. (in: COLT, PMLR, pp 1799–1801, 2019) and Bubeck et al. (in: COLT, PMLR, pp 492–507, 2019).
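As a small illustration of the feedback control law in the abstract, the algebraic equation $(\lambda(t))^p \Vert \nabla \varPhi(x(t)) \Vert^{p-1} = \theta$ can be solved for $\lambda$ in closed form whenever the gradient is nonzero, giving $\lambda = (\theta / \Vert \nabla \varPhi(x) \Vert^{p-1})^{1/p}$. The Python sketch below is not taken from the paper: the function name `feedback_lambda` and its arguments are hypothetical, and it only evaluates the stated equation for a given gradient norm. It does not implement the closed-loop system or either discretized algorithm.

```python
def feedback_lambda(grad_norm: float, p: int, theta: float) -> float:
    """Return lambda > 0 solving lambda**p * grad_norm**(p - 1) == theta.

    Hypothetical helper for illustration only; the name and signature are
    assumptions, not part of the paper.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in the open interval (0, 1)")
    if p < 1:
        raise ValueError("p must be a positive integer")
    if p > 1 and grad_norm <= 0.0:
        # For p > 1 the equation has no finite solution at a zero gradient.
        raise ValueError("grad_norm must be positive when p > 1")
    # Closed-form solution: lambda = (theta / grad_norm**(p - 1))**(1 / p).
    return (theta / grad_norm ** (p - 1)) ** (1.0 / p)


# Example: p = 2, gradient norm 2.0, theta = 0.5, so lambda**2 * 2.0 = 0.5.
lam = feedback_lambda(grad_norm=2.0, p=2, theta=0.5)
```

For fixed $p > 1$ the closed form shows that $\lambda$ grows as the gradient norm shrinks, so the control law takes larger values as the trajectory approaches a minimizer.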

Funder

U.S. Naval Research Laboratory

Publisher

Springer Science and Business Media LLC

Subject

General Mathematics, Software

Cited by 9 articles.
