Abstract
Distributed stochastic zeroth-order optimization (DSZO), in which the objective function is distributed across multiple agents and the derivatives of the cost functions are unavailable, arises frequently in large-scale machine learning and reinforcement learning. This paper introduces a projection-free and gradient-free distributed stochastic algorithm for DSZO based on the Frank-Wolfe framework and the stochastic zeroth-order oracle (SZO). Such a scheme is particularly useful for large-scale constrained optimization problems in which computing gradients or projections is impractical or costly, or in which the objective function is not differentiable everywhere. Specifically, the proposed algorithm, enhanced by recursive momentum and gradient tracking, guarantees convergence with only a single batch per iteration, which substantially lowers the computational complexity relative to existing algorithms. Under mild conditions, we prove that the SZO complexity of the proposed algorithm is bounded by $\mathcal{O}(n/\epsilon^{2})$ and $\mathcal{O}(n \cdot 2^{1/\epsilon})$ for the convex and nonconvex cases, respectively. The efficacy of the algorithm is verified on black-box binary classification problems against several competing alternatives.
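The abstract combines three ingredients: a stochastic zeroth-order oracle, Frank-Wolfe updates that replace projections with a linear minimization oracle, and recursive momentum with gradient tracking so that each agent draws only one sample per iteration. The Python sketch below shows one way these pieces can fit together on a hypothetical toy problem. It is not the authors' algorithm: the coordinate-wise two-point estimator, the l1-ball constraint, the ring mixing matrix W, the step sizes gamma_t = 2/(t+2) and rho_t = (t+1)^(-2/3), the quadratic local losses, and every function name are assumptions made for illustration.

```python
import random


def l1_ball_lmo(grad, radius):
    """Linear minimization oracle over {x : ||x||_1 <= radius} (no projection needed).

    argmin <grad, s> over the ball is -radius * sign(g_j) * e_j with j = argmax |g_j|.
    """
    j = max(range(len(grad)), key=lambda k: abs(grad[k]))
    vertex = [0.0] * len(grad)
    vertex[j] = -radius if grad[j] > 0 else radius
    return vertex


def zo_grad(loss, x, xi, mu=1e-4):
    """Assumed estimator: coordinate-wise two-point finite differences of loss(., xi).

    Only function values are queried; both evaluations reuse the same sample xi.
    """
    g = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += mu
        x_minus[j] -= mu
        g.append((loss(x_plus, xi) - loss(x_minus, xi)) / (2.0 * mu))
    return g


def mix(W, vecs):
    """Consensus averaging: row i of the result is sum_j W[i][j] * vecs[j]."""
    n, dim = len(vecs), len(vecs[0])
    return [[sum(W[i][j] * vecs[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]


def distributed_zo_fw(losses, W, x0, radius=1.0, iters=2000, noise_std=0.05, seed=0):
    """Hypothetical decentralized zeroth-order Frank-Wolfe with recursive momentum
    and gradient tracking, using one stochastic sample per agent per iteration."""
    rng = random.Random(seed)
    n, dim = len(losses), len(x0)
    x = [list(x0) for _ in range(n)]
    x_prev = [list(x0) for _ in range(n)]
    d = [[0.0] * dim for _ in range(n)]  # recursive-momentum gradient estimates
    y = [[0.0] * dim for _ in range(n)]  # gradient-tracking variables
    for t in range(iters):
        rho = 1.0 / (t + 1) ** (2.0 / 3.0)  # assumed momentum schedule
        d_new = []
        for i in range(n):
            xi = [rng.gauss(0.0, noise_std) for _ in range(dim)]  # the single sample
            g_cur = zo_grad(losses[i], x[i], xi)
            if t == 0:
                d_new.append(g_cur)
            else:
                # Recursive momentum: same sample re-evaluated at the previous iterate.
                g_old = zo_grad(losses[i], x_prev[i], xi)
                d_new.append([gc + (1.0 - rho) * (dp - go)
                              for gc, dp, go in zip(g_cur, d[i], g_old)])
        # Gradient tracking: y_i <- sum_j W_ij y_j + d_i^t - d_i^{t-1}.
        y_mixed = mix(W, y)
        y = [[ym + dn - dp for ym, dn, dp in zip(y_mixed[i], d_new[i], d[i])]
             for i in range(n)]
        d = d_new
        # Frank-Wolfe step from the neighbor-averaged iterate toward the LMO vertex.
        gamma = 2.0 / (t + 2)
        x_mixed = mix(W, x)
        x_prev = x
        x = [[(1.0 - gamma) * a + gamma * b
              for a, b in zip(x_mixed[i], l1_ball_lmo(y[i], radius))]
             for i in range(n)]
    return x


def make_local_loss(center):
    """Hypothetical local loss f_i(x; xi) = 0.5 * ||x - (c_i + xi)||^2."""
    def loss(x, xi):
        return 0.5 * sum((a - c - e) ** 2 for a, c, e in zip(x, center, xi))
    return loss


# Toy usage: 4 agents on a ring; the global minimizer is the mean of the centers.
centers = [[0.5, 0.0, 0.2], [0.1, -0.3, 0.4], [0.0, 0.1, 0.5], [0.2, -0.2, 0.1]]
n_agents = len(centers)
W = [[0.0] * n_agents for _ in range(n_agents)]
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        W[i][j % n_agents] = 1.0 / 3.0  # symmetric, doubly stochastic ring weights

iterates = distributed_zo_fw([make_local_loss(c) for c in centers], W, x0=[-1.0, 0.0, 0.0])
x_bar = [sum(col) / n_agents for col in zip(*iterates)]
print("average iterate:", [round(v, 3) for v in x_bar])
```

Every iterate stays feasible without a projection, because each update is a convex combination of a neighbor average of feasible points and a vertex of the constraint set. Reusing the same sample at the current and previous iterates is the STORM-style device that lets the momentum estimate improve without growing the batch size; how the paper's own estimator and step sizes are chosen is not reproduced here.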
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC