1. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
2. Chang, T.H., Hong, M., Liao, W.C., Wang, X.: Asynchronous distributed ADMM for large-scale optimization - part I: algorithm and convergence analysis. IEEE Trans. Signal Process. 64(12), 3118–3130 (2016)
3. Li, M., et al.: Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 583–598 (2014)
4. Patarasuk, P., Yuan, X.: Bandwidth optimal all-reduce algorithms for clusters of workstations. J. Parallel Distrib. Comput. 69(2), 117–124 (2009)
5. Sergeev, A., Del Balso, M.: Horovod: fast and easy distributed deep learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018)