1. Bartlett, P., Shawe-Taylor, J.: Generalization performance of support vector machines and other pattern classifiers. In: Advances in Kernel Methods, pp. 43–54. MIT Press, Cambridge (1999)
2. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
3. Brutzkus, A., Globerson, A., Malach, E., Shalev-Shwartz, S.: SGD learns over-parameterized networks that provably generalize on linearly separable data. In: International Conference on Learning Representations, ICLR 2018. Vancouver, BC, Canada, April 30–May 3, 2018, Conference Track Proceedings (2018). https://openreview.net/forum?id=rJ33wwxRb
4. Bubeck, S.: Convex optimization: algorithms and complexity. arXiv e-prints arXiv:1405.4980 (2014)
5. Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015). https://doi.org/10.1561/2200000050