1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from: Tensorflow.org
2. Arjevani Y, Carmon Y, Duchi JC, Foster DJ, Srebro N, Woodworth B (2023) Lower bounds for non-convex stochastic optimization. Math Program 199(1–2):165–214
3. Bach F (2023) Learning theory from first principles. MIT Press (Draft), Cambridge
4. Bartlett PL, Jordan MI, McAuliffe JD (2006) Convexity, classification, and risk bounds. J Am Stat Assoc 101(473):138–156
5. Bottou L (1999) On-line learning and stochastic approximations. In: Saad D (ed) On-line learning in neural networks. Publications of the Newton Institute. Cambridge University Press, Cambridge, pp 9–42