1. Ainsworth, M., Shin, Y.: Plateau phenomenon in gradient descent training of ReLU networks: explanation, quantification, and avoidance. SIAM J. Sci. Comput. 43, A3438–A3468 (2021)
2. Allen-Zhu, Z., Li, Y., Song, Z.: A convergence theory for deep learning via over-parameterization. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, vol. 97, pp. 242–252. PMLR (2019)
3. Arjevani, Y., Field, M.: Analytic study of families of spurious minima in two-layer ReLU neural networks: a tale of symmetry II. In: Advances in Neural Information Processing Systems, vol. 34. Curran Associates, Inc. (2021)
4. Auer, P., Herbster, M., Warmuth, M.K.: Exponentially many local minima for single neurons. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems, vol. 8, pp. 316–322. MIT Press (1996)
5. Benedetto, J.J., Czaja, W.: Integration and Modern Analysis. Birkhäuser Advanced Texts. Birkhäuser, Boston (2010)