1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., & Kudlur, M. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283).
2. Absil, P.-A., & Kurdyka, K. (2006). On the stable equilibrium points of gradient systems. Systems & Control Letters, 55(7), 573–577.
3. Absil, P.-A., Mahony, R., & Andrews, B. (2005). Convergence of the iterates of descent methods for analytic cost functions. SIAM Journal on Optimization, 16(2), 531–547.
4. Allen-Zhu, Z. (2018). Natasha 2: Faster non-convex optimization than SGD. Advances in Neural Information Processing Systems, 31, 2675–2686.
5. Amari, S. I. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2), 251–276.