1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. (2016) TensorFlow: A system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283
2. Amari SI (1998) Natural gradient works efficiently in learning. Neural Comput 10(2):251–276
3. Banakar A (2011) Lyapunov stability analysis of gradient descent-learning algorithm in network training. ISRN Applied Mathematics 2011
4. Baydin AG, Cornish R, Rubio DM, Schmidt M, Wood F (2018) Online learning rate adaptation with hypergradient descent. In: Sixth international conference on learning representations (ICLR), Vancouver, Canada, April 30–May 3, 2018
5. Bengio Y (2012) Deep learning of representations for unsupervised and transfer learning. In: Proceedings of ICML workshop on unsupervised and transfer learning, pp 17–36