1. Amari, S.I.: Natural gradient works efficiently in learning. Neural Comput. 10, 251–276 (1998)
2. Grosse, R., Martens, J.: A Kronecker-factored approximate Fisher matrix for convolution layers. In: ICML (2016)
3. Hecht-Nielsen, R., et al.: Theory of the backpropagation neural network. Neural Netw. 1(Supplement–1), 445–448 (1988)
4. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: NIPS (2014)
5. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)