1. Fast convergence of natural gradient descent for over-parameterized neural networks;Zhang;Advances in Neural Information Processing Systems,0
2. Adam: A method for stochastic optimization;Kingma;3rd International Conference on Learning Representations ICLR 2015 San Diego CA USA May 7–9 2015 Conference Track Proceedings,2015
3. Exact natural gradient in deep linear networks and its application to the nonlinear case;Bernacchia;Advances in Neural Information Processing Systems,0
4. Deep information propagation;Schoenholz;arXiv preprint,2016