1. Gradient descent happens in a tiny subspace;Gur-Ari;arXiv,2018
2. Empirical analysis of the Hessian of over-parametrized neural networks;Sagun
3. The full spectrum of deepnet Hessians at scale: Dynamics with SGD training and sample size;Papyan;arXiv,2019
4. Dissecting Hessian: Understanding common structure of Hessian in neural networks;Wu;arXiv,2021
5. Analytic insights into structure and rank of neural network hessian maps;Singh;Advances in Neural Information Processing Systems,2021