1. Advani MS, Saxe AM, Sompolinsky H (2020) High-dimensional dynamics of generalization error in neural networks. Neural Netw 132:428–446
2. Aguirre D, Fuentes O (2019) Improving weight initialization of ReLU and output layers. In: International Conference on Artificial Neural Networks, pp 170–184
3. Arpit D, Bengio Y (2019) The benefits of over-parameterization at initialization in deep ReLU networks. arXiv preprint arXiv:1901.03611
4. Balduzzi D, Frean M, Leary L, Lewis J, Ma KWD, McWilliams B (2017) The shattered gradients problem: if resnets are the answer then what is the question? In: Proc of the International Conference on Machine Learning (ICML)
5. Chen H, Zheng L, Al Kontar R, Raskutti G (2020) Stochastic gradient descent in correlated settings: a study on Gaussian processes. Adv Neural Information Process Syst