1. Advani, M. S., Saxe, A. M., & Sompolinsky, H. (2020). High-dimensional dynamics of generalization error in neural networks. Neural Networks, 132, 428–446. https://doi.org/10.1016/j.neunet.2020.08.022
2. Arpit, D., Jastrzębski, S., Ballas, N., et al. (2017). A closer look at memorization in deep networks. In: D. Precup & Y. W. Teh (Eds.), Proceedings of the 34th international conference on machine learning, Proceedings of Machine Learning Research, PMLR (Vol. 70, pp. 233–242). https://proceedings.mlr.press/v70/arpit17a.html
3. Ba, L. J., & Frey, B. (2013). Adaptive dropout for training deep neural networks. In Proceedings of the 26th international conference on neural information processing systems, NIPS'13 (Vol. 2, pp. 3084–3092). Curran Associates Inc.
4. Belkin, M., Hsu, D., Ma, S., et al. (2019). Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proceedings of the National Academy of Sciences, 116(32), 15849–15854. https://doi.org/10.1073/pnas.1903070116
5. Belkin, M., Hsu, D., & Xu, J. (2020). Two models of double descent for weak features. SIAM Journal on Mathematics of Data Science, 2(4), 1167–1180. https://doi.org/10.1137/20M1336072