1. Advani, M. S., & Saxe, A. M. (2017). High-dimensional dynamics of generalization error in neural networks. arXiv preprint arXiv:1710.03667.
2. Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.
3. Bartlett, P. L., Maiorov, V., & Meir, R. (1999). Almost linear VC dimension bounds for piecewise polynomial networks. In Advances in Neural Information Processing Systems (pp. 190–196).
4. Bentley, J. L., & McIlroy, M. D. (1993). Engineering a sort function. Software: Practice and Experience, 23, 1249–1265.
5. Bertsekas, D. P. (2014). Constrained optimization and Lagrange multiplier methods. New York: Academic Press.