1. Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.
2. Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137–1155.
3. Bennett, J., & Lanning, S. (2007). The Netflix prize. In SIGKDD Cup (Vol. 2007, p. 35).
4. Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In COMPSTAT (pp. 177–186). Berlin: Springer.
5. Chen, T., Zhang, W., Lu, Q., Chen, K., Zheng, Z., & Yu, Y. (2012). SVDFeature: A toolkit for feature-based collaborative filtering. Journal of Machine Learning Research, 13(1), 3619–3622.