1. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: ICLR (2016)
2. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: PMLR, vol. 37, pp. 448–456 (2015)
3. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 2017–2025. Curran Associates, Inc., Red Hook (2015)
4. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
5. LeCun, Y., Cortes, C.: The MNIST database of handwritten digits (1998)