1. [1] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” Proc. COMPSTAT '10, pp. 177-186, August 2010.
2. [2] D.P. Kingma and J.L. Ba, “Adam: A method for stochastic optimization,” arXiv preprint, arXiv:1412.6980, 2014.
3. [3] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” arXiv preprint, arXiv:1709.01507, 2017.
4. [4] J.-Y. Zhu, T. Park, P. Isola, and A.A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint, arXiv:1703.10593, 2017.
5. [5] A. van den Oord, Y. Li, I. Babuschkin, K. Simonyan, O. Vinyals, K. Kavukcuoglu, G. van den Driessche, E. Lockhart, L.C. Cobo, F. Stimberg, N. Casagrande, D. Grewe, S. Noury, S. Dieleman, E. Elsen, N. Kalchbrenner, H. Zen, A. Graves, H. King, T. Walters, D. Belov, and D. Hassabis, “Parallel WaveNet: Fast high-fidelity speech synthesis,” arXiv preprint, arXiv:1711.10433, 2017.