1. An, W., Wang, H., Zhang, Y., Dai, Q.: Exponential decay sine wave learning rate for fast deep neural network training. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
2. Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. arXiv preprint arXiv:1206.5533 (2012)
3. Dauphin, Y., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., Bengio, Y.: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems (NIPS), vol. 27 (2014)
4. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(7), 2121–2159 (2011)
5. Feng, Y., Li, Y.: An overview of deep learning optimization methods and learning rate attenuation methods. Hans J. Data Mining 8(3), 186–200 (2018)