1. On optimization methods for deep learning. Le. Proceedings of the 28th International Conference on Machine Learning.
2. Gradient-based learning applied to document recognition
3. Deep Residual Learning for Image Recognition
4. Training neural networks without gradients: A scalable ADMM approach. Taylor. International Conference on Machine Learning.
5. Divide the gradient by a running average of its recent magnitude. Coursera: Neural networks for machine learning. Tieleman. Technical Report, 2017.