1. Smith, L. N. Cyclical learning rates for training neural networks. IEEE Winter Conference on Applications of Computer Vision (WACV), 2017.
2. Loizou, N., et al. Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
3. Goyal, P., et al. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
4. Loshchilov, I., and Hutter, F. SGDR: Stochastic gradient descent with warm restarts. Proceedings of the International Conference on Learning Representations (ICLR), 2017.
5. Wu, Y., et al. Understanding short-horizon bias in stochastic meta-optimization. Proceedings of the International Conference on Learning Representations (ICLR), 2018.