1. No more pesky learning rate guessing games; Smith; arXiv preprint arXiv:1506.01186, 2015
2. Cyclical Learning Rates for Training Neural Networks
3. Goodfellow, I., Bengio, Y., and Courville, A., [Deep learning], MIT Press (2016).
   Footnote: GPU memory limitations prevented our testing a total batch size greater than 1,530.
4. Stochastic gradient descent tricks; Bottou, 2012
5. Entropy-SGD: Biasing gradient descent into wide valleys; Chaudhari; arXiv preprint arXiv:1611, 2016