1. Agarwala (2023). Temperature check: theory and practice for training models with softmax-cross-entropy losses. Transactions on Machine Learning Research.
2. An (2018). A PID controller approach for stochastic optimization of deep networks.
3. Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In International Conference on Machine Learning (pp. 41–48).
4. Calandra (2016). Manifold Gaussian processes for regression.
5. Carmon (2018). Accelerated methods for nonconvex optimization. SIAM Journal on Optimization.