1. An, W., Wang, H., Sun, Q., et al., A PID Controller Approach for Stochastic Optimization of Deep Networks, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018, 8522–8531.
2. Becker, S. and LeCun, Y., Improving the Convergence of Back-propagation Learning with Second-order Methods, Proceedings of the 1988 Connectionist Models Summer School, San Mateo, 1988, 29–37.
3. Bertsekas, D. P. and Tsitsiklis, J. N., Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
4. Bertsekas, D. P. and Tsitsiklis, J. N., Gradient convergence in gradient methods with errors, SIAM J. Optim., 10(3), 2000, 627–642.
5. Bottou, L., Curtis, F. E. and Nocedal, J., Optimization methods for large-scale machine learning, SIAM Rev., 60(2), 2018, 223–311.