1. Howard, R.A.: Dynamic programming and Markov processes (1960)
2. Peng, X.B., Coumans, E., Zhang, T., Lee, T.W., Tan, J., Levine, S.: Learning agile robotic locomotion skills by imitating animals. arXiv preprint arXiv:2004.00784 (2020)
3. Kiran, B.R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A.A., Yogamani, S., Pérez, P.: Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. (2021)
4. Zhou, Z.-H.: AlphaGo special session: an introduction. Acta Automatica Sinica 42(5), 670 (2016)
5. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)