1. Asadi, K., Allen, C., Roderick, M., Mohamed, A.-R., Konidaris, G., & Littman, M. (2017). Mean actor critic. arXiv preprint arXiv:1709.00503v1.
2. Asis, K. D., Hernandez-Garcia, J. F., Holland, G. Z., & Sutton, R. S. (2018). Multi-step reinforcement learning: A unifying algorithm. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI).
3. Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399–424.
4. Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37.
5. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.