1. Berenji, H. R. (1992). A reinforcement learning-based architecture for fuzzy logic control. International Journal of Approximate Reasoning, 6(2), 267–292.
2. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.
3. Celemin, C., & Ruiz-del-Solar, J. (2019). An interactive framework for learning continuous actions policies based on corrective feedback. Journal of Intelligent & Robotic Systems, 95(1), 77–97.
4. Cheng, C.-A., Yan, X., Wagener, N., & Boots, B. (2018). Fast policy learning through imitation and reinforcement. arXiv preprint arXiv:1805.10413.
5. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.