1. Abdoos, M., Mozayani, N., & Bazzan, A. L. C. (2011). Traffic light control in non-stationary environments based on multi agent Q-learning. In Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC) (pp. 1580–1585).
2. Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. In Proceedings of the 34th International Conference on Machine Learning (ICML'17) (pp. 22–31).
3. Adawadkar (2022). Cyber-security and reinforcement learning: A brief survey. Engineering Applications of Artificial Intelligence.
4. Afsar (2022). Reinforcement learning based recommender systems: A survey. ACM Computing Surveys.
5. Agrawal, S., & Jia, R. (2017). Optimistic posterior sampling for reinforcement learning: worst-case regret bounds. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017) (pp. 1184–1194).