1. Araya-López, M., Thomas, V., Buffet, O.: Near-optimal BRL using optimistic local transitions. In: ICML'12: Proceedings of the 29th International Conference on Machine Learning, Omnipress, Edinburgh, Scotland, pp 97–104 (2012)
2. Asiain, E., Clempner, J.B., Poznyak, A.S.: Controller exploitation-exploration: A reinforcement learning architecture. Soft Computing 23(11), 3591–3604 (2019)
3. Asmuth, J., Li, L., Littman, M., Nouri, A., Wingate, D.: A Bayesian sampling approach to exploration in reinforcement learning. In: UAI '09: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Montreal, Quebec, Canada, pp 19–26 (2009)
4. Bellman, R.: Adaptive Control Processes: A Guided Tour. Princeton University Press (1961)
5. Besson, R., Le Pennec, E., Allassonnière, S.: Learning from both experts and data. Entropy 21(12), 1208 (2019). https://doi.org/10.3390/e21121208