1. Mohammad Gheshlaghi Azar, Ian Osband, and Rémi Munos. 2017. Minimax Regret Bounds for Reinforcement Learning. In ICML (Proc. of Machine Learning Research, Vol. 70). 263--272.
2. A finite-horizon Markov decision process model for cancer chemotherapy treatment planning: an application to sequential treatment decision making in clinical trials
3. Richard E. Bellman. 1957. A Markovian Decision Process. Journal of Mathematics and Mechanics.
4. Dimitri P. Bertsekas. 2017. Dynamic programming and optimal control (4th ed.). Vol. 1. Athena Scientific.
5. Hippolyte Bourel, Odalric Maillard, and Mohammad Sadegh Talebi. 2020. Tightening Exploration in Upper Confidence Reinforcement Learning. In ICML (Proc. of Machine Learning Research, Vol. 119). 1056--1066.