1. Optimality and approximation with policy gradient methods in Markov decision processes;Alekh Agarwal;Conference on Learning Theory,2020
2. Understanding the impact of entropy on policy optimization;Zafarali Ahmed;International Conference on Machine Learning,2019
3. Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon;Matteo Basei;The Journal of Machine Learning Research,2022
4. A Markovian decision process;Richard Bellman;Journal of Mathematics and Mechanics,1957