1. The theory of dynamic programming;Bellman,1954
2. Real applications of Markov decision processes;White;Interfaces,1985
3. Markov decision processes: discrete stochastic dynamic programming;Puterman,2014
4. Policy gradient methods for reinforcement learning with function approximation;Sutton,2000
5. Actor-critic algorithms;Konda,2000