1. Stochastic optimal control (the discrete time case);Bertsekas,1978
2. Markov decision processes — discrete stochastic dynamic programming;Puterman,1994
3. Dynamic programming and optimal control, volume 1;Bertsekas,2007
4. Dynamic programming and optimal control, volume 2;Bertsekas,2007
5. Algorithms for reinforcement learning;Szepesvári,2009