1. Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89–129.
2. Bach, F., & Moulines, E. (2011). Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Advances in neural information processing systems (pp. 451–459).
3. Bach, F., & Moulines, E. (2013). Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). In Advances in neural information processing systems (pp. 773–781).
4. Bertsekas, D. P. (2012). Dynamic Programming and Optimal Control: Approximate Dynamic Programming (4th ed., Vol. II). Belmont: Athena Scientific.
5. Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming (Optimization and Neural Computation Series, Vol. 3). Belmont: Athena Scientific.