1. Buşoniu, L., Babuška, R., De Schutter, B., Ernst, D.: Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, New York (2010)
2. Bertsekas, D.P., Borkar, V.S., Nedić, A.: Improved temporal difference methods with linear function approximation. In: Si, J., Barto, A., Powell, W., Wunsch, D. (eds.) Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)
3. Boutsidis, C., Drineas, P., Magdon-Ismail, M.: Near-optimal column-based matrix reconstruction. SIAM J. Comput. 43, 687–717 (2014)
4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
5. Bertsekas, D.P., Ioffe, S.: Temporal differences-based policy iteration and applications in neuro-dynamic programming. Laboratory for Information and Decision Systems Report LIDS-P-2349, MIT (1996)