1. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, 30–37.
2. Bertsekas, D., Borkar, V. S. and Nedic, A. (2004). Improved temporal difference methods with linear function approximation. In Handbook of Learning and Approximate Dynamic Programming (eds. J. Si, A. G. Barto, W. B. Powell and D. Wunsch), 233–257. New York: IEEE Press.
3. Multidimensional Stochastic Approximation Methods
4. Linear Least-Squares algorithms for temporal difference learning