1. P. Bartlett and J. Baxter. Estimation and approximation bounds for gradient-based reinforcement learning. Technical report, Australian National University, 2000.
2. J. Baxter and P. Bartlett. Direct gradient-based reinforcement learning. Technical report, Australian National University, Research School of Information Sciences and Engineering, July 1999.
3. J. Baxter and P. Bartlett. Algorithms for infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 2001 (forthcoming).
4. D.P. Bertsekas. Dynamic Programming and Optimal Control, Volumes 1 and 2. Athena Scientific, 1995.
5. P. Marbach and J. Tsitsiklis. Simulation-based optimization of Markov reward processes. Technical report, Massachusetts Institute of Technology, 1998.