1. J. Baxter and P. L. Bartlett. Infinite-horizon gradient-based policy search. Journal of Artificial Intelligence Research, 15:319–350, 2001.
2. D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
3. R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1017–1023. MIT Press, 1996.
4. E. Seneta. Non-negative Matrices and Markov Chains. Springer-Verlag, New York, 1981.
5. S. P. Singh and D. P. Bertsekas. Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference, pages 974–980. MIT Press, 1997.