1. Barto, 1983. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics.
2. Baxter, 2001. Infinite-horizon policy gradient estimation. Journal of Artificial Intelligence Research.
3. Baxter, 2001. Experiments with infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research.
4. Bernstein, D.S., Hansen, E.A., Zilberstein, S., 2005. Bounded policy iteration for decentralized POMDPs. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, pp. 1287–1292.
5. Bertsekas, 1996. Neuro-Dynamic Programming.