1. Barto, A., Sutton, R., & Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834–846.
2. Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Athena Scientific, Belmont, MA.
3. Boddy, M., & Dean, T. (1994). Decision-theoretic deliberation scheduling for problem solving in time-constrained environments. Artificial Intelligence, 67(2), 245–286.
4. Boutilier, C. (1999). Sequential optimality and coordination in multiagent systems. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
5. Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 8, 1017–1023.