1. Abounadi, J., D. Bertsekas, and V. Borkar. (1996). “Learning algorithms for Markov decision processes with average cost.” Technical report, Laboratory for Information and Decision Systems, M.I.T., USA.
2. Bertsekas, D. P. and J. Tsitsiklis. (1996). Neuro-Dynamic Programming. Boston, MA, USA: Athena Scientific.
3. Brooks, C., R. Fay, R. Das, J. K. MacKie-Mason, J. Kephart, and E. Durfee. (1999). “Automated strategy searches in an electronic goods market: Learning and complex price schedules.” In: Proceedings of the First ACM Conference on Electronic Commerce (EC-99), 31–40.
4. Carvalho, A. and M. Puterman. (2003). “Dynamic Pricing and Reinforcement Learning.” URL: gg.nwu.edu/academic/deptprog/meds-dep/OR-Seminars/Puterman.pdf
5. DiMicco, J. M., A. Greenwald, and P. Maes. (2002). “Learning Curve: A Simulation-based Approach to Dynamic Pricing.”