Publisher
Springer Science and Business Media LLC
References: 153 articles.
1. Turing AM (1950) Computing machinery and intelligence. Mind 59:433–460. https://doi.org/10.1093/mind/lix.236.433
2. Tesauro G (1995) Temporal difference learning and TD-Gammon. Commun ACM 38(3):58–68. https://doi.org/10.1145/203330.203343
3. Kohl N, Stone P (2004) Policy gradient reinforcement learning for fast quadrupedal locomotion. In: IEEE international conference on robotics and automation, 2004. Proceedings. ICRA ’04. 2004, vol 3, pp 2619–2624. https://doi.org/10.1109/ROBOT.2004.1307456
4. Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B, Berger E, Liang E (2006) Autonomous inverted helicopter flight via reinforcement learning. In: Experimental Robotics IX, Springer, Berlin, Heidelberg, pp 363–372
5. Singh S, Litman D, Kearns M, Walker M (2002) Optimizing dialogue management with reinforcement learning: experiments with the NJFun system. J Artif Int Res 16(1):105–133