1. Mastering the game of Go with deep neural networks and tree search
2. A Markovian decision process;Bellman R;J Math Mech,1957
3. Reinforcement Learning in Continuous State and Action Spaces
4. HausknechtM StoneP.Deep recurrent Q‐learning for partially observable MDPs. In: Proceedings of the AAAI Fall Symposium Series;2015;Arlington VA.
5. SuttonRS McAllesterDA SinghSP MansourY.Policy gradient methods for reinforcement learning with function approximation. In: Proceedings of the Advances in Neural Information Processing Systems;2000;Denver CO.