1. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; Haarnoja
2. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; Haarnoja; In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 2018
3. Deep Recurrent Q-Learning for Partially Observable MDPs; Hausknecht; 2015 AAAI Fall Symposium Series, 2015
4. Learning Continuous Control Policies by Stochastic Value Gradients; Heess; arXiv preprint arXiv:1510.09142, 2015
5. Bootstrap Estimated Uncertainty of the Environment Model for Model-Based Reinforcement Learning