1. Continuous control with deep reinforcement learning;Lillicrap;arXiv:1509.02971,2019
2. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor;Haarnoja;arXiv:1801.01290,2018
3. Proximal policy optimization algorithms;Schulman;arXiv:1707.06347,2017
4. Model-based Reinforcement Learning: A Survey
5. MDP homomorphic networks: Group symmetries in reinforcement learning;van der Pol