1. Human-level control through deep reinforcement learning
2. Skill discovery for exploration and planning using deep skill graphs;Bagaria
3. Conservative Q-learning for offline reinforcement learning;Kumar;Advances in Neural Information Processing Systems,2020
4. Deterministic policy gradient algorithms;Silver
5. Training language models to follow instructions with human feedback;Ouyang;Advances in Neural Information Processing Systems,2022