1. Experts in a markov decision process;Even-Dar;Advances in neural information processing systems,2004
2. Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic
3. Policy gradient based quantum approximate optimization algorithm;Yao
4. Generative adversarial imitation learning;Ho;Advances in neural information processing systems,2016
5. Proximal policy optimization algorithms;Schulman,2017