1. Achiam, J.: Proximal policy optimization. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/ppo.html
2. Achiam, J.: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/
3. Achiam, J.: Vanilla policy gradient. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/vpg.html
4. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
5. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)