1. Abbeel P, Ng AY (2010) Inverse reinforcement learning. Springer US, Boston, MA, pp 554–558. https://doi.org/10.1007/978-0-387-30164-8_417
2. Arakawa R, Kobayashi S, Unno Y, Tsuboi Y, Maeda S (2018) DQN-TAMER: human-in-the-loop reinforcement learning with intractable feedback. https://arxiv.org/abs/1810.11748
3. Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML
4. Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI Gym
5. Christiano P, Leike J, Brown TB, Martic M, Legg S, Amodei D (2017) Deep reinforcement learning from human preferences. In: NIPS