1. Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, and Nando de Freitas. 2018. Playing hard exploration games by watching YouTube. In Advances in Neural Information Processing Systems. 2930–2941.
2. Michael Bain and Claude Sammut. 1995. A Framework for Behavioural Cloning. In Machine Intelligence 15. 103–129.
3. Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. 2002. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research 27, 4 (2002), 819–840.
4. Jack Clark and Dario Amodei. 2016. Faulty Reward Functions in the Wild. https://openai.com/blog/faulty-reward-functions/.
5. Hal Daumé, John Langford, and Daniel Marcu. 2009. Search-based structured prediction. Machine Learning 75, 3 (2009), 297–325.