1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv:1606.06565.
2. Armstrong, S., & O’Rourke, X. (2017). ‘Indifference’ methods for managing agent rewards. arXiv:1712.06365.
3. Armstrong, S., Orseau, L., Leike, J., & Legg, S. (2020). Pitfalls in learning a reward function online. In IJCAI. arXiv:2004.13654.
4. Balke, A., & Pearl, J. (1994). Probabilistic evaluation of counterfactual queries. In AAAI (pp. 230–237).
5. Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.