1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, MA (2018)
2. Tsitsiklis, J.N., Van Roy, B.: An analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 42, 674–690 (1997). https://doi.org/10.1109/9.580874
3. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems, pp. 1057–1063 (1999)
4. Kakade, S.: A natural policy gradient. In: Advances in Neural Information Processing Systems (2002)
5. Tampuu, A., et al.: Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 12, 1–12 (2017). https://doi.org/10.1371/journal.pone.0172395