1. Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, and Martin Riedmiller. 2018. Maximum a posteriori policy optimisation. arXiv preprint arXiv:1806.06920 (2018).
2. Gabriel Barth-Maron, Matthew W Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva Tb, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap. 2018. Distributed distributional deterministic policy gradients. arXiv preprint arXiv:1804.08617 (2018).
3. Marc G Bellemare, Will Dabney, and Rémi Munos. 2017. A distributional perspective on reinforcement learning. In International Conference on Machine Learning. PMLR, 449–458.
4. Craig J Bester, Steven D James, and George D Konidaris. 2019. Multi-pass Q-networks for deep reinforcement learning with parameterised action spaces. arXiv preprint arXiv:1905.04388 (2019).