1. Watkins, C.J.C.H.: Learning from delayed rewards. PhD Thesis, University of Cambridge, England (1989)
2. Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H., Tanaka, T.: Nonparametric return distribution approximation for reinforcement learning. In: International Conference on Machine Learning (2010)
3. Bellemare, M.G., Dabney, W., Munos, R.: A distributional perspective on reinforcement learning. In: International Conference on Machine Learning, pp. 449–458 (2017)
4. Rowland, M., Bellemare, M.G., Dabney, W., Munos, R., Teh, Y.W.: An analysis of categorical distributional reinforcement learning. In: Artificial Intelligence and Statistics (AISTATS) (2018)
5. Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H., Tanaka, T.: Parametric return density estimation for reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence (2010)