1. Oron Anschel, Nir Baram, and Nahum Shimkin. 2017. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning. In International Conference on Machine Learning. PMLR, 176–185.
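2. Dimitri P. Bertsekas and John N. Tsitsiklis. 1995. Neuro-dynamic programming: an overview. In Proceedings of 1995 34th IEEE Conference on Decision and Control, Vol. 1. IEEE, 560–564.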
3. James L. Carroll and Kevin Seppi. 2005. Task similarity measures for transfer in reinforcement learning task libraries. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2. IEEE, 803–808.
4. Edoardo Cetin and Oya Celiktutan. 2023. Learning Pessimism for Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 6971–6979.
5. Xinyue Chen, Che Wang, Zijian Zhou, and Keith W. Ross. 2021. Randomized Ensembled Double Q-Learning: Learning Fast Without a Model. In International Conference on Learning Representations. https://openreview.net/forum?id=AY8zfZm0tDd