1. Abliz (2022). Underestimation estimators to Q-learning. Information Sciences.
2. Anschel, O., Baram, N., & Shimkin, N. (2017). Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning. In ICML (pp. 176–185).
3. Bertsimas (2006). Tight bounds on expected order statistics. Probability in the Engineering and Informational Sciences.
4. Cetin (2023). Learning pessimism for reinforcement learning.
5. Chen, X., Wang, C., Zhou, Z., & Ross, K. W. (2021). Randomized Ensembled Double Q-Learning: Learning Fast Without a Model. In ICLR.