1. Unifying count-based exploration and intrinsic motivation;Bellemare,2016
2. Burda, Y., Edwards, H., Storkey, A., & Klimov, O. (2018, October 30). Exploration by random network distillation. arXiv. 10.48550/arXiv.1810.12894.
3. The multi-armed bandit problem: An efficient nonparametric solution;Chan;Annals of Statistics,2020
4. Choshen, L., Fox, L., & Loewenstein, Y. (2018, April 11). DORA the explorer: Directed outreaching reinforcement action-selection. arXiv. 10.48550/arXiv.1804.04012.
5. SAMBA: A generic framework for secure federated multi-armed bandits;Ciucanu;Journal of Artificial Intelligence Research,2022