1. Abdallah, S., & Kaisers, M. (2016). Addressing environment non-stationarity by repeating Q-learning updates. The Journal of Machine Learning Research, 17(1), 1582–1612.
2. Agrawal, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054–1078.
3. Agrawal, S., & Goyal, N. (2012). Analysis of Thompson sampling for the multi-armed bandit problem. In Conference on Learning Theory (COLT) (pp. 39.1–39.26).
4. Akakpo, N. (2008). Detecting change-points in a discrete distribution via model selection. arXiv preprint arXiv:0801.0970.
5. Allesiardo, R., & Féraud, R. (2015). Exp3 with drift detection for the switching bandit problem. In IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 1–7). IEEE.