1. Abbasi-yadkori, Y., Pál, D., & Szepesvári, C. (2011). Improved algorithms for linear stochastic bandits. InAdvances in Neural Information Processing Systems 24 (pp. 2312–2320)
2. Agrawal, S. (2019). Recent advances in multiarmed bandits for sequential decision making. INFORMS TutORials in Operations Research, 167–168
3. Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2016). A near-optimal exploration-exploitation approach for assortment selection. In Proceedings of the 2016 ACM Conference on Economics and Computation (EC).
4. Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2017). Thompson sampling for the MNL-Bandit. In Proceedings of the 30th Annual Conference on Learning Theory (COLT).
5. Agrawal, S., & Goyal, N. (2012a). Analysis of Thompson sampling for the multi-armed bandit problem. In Proceedings of the 25th Annual Conference on Learning Theory (COLT).