1. Shipra Agrawal, Navin Goyal, Thompson sampling for contextual bandits with linear payoffs, in: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 17–19 Jun 2013, in: Proceedings of Machine Learning Research, vol. 28, PMLR, pp. 127–135.
2. Cesa-Bianchi, A gang of bandits, 2013.
3. Deng, Leveraging long short-term user preference in conversational recommendation via multi-agent reinforcement learning, IEEE Trans. Knowl. Data Eng., 2022.
4. Glowacka, Bandit algorithms in information retrieval, Found. Trends® Inf. Retr., 2019.
5. Grotov, Online learning to rank for information retrieval: SIGIR 2016 tutorial, 2016.