1. Raman Arora, Ofer Dekel, and Ambuj Tewari. 2012. Online bandit learning against an adaptive adversary: from regret to policy regret. In Proceedings of the 29th International Conference on Machine Learning. 1747--1754.
2. Raman Arora, Teodor Vanislavov Marinov, and Mehryar Mohri. 2019. Bandits with feedback graphs and switching costs. In Advances in Neural Information Processing Systems. 10397--10407.
3. Peter Auer. 2002. SIAM Journal on Computing 32, 1 (2002).
4. Dirk Bergemann and Juuso Välimäki. 2006. Bandit problems. Cowles Foundation discussion paper (2006).
5. Avrim Blum and Yishay Mansour. 2007. Learning, regret minimization, and equilibria. Algorithmic Game Theory (2007).