1. Agarwal A, Hsu D, Kale S, Langford J, Li L, Schapire R (2014) Taming the monster: A fast and simple algorithm for contextual bandits. 31st Internat. Conf. Machine Learn. (ICML), 1638–1646.
2. Alon N, Cesa-Bianchi N, Dekel O, Koren T (2015) Online learning with feedback graphs: Beyond bandits. 28th Conf. Learn. Theory (COLT), 23–35.
3. Alon N, Cesa-Bianchi N, Gentile C, Mansour Y (2013) From bandits to experts: A tale of domination and independence. Advances in Neural Information Processing Systems (NIPS), vol. 26 (Curran Associates, Red Hook, NY), 1610–1618.
4. Audibert J-Y, Bubeck S, Lugosi G (2011) Minimax policies for combinatorial prediction games. 24th Conf. Learn. Theory (COLT), 107–132.