1. Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. 2012. Improved algorithms for linear stochastic bandits. In NIPS.
2. Jacob Abernethy, Peter L. Bartlett, and Elad Hazan. 2011. Blackwell approachability and low-regret learning are equivalent. In COLT.
3. Shipra Agrawal and Nikhil R. Devanur. 2014. Bandits with concave rewards and convex knapsacks. CoRR abs/1402.5758 (2014).
4. Shipra Agrawal, Zizhuo Wang, and Yinyu Ye. 2009. A dynamic near-optimal algorithm for online linear programming. To appear in Operations Research; preprint arXiv:0911.2974 (2009).
5. Peter Auer. 2003. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3 (March 2003), 397--422.