1. Improved algorithms for linear stochastic bandits;Abbasi-Yadkori;Advances in Neural Information Processing Systems,2011
2. Regret bounds for the adaptive control of linear quadratic systems;Abbasi-Yadkori,2011
3. Improved regret bounds for Thompson sampling in linear quadratic control problems;Abeille,2018
4. Mostly exploration-free algorithms for contextual bandits;Bastani,2017
5. Dynamic programming and optimal control, Vol. 1;Bertsekas,1995