1. End-to-end training of deep visuomotor policies;Levine;The Journal of Machine Learning Research,2016
2. Mastering the game of Go without human knowledge;Silver;Nature,2017
3. Minimax regret bounds for reinforcement learning;Azar,2017
4. Naive exploration is optimal for online LQR;Simchowitz,2020
5. Regret Bounds for the Adaptive Control of Linear Quadratic Systems;Abbasi-Yadkori,2011