1. Online learning in episodic Markovian decision processes by relative entropy policy search;Zimin;Advances in neural information processing systems,2013
2. Minimax regret of switching-constrained online convex optimization: No phase transition;Chen;Advances in neural information processing systems,2020
3. Minimax regret bounds for reinforcement learning;Azar;Proceedings of the 34th International Conference on Machine Learning (ICML'17) - Volume 70,2017
4. Nearly minimax optimal reinforcement learning for discounted MDPs;He;Advances in neural information processing systems,2021
5. Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints