1. Jacob Buckman, Carles Gelada, and Marc G Bellemare. The importance of pessimism in fixed-dataset policy optimization. In International Conference on Learning Representations, 2021.
2. Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597--1607. PMLR, 2020.
3. Doubly Robust Policy Evaluation and Optimization
4. Miroslav Dudík, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In Proceedings of the 28th International Conference on Machine Learning, ICML'11, 2011.
5. Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh. More robust doubly robust off-policy evaluation. In International Conference on Machine Learning, pages 1447--1456. PMLR, 2018.