1. Double reinforcement learning for efficient off-policy evaluation in Markov decision processes;kallus;J Mach Learn Res,2020
2. Data-efficient off-policy policy evaluation for reinforcement learning;thomas;arXiv abs/1604.00923,2016
3. Empirical study of off-policy policy evaluation for reinforcement learning;voloshin;arXiv abs/1911.06854,2021
4. Benchmarks for deep off-policy evaluation;fu;arXiv abs/2103.16596,2021
5. Off-policy temporal difference learning with function approximation;precup;ICML,2001