1. Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained policy optimization. arXiv preprint arXiv:1705.10528 (2017).
2. Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. 2019. Striving for Simplicity in Off-policy Deep Reinforcement Learning. arXiv preprint arXiv:1907.04543 (2019).
3. Top-K Off-Policy Correction for a REINFORCE Recommender System
4. User Response Models to Improve a REINFORCE Recommender System
5. Minmin Chen, Ramki Gummadi, Chris Harris, and Dale Schuurmans. 2019. Surrogate Objectives for Batch Policy Optimization in One-step Decision Making. In Advances in Neural Information Processing Systems. 8825–8835.