1. Click shaping to optimize multiple objectives
2. Öykü Zeynep Bayramoğlu, Engin Erzin, Tevfik Metin Sezgin, and Yücel Yemez. 2021. Engagement Rewarded Actor-Critic with Conservative Q-Learning for Speech-Driven Laughter Backchannel Generation. In Proceedings of the 2021 International Conference on Multimodal Interaction. 613--618.
3. Top-K Off-Policy Correction for a REINFORCE Recommender System
4. Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, and Chelsea Finn. 2019. Robonet: Large-scale multi-robot learning. arXiv preprint arXiv:1910.11215 (2019).
5. Justin Fu, Aviral Kumar, Matthew Soh, and Sergey Levine. 2019. Diagnosing bottlenecks in deep q-learning algorithms. In International Conference on Machine Learning. PMLR, 2021--2030.