1. Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie, Bin Yang, Dong Zheng, Peng Jiang, and Kun Gai. 2023. Reinforcing User Retention in a Billion Scale Short Video Recommender System. arXiv preprint arXiv:2302.01724 (2023).
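2. Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19).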
3. Scott Fujimoto, Herke van Hoof, and David Meger. 2018. Addressing Function Approximation Error in Actor-Critic Methods. In International Conference on Machine Learning. PMLR, 1587–1596.
4. KuaiRec
5. Show me the Cache: Optimizing Cache-Friendly Recommendations for Sequential Content Access