Value Penalized Q-Learning for Recommender Systems

Published: 2022-07-06
Container-title: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Author:

Gao Chengqian¹, Xu Ke², Zhou Kuangqi³, Li Lanqing², Wang Xueqian¹, Yuan Bo¹, Zhao Peilin⁴

Affiliation:

1. Tsinghua University, Shenzhen, China

2. Tencent AI Lab, Shenzhen, China

3. National University of Singapore, Singapore, Singapore

4. Tencent AI Lab, Prefix, China

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3477495.3531796

References: 31 articles.

1. M. Mehdi Afsar, Trafford Crump, and Behrouz Far. 2021. Reinforcement learning based recommender systems: A survey. arXiv:2101.06286

2. Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. 2020. An Optimistic Perspective on Offline Reinforcement Learning. In ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 104-114. http://proceedings.mlr.press/v119/agarwal20c.html

3. Gunnar Blom. 1958. Statistical Estimates and Transformed Beta Variables. Almqvist & Wiksell, John Wiley & Sons, Inc., Sweden.

4. Jacob Buckman, Carles Gelada, and Marc G. Bellemare. 2020. The Importance of Pessimism in Fixed-Dataset Policy Optimization. arXiv:2009.06799 https://arxiv.org/abs/2009.06799

5. Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. 2020. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv:2004.07219 https://arxiv.org/abs/2004.07219

Cited by 9 articles.

1. UET4Rec: U-net encapsulated transformer for sequential recommender;Expert Systems with Applications;2024-12

2. Rethinking Offline Reinforcement Learning for Sequential Recommendation from A Pair-Wise Q-Learning Perspective;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

3. A reinforcement learning recommender system using bi-clustering and Markov Decision Process;Expert Systems with Applications;2024-03

4. Click is not equal to purchase: multi-task reinforcement learning for multi-behavior recommendation;World Wide Web;2023-11

5. Counterfactual Adversarial Learning for Recommendation;Proceedings of the 32nd ACM International Conference on Information and Knowledge Management;2023-10-21