Value Penalized Q-Learning for Recommender Systems

Author:

Gao Chengqian1, Xu Ke2, Zhou Kuangqi3, Li Lanqing2, Wang Xueqian1, Yuan Bo1, Zhao Peilin4

Affiliation:

1. Tsinghua University, Shenzhen, China

2. Tencent AI Lab, Shenzhen, China

3. National University of Singapore, Singapore, Singapore

4. Tencent AI Lab, Prefix, China

Publisher:

ACM

References: 31 articles.

1. M. Mehdi Afsar, Trafford Crump, and Behrouz Far. 2021. Reinforcement learning based recommender systems: A survey. arXiv:2101.06286

2. Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. 2020. An Optimistic Perspective on Offline Reinforcement Learning. In ICML 2020, 13--18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 104--114. http://proceedings.mlr.press/v119/agarwal20c.html

3. Gunnar Blom. 1958. Statistical Estimates and Transformed Beta Variables. Almqvist & Wiksell, John Wiley & Sons, Inc., Sweden.

4. Jacob Buckman, Carles Gelada, and Marc G. Bellemare. 2020. The Importance of Pessimism in Fixed-Dataset Policy Optimization. arXiv:2009.06799 https://arxiv.org/abs/2009.06799

5. Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. 2020. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv:2004.07219 https://arxiv.org/abs/2004.07219

Cited by 9 articles.

1. UET4Rec: U-net encapsulated transformer for sequential recommender;Expert Systems with Applications;2024-12

2. Rethinking Offline Reinforcement Learning for Sequential Recommendation from A Pair-Wise Q-Learning Perspective;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

3. A reinforcement learning recommender system using bi-clustering and Markov Decision Process;Expert Systems with Applications;2024-03

4. Click is not equal to purchase: multi-task reinforcement learning for multi-behavior recommendation;World Wide Web;2023-11

5. Counterfactual Adversarial Learning for Recommendation;Proceedings of the 32nd ACM International Conference on Information and Knowledge Management;2023-10-21
