1. Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023).
2. Jiawei Chen, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. 2020. Bias and Debias in Recommender System: A Survey and Future Directions. CoRR, Vol. abs/2010.03240 (2020).
3. Jiawei Chen, Junkang Wu, Jiancan Wu, Xuezhi Cao, Sheng Zhou, and Xiangnan He. 2023. Adap-τ: Adaptively Modulating Embedding Magnitude for Recommendation. In WWW. ACM, 1085--1096.
4. Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In WSDM. ACM, 456--464.
5. Gabriel Dulac-Arnold, Richard Evans, Peter Sunehag, and Ben Coppin. 2015. Reinforcement Learning in Large Discrete Action Spaces. CoRR, Vol. abs/1512.07679 (2015).