On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Authors:

Wu Jiancan¹, Wang Xiang¹, Gao Xingyu², Chen Jiawei³, Fu Hongcheng⁴, Qiu Tianyu⁴, He Xiangnan¹

Affiliations:

1. MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, China

2. Institute of Microelectronics of the Chinese Academy of Sciences, China

3. Zhejiang University, China

4. Tencent Music Entertainment Group, China

Abstract

The learning objective plays a fundamental role in building a recommender system. Most methods routinely adopt either a pointwise (e.g., binary cross-entropy) or pairwise (e.g., BPR) loss to train the model parameters, while rarely paying attention to softmax loss, which assumes the probabilities of all classes sum up to 1, due to its computational complexity when scaling up to large datasets and its intractability for streaming data where the complete item space is not always available. The sampled softmax (SSM) loss emerges as an efficient substitute for softmax loss. Its special case, InfoNCE loss, has been widely used in self-supervised learning and has exhibited remarkable performance for contrastive learning. Nonetheless, limited recommendation work uses the SSM loss as the learning objective. Worse still, to the best of our knowledge, none of it explores its properties thoroughly or answers "Does the SSM loss suit item recommendation?" and "What are the conceptual advantages of the SSM loss, compared with the prevalent losses?". In this work, we aim to offer a better understanding of SSM for item recommendation. Specifically, we first theoretically reveal three model-agnostic advantages: (1) mitigating popularity bias, which is beneficial to long-tail recommendation; (2) mining hard negative samples, which offers informative gradients to optimize model parameters; and (3) maximizing the ranking metric, which facilitates top-K performance. However, based on our empirical studies, we recognize that the default choice of the cosine similarity function in SSM limits its ability to learn the magnitudes of representation vectors. As such, combining SSM with models that also fall short in adjusting magnitudes (e.g., matrix factorization) may result in poor representations. Going one step further, we provide mathematical proof that the message passing scheme in graph convolution networks can adjust representation magnitudes according to node degree, which naturally compensates for this shortcoming of SSM. Extensive experiments on four benchmark datasets justify our analyses, demonstrating the superiority of SSM for item recommendation. Our implementations are available in both TensorFlow and PyTorch.
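To make the loss discussed in the abstract concrete, below is a minimal PyTorch sketch of an SSM loss with cosine similarity over one positive item and N sampled negatives per user. This is not the authors' released implementation: the function name sampled_softmax_loss, the tensor shapes, the temperature default, and the use of uniformly sampled negatives without a sampling-bias correction term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def sampled_softmax_loss(user_emb, pos_item_emb, neg_item_emb, temperature=0.1):
    """Sampled softmax (SSM) loss over one positive and N sampled negatives per user.

    user_emb:     (B, d) user representations
    pos_item_emb: (B, d) embedding of each user's positive item
    neg_item_emb: (B, N, d) embeddings of N sampled negative items per user
    """
    # Cosine similarity = dot product of L2-normalized vectors; the normalization
    # discards magnitude information, which is the limitation discussed above.
    u = F.normalize(user_emb, dim=-1)
    i_pos = F.normalize(pos_item_emb, dim=-1)
    i_neg = F.normalize(neg_item_emb, dim=-1)

    pos_logit = (u * i_pos).sum(dim=-1, keepdim=True) / temperature   # (B, 1)
    neg_logits = torch.einsum("bd,bnd->bn", u, i_neg) / temperature   # (B, N)

    # Softmax over {positive, sampled negatives}; the loss is the negative
    # log-probability assigned to the positive item.
    logits = torch.cat([pos_logit, neg_logits], dim=1)                # (B, 1 + N)
    return -F.log_softmax(logits, dim=1)[:, 0].mean()


# Toy usage with random embeddings: a batch of 32 users, 100 negatives, 64 dimensions.
loss = sampled_softmax_loss(torch.randn(32, 64), torch.randn(32, 64), torch.randn(32, 100, 64))
```

Because the embeddings are L2-normalized, the logits depend only on angles between vectors; this illustrates why, as the abstract argues, models that cannot adjust representation magnitudes on their own (e.g., plain matrix factorization) may pair poorly with this loss, whereas graph convolution's degree-dependent message passing compensates for it.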

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems


Cited by 3 articles.

1. Preference Contrastive Learning for Personalized Recommendation. Pattern Recognition and Computer Vision, 2023-12-26.

2. How graph convolutions amplify popularity bias for recommendation? Frontiers of Computer Science, 2023-12-23.

3. Debiased Contrastive Loss for Collaborative Filtering. Knowledge Science, Engineering and Management, 2023.
