On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Authors:

Wu Jiancan¹, Wang Xiang¹, Gao Xingyu², Chen Jiawei³, Fu Hongcheng⁴, Qiu Tianyu⁴, He Xiangnan¹

Affiliations:

1. MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, China

2. Institute of Microelectronics of the Chinese Academy of Sciences, China

3. Zhejiang University, China

4. Tencent Music Entertainment Group, China

Abstract

The learning objective plays a fundamental role in building a recommender system. Most methods routinely adopt either a pointwise (e.g., binary cross-entropy) or pairwise (e.g., BPR) loss to train the model parameters, while rarely paying attention to softmax loss, which assumes the probabilities of all classes sum up to 1, due to its computational complexity when scaling up to large datasets and its intractability for streaming data where the complete item space is not always available. The sampled softmax (SSM) loss emerges as an efficient substitute for softmax loss. Its special case, InfoNCE loss, has been widely used in self-supervised learning and has exhibited remarkable performance for contrastive learning. Nonetheless, limited recommendation work uses the SSM loss as the learning objective. Worse still, to the best of our knowledge, none of it explores its properties thoroughly or answers "Does the SSM loss suit item recommendation?" and "What are the conceptual advantages of the SSM loss, compared with the prevalent losses?". In this work, we aim to offer a better understanding of SSM for item recommendation. Specifically, we first theoretically reveal three model-agnostic advantages: (1) mitigating popularity bias, which is beneficial to long-tail recommendation; (2) mining hard negative samples, which offers informative gradients to optimize model parameters; and (3) maximizing the ranking metric, which facilitates top-K performance. However, based on our empirical studies, we recognize that the default choice of the cosine similarity function in SSM limits its ability to learn the magnitudes of representation vectors. As such, combining SSM with models that also fall short in adjusting magnitudes (e.g., matrix factorization) may result in poor representations. Going one step further, we provide mathematical proof that the message passing scheme in graph convolution networks can adjust representation magnitudes according to node degree, which naturally compensates for this shortcoming of SSM. Extensive experiments on four benchmark datasets justify our analyses, demonstrating the superiority of SSM for item recommendation. Our implementations are available in both TensorFlow and PyTorch.
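To make the loss discussed in the abstract concrete, below is a minimal PyTorch sketch of an SSM loss with cosine similarity over one positive item and N sampled negatives per user. This is not the authors' released implementation: the function name sampled_softmax_loss, the tensor shapes, the temperature default, and the use of uniformly sampled negatives without a sampling-bias correction term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def sampled_softmax_loss(user_emb, pos_item_emb, neg_item_emb, temperature=0.1):
    """Sampled softmax (SSM) loss over one positive and N sampled negatives per user.

    user_emb:     (B, d) user representations
    pos_item_emb: (B, d) embedding of each user's positive item
    neg_item_emb: (B, N, d) embeddings of N sampled negative items per user
    """
    # Cosine similarity = dot product of L2-normalized vectors; the normalization
    # discards magnitude information, which is the limitation discussed above.
    u = F.normalize(user_emb, dim=-1)
    i_pos = F.normalize(pos_item_emb, dim=-1)
    i_neg = F.normalize(neg_item_emb, dim=-1)

    pos_logit = (u * i_pos).sum(dim=-1, keepdim=True) / temperature   # (B, 1)
    neg_logits = torch.einsum("bd,bnd->bn", u, i_neg) / temperature   # (B, N)

    # Softmax over {positive, sampled negatives}; the loss is the negative
    # log-probability assigned to the positive item.
    logits = torch.cat([pos_logit, neg_logits], dim=1)                # (B, 1 + N)
    return -F.log_softmax(logits, dim=1)[:, 0].mean()


# Toy usage with random embeddings: a batch of 32 users, 100 negatives, 64 dimensions.
loss = sampled_softmax_loss(torch.randn(32, 64), torch.randn(32, 64), torch.randn(32, 100, 64))
```

Because the embeddings are L2-normalized, the logits depend only on angles between vectors; this illustrates why, as the abstract argues, models that cannot adjust representation magnitudes on their own (e.g., plain matrix factorization) may pair poorly with this loss, whereas graph convolution's degree-dependent message passing compensates for it.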

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems


Cited by 3 articles.

1. Preference Contrastive Learning for Personalized Recommendation. Pattern Recognition and Computer Vision, 2023-12-26.

2. How graph convolutions amplify popularity bias for recommendation? Frontiers of Computer Science, 2023-12-23.

3. Debiased Contrastive Loss for Collaborative Filtering. Knowledge Science, Engineering and Management, 2023.
