Denoising and Augmented Negative Sampling for Collaborative Filtering

Authors:

Zhao Yuhan (1), Chen Rui (1), Lai Riwei (2), Han Qilong (1), Song Hongtao (1), Chen Li (3)

Affiliation:

1. Harbin Engineering University, Harbin, China

2. Hong Kong Baptist University, Hong Kong, Hong Kong

3. Hong Kong Baptist University, Hong Kong, Hong Kong

Abstract

Negative sampling plays a crucial role in implicit-feedback-based collaborative filtering, where it leverages massive unlabeled data to generate negative signals for guiding supervised learning. The current state-of-the-art approaches focus on utilizing hard negative samples that contain more information to establish a better decision boundary. To strike a balance between efficiency and effectiveness, most existing methods adopt a two-pass approach: in the first pass, a fixed number of unobserved items are sampled using a simple static distribution, while in the second pass, a more sophisticated negative sampling strategy is employed to select the final negative items. However, selecting negative samples solely from the original items in a dataset is inherently restricted by the limited available choices, and thus may not be able to contrast positive samples effectively. In this paper, we empirically validate this observation through meticulously designed experiments and identify three major limitations of existing solutions: ambiguous trap, information discrimination, and false negative samples. Our response to these limitations is to introduce “denoised” and “augmented” negative samples that may not exist in the original dataset. This direction raises a few substantial technical challenges. First, constructing augmented negative samples may introduce excessive noise that eventually distorts the decision boundary. Second, the scarcity of supervision signals hampers the denoising process. To this end, we introduce a novel generic denoising and augmented negative sampling (DANS) paradigm and provide a concrete instantiation. First, we disentangle the hard and easy factors of negative items. Then, we regulate the augmentation of easy factors by carefully considering the direction and magnitude. Next, we propose a reverse attention mechanism to learn a user’s negative preference, which allows us to perform a dimension-level denoising procedure on hard factors. Finally, we design an advanced negative sampling strategy to identify the final negative samples, taking into account both the score function used in existing methods and a novel metric called synthesization gain. Through extensive experiments on real-world datasets, we demonstrate that our method substantially outperforms state-of-the-art baselines. Our code is publicly available at https://github.com/Asa9aoTK/ANS-Recbole.
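
As an illustration of the two-pass approach that the abstract attributes to existing methods (not the DANS method itself), the following is a minimal sketch. It assumes a uniform distribution for the first pass and a dot-product score for the second pass; the function name, its parameters, and the toy embeddings are hypothetical and do not come from the paper or its repository.

```python
import random


def two_pass_negative_sample(user_vec, item_vecs, observed, num_candidates, rng):
    """Return the index of one hard negative item for a user.

    Pass 1: draw up to `num_candidates` unobserved items uniformly at random.
    Pass 2: keep the candidate with the highest dot-product score.
    """
    # Pass 1: simple static (here: uniform) distribution over unobserved items.
    unobserved = [i for i in range(len(item_vecs)) if i not in observed]
    candidates = rng.sample(unobserved, min(num_candidates, len(unobserved)))

    # Pass 2: the "hardest" candidate is the one the model scores highest.
    def score(item):
        return sum(u * v for u, v in zip(user_vec, item_vecs[item]))

    return max(candidates, key=score)


# Toy data (assumed for illustration only).
rng = random.Random(0)
user_vec = [1.0, 0.0]
item_vecs = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.9], [0.7, 0.2], [-0.5, 0.3]]
observed = {0}  # item 0 is a positive interaction
negative = two_pass_negative_sample(user_vec, item_vecs, observed, 4, rng)
```

Because the final negative is always one of the existing items, the sketch also shows the restriction the abstract points out: the sampler can only choose among items already in the dataset.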

Publisher

Association for Computing Machinery (ACM)
