Abstract
Transformer-based models are powerful for modeling temporal dynamics of user preference in sequential recommendation. Most of the variants adopt the Softmax transformation in the self-attention layers to generate dense attention probabilities. However, real-world item sequences are often noisy, containing a mixture of true-positive and false-positive interactions. Such dense attentions inevitably assign probability mass to noisy or irrelevant items, leading to sub-optimal performance and poor explainability. Here we propose a Probabilistic Masked Attention Network (PMAN) to identify the sparse pattern of attentions, which is more desirable for pruning noisy items in sequential recommendation. Specifically, we employ a probabilistic mask to achieve sparse attentions under a constrained optimization framework. As such, PMAN allows to select which information is critical to be retained or dropped in a data-driven fashion. Experimental studies on real-world benchmark datasets show that PMAN is able to improve the performance of Transformers significantly.
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献