Affiliation:
1. University College London, Ipswich, Suffolk, UK
2. Delft University of Technology, Delft, The Netherlands and CWI, Amsterdam, The Netherlands
3. Delft University of Technology, Delft, The Netherlands
Abstract
Collaborative filtering aims at predicting a user's interest for a given item based on a collection of user profiles. This article views collaborative filtering as a problem highly related to information retrieval, drawing an analogy between the concepts of users and items in recommender systems and queries and documents in text retrieval.
We present a probabilistic user-to-item relevance framework that introduces the concept of relevance into the related problem of collaborative filtering. Three different models are derived, namely, a user-based, an item-based, and a unified relevance model, and we estimate their rating predictions from three sources: the user's own ratings for different items, other users' ratings for the same item, and ratings from different but similar users for other but similar items.
To reduce the data sparsity encountered when estimating the probability density function of the relevance variable, we apply the nonparametric (data-driven) density estimation technique known as the Parzen-window method (or kernel-based density estimation). Using a Gaussian window function, the similarity between users and/or items would, however, be based on Euclidean distance. Because the collaborative filtering literature has reported improved prediction accuracy when using cosine similarity, we generalize the Parzen-window method by introducing a projection kernel.
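The contrast between a Gaussian window and a cosine-style window can be illustrated with a minimal sketch. It is not the projection kernel defined in the article, whose exact form the abstract does not give. The function names, the bandwidth parameter `h`, the angle-based window, and the use of a kernel-weighted average of ratings are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, h=1.0):
    """Gaussian window: weight decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * h ** 2))

def cosine_kernel(x, y, h=1.0):
    """Hypothetical angle-based window (an assumption, not the article's
    projection kernel): weight decays as cosine similarity drops below 1."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        return 0.0  # an all-zero vector has no direction
    return math.exp(-(1.0 - dot / norm) / h ** 2)

def kernel_estimate(target, samples, kernel, h=1.0):
    """Kernel-weighted average of ratings over (vector, rating) samples."""
    weights = [kernel(target, vec, h) for vec, _ in samples]
    total = sum(weights)
    if total == 0.0:
        return None  # no sample carries any weight
    return sum(w * r for w, (_, r) in zip(weights, samples)) / total

# Two profiles with the same direction but different rating scales.
same_taste_a = [10.0, 8.0, 0.0]
same_taste_b = [5.0, 4.0, 0.0]
gauss_weight = gaussian_kernel(same_taste_a, same_taste_b)
angle_weight = cosine_kernel(same_taste_a, same_taste_b)
```

In this sketch the two profiles point in the same direction, so the angle-based window gives them close to full weight, while the Gaussian window treats them as far apart because their Euclidean distance is large.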
Existing user-based and item-based approaches correspond to two simplified instantiations of our framework. User-based and item-based collaborative filtering each represent only a partial view of the prediction problem, whereas the unified relevance model brings these partial views together under the same umbrella. Experimental results complement the theoretical insights with improved recommendation accuracy. The unified model is more robust to data sparsity because the different types of ratings are used in concert.
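The three rating sources named in the abstract can be sketched with a toy predictor. This is an illustrative simplification under stated assumptions, not the article's probabilistic relevance model: the `predict` function, the cosine weighting over co-rated entries, and the pooling of all three sources into one weighted mean are hypothetical choices, and the target rating is assumed to be absent from the data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if not shared or na == 0.0 or nb == 0.0:
        return 0.0
    return sum(a[k] * b[k] for k in shared) / (na * nb)

def weighted_mean(pairs):
    """Weighted mean of (weight, rating) pairs; None without positive weight."""
    total = sum(w for w, _ in pairs)
    return sum(w * r for w, r in pairs) / total if total > 0 else None

def predict(ratings, user, item):
    """ratings maps user -> {item: rating}; returns three toy predictions."""
    by_item = {}  # transposed view: item -> {user: rating}
    for u, row in ratings.items():
        for i, r in row.items():
            by_item.setdefault(i, {})[u] = r
    me, target = ratings[user], by_item.get(item, {})
    others = [(u, row) for u, row in ratings.items() if u != user]
    # Source 1: other users' ratings for the same item (user-based view).
    user_pairs = [(cosine(me, row), row[item])
                  for _, row in others if item in row]
    # Source 2: the user's own ratings for other items (item-based view).
    item_pairs = [(cosine(target, by_item[i]), r)
                  for i, r in me.items() if i != item]
    # Source 3: similar users' ratings for similar items.
    cross_pairs = [(cosine(me, row) * cosine(target, by_item[i]), r)
                   for _, row in others
                   for i, r in row.items() if i != item]
    return {
        "user_based": weighted_mean(user_pairs),
        "item_based": weighted_mean(item_pairs),
        # Assumed pooling rule: one weighted mean over all three sources.
        "unified": weighted_mean(user_pairs + item_pairs + cross_pairs),
    }

toy = {
    "alice": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 5},
    "carol": {"a": 1, "b": 5, "c": 1},
}
result = predict(toy, "alice", "c")
```

Here `alice` agrees with `bob` more than with `carol`, so all three estimates for item `c` lean toward `bob`'s high rating. The pooled estimate draws on more ratings than either partial view, which is the intuition behind the robustness to sparsity described above.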
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
46 articles.