Affiliation:
1. Nanyang Technological University, Singapore
2. Simon Fraser University, British Columbia, Canada
Abstract
A user-item utility matrix represents the utility (or preference) associated with each (user, item) pair, such as citation counts, ratings or votes on items or locations, and clicks on items. A high utility value indicates a strong association of the pair. In this work, we consider the problem of summarizing the strong associations in a large user-item matrix using a small summary size. Traditional techniques either fail to distinguish user groups associated with different items (such as top-\(l\) item selection) or fail to focus on high utility (such as similarity-based subspace clustering and biclustering). We formulate a new problem, called Group Utility Maximization (GUM), to summarize the entire user population through \(k\) user groups and \(l\) items for each group; the goal is to maximize the total utility of selected items over all groups collectively. We show this problem is NP-hard even for \(l = 1\). We present two algorithms. The first, called the Greedy algorithm, greedily finds the next group; the second, called the \(k\)-max algorithm, iteratively refines the existing \(k\) groups. The Greedy algorithm provides a \((1-\frac{1}{e})\) approximation guarantee for a nonnegative utility matrix, whereas the \(k\)-max algorithm is more efficient for large datasets. We evaluate these algorithms on real-life datasets.
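The GUM objective described above can be made concrete with a small sketch. The abstract does not spell out the internals of either algorithm, so everything below is an assumption made for illustration: the helper names (`top_l_items`, `total_utility`, `refine`), the dense list-of-lists matrix, and in particular the alternating refinement loop, which is a k-means-style stand-in and not the authors' \(k\)-max procedure.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # utility[user][item]


def top_l_items(utility: Matrix, members: Sequence[int], l: int) -> List[int]:
    """Return the l items with the largest utility summed over `members`."""
    n_items = len(utility[0])
    col_sums = [sum(utility[u][i] for u in members) for i in range(n_items)]
    # Rank by descending column sum; ties go to the smaller item index.
    ranked = sorted(range(n_items), key=lambda i: (-col_sums[i], i))
    return sorted(ranked[:l])


def total_utility(utility: Matrix,
                  groups: Sequence[Sequence[int]],
                  item_sets: Sequence[Sequence[int]]) -> float:
    """GUM objective: utility of each group's selected items, summed over all groups."""
    return sum(utility[u][i]
               for members, chosen in zip(groups, item_sets)
               for u in members
               for i in chosen)


def refine(utility: Matrix,
           item_sets: Sequence[Sequence[int]],
           l: int,
           max_rounds: int = 20) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical alternating refinement (an assumption, not the paper's k-max).

    Alternates between (1) assigning every user to the item set that gives
    that user the highest utility and (2) re-selecting each group's top-l items.
    """
    item_sets = [sorted(s) for s in item_sets]
    k = len(item_sets)
    groups: List[List[int]] = [[] for _ in range(k)]
    for _ in range(max_rounds):
        groups = [[] for _ in range(k)]
        for u in range(len(utility)):
            best = max(range(k),
                       key=lambda g: sum(utility[u][i] for i in item_sets[g]))
            groups[best].append(u)
        # An empty group keeps its previous item set.
        new_sets = [top_l_items(utility, groups[g], l) if groups[g] else item_sets[g]
                    for g in range(k)]
        if new_sets == item_sets:
            break  # no item set changed, so assignments are stable
        item_sets = new_sets
    return groups, item_sets


# Toy 4-user x 4-item matrix (made-up numbers) with k = 2 groups and l = 2 items.
U = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
]
groups, item_sets = refine(U, [[0, 1], [1, 2]], l=2)
print(groups, item_sets, total_utility(U, groups, item_sets))
```

For a fixed group, picking the \(l\) items with the largest column sums maximizes that group's contribution, which is why `top_l_items` suffices for the item-selection step; the hard part of the problem is choosing the groups. A local refinement of this kind can depend on its starting item sets, so the result should not be read as a guaranteed optimum.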
Publisher
Association for Computing Machinery (ACM)