Summarizing User-item Matrix By Group Utility Maximization

Published:2023-03-22 Issue:6 Volume:17 Page:1-22
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Wang Yongjie¹, Wang Ke², Long Cheng¹, Miao Chunyan¹

Affiliation:

1. Nanyang Technological University, Singapore

2. Simon Fraser University, British Columbia, Canada

Abstract

A user-item utility matrix represents the utility (or preference) associated with each (user, item) pair, such as citation counts, ratings or votes on items or locations, and clicks on items. A high utility value indicates a strong association of the pair. In this work, we consider the problem of summarizing the strong associations in a large user-item matrix using a small summary size. Traditional techniques either fail to distinguish user groups associated with different items (such as top-l item selection) or fail to focus on high utility (such as similarity-based subspace clustering and biclustering). We formulate a new problem, called Group Utility Maximization (GUM), to summarize the entire user population through k user groups and l items for each group; the goal is to maximize the total utility of selected items over all groups collectively. We show this problem is NP-hard even for l = 1. We present two algorithms. One, called the Greedy algorithm, greedily finds the next group; the other, called the k-max algorithm, iteratively refines k existing groups. The Greedy algorithm provides a \((1-\frac{1}{e})\) approximation guarantee for a nonnegative utility matrix, whereas the k-max algorithm is more efficient for large datasets. We evaluate these algorithms on real-life datasets.
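To make the objective concrete, the following is a minimal sketch of one plausible reading of the GUM objective and of an iterative refinement loop in the spirit of k-max. It is not the authors' algorithm: the abstract does not give the procedure, so the alternating item/user steps, the round-robin initialization, and the names `group_utility` and `refine_groups` are all assumptions made for illustration.

```python
import numpy as np


def group_utility(U, groups, items):
    """Total utility of a summary (assumed reading of the GUM objective).

    U      : (n_users, n_items) utility matrix
    groups : (n_users,) group index of each user, values in [0, k)
    items  : (k, l) item indices selected for each group
    """
    rows = np.arange(U.shape[0])[:, None]
    # Each user contributes the utility of the l items chosen for its group.
    return float(U[rows, items[groups]].sum())


def refine_groups(U, k, l, n_iter=20):
    """Hypothetical alternating refinement, NOT the paper's k-max procedure."""
    n_users, _ = U.shape
    # Assumption: round-robin initial assignment, chosen only for determinism.
    groups = np.arange(n_users) % k
    items = np.zeros((k, l), dtype=int)
    for _ in range(n_iter):
        # Item step: each group keeps the l items with the largest summed
        # utility over its current members (an empty group sums to zeros).
        for g in range(k):
            col_sums = U[groups == g].sum(axis=0)
            items[g] = np.argsort(-col_sums, kind="stable")[:l]
        # User step: move each user to the group whose items it values most.
        gain = np.stack([U[:, items[g]].sum(axis=1) for g in range(k)], axis=1)
        new_groups = gain.argmax(axis=1)
        if np.array_equal(new_groups, groups):
            break  # no user changed group, so the summary is stable
        groups = new_groups
    return groups, items


# Toy matrix: users 0-2 value items 0-1, users 3-5 value items 2-3.
U = np.array([[5, 5, 0, 0]] * 3 + [[0, 0, 5, 5]] * 3)
groups, items = refine_groups(U, k=2, l=2)
print(groups, items, group_utility(U, groups, items))
```

Neither step of this loop can lower the total utility, but like any local refinement it may stop at a local optimum, and it carries no approximation guarantee of the kind stated for the Greedy algorithm.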

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3578586
