Summarizing User-item Matrix By Group Utility Maximization

Authors:

Wang Yongjie (1), Wang Ke (2), Long Cheng (1), Miao Chunyan (1)

Affiliation:

1. Nanyang Technological University, Singapore

2. Simon Fraser University, British Columbia, Canada

Abstract

A user-item utility matrix represents the utility (or preference) associated with each (user, item) pair, such as citation counts, ratings or votes on items or locations, and clicks on items. A high utility value indicates a strong association between the user and the item. In this work, we consider the problem of summarizing the strong associations in a large user-item matrix using a small summary. Traditional techniques either fail to distinguish user groups associated with different items (e.g., top-l item selection) or fail to focus on high utility (e.g., similarity-based subspace clustering and biclustering). We formulate a new problem, called Group Utility Maximization (GUM), to summarize the entire user population through k user groups and l items for each group; the goal is to maximize the total utility of the selected items over all groups collectively. We show that this problem is NP-hard even for l = 1. We present two algorithms: the Greedy algorithm, which greedily finds the next group, and the k-max algorithm, which iteratively refines the existing k groups. The Greedy algorithm provides a \((1-\frac{1}{e})\) approximation guarantee for a nonnegative utility matrix, whereas the k-max algorithm is more efficient for large datasets. We evaluate these algorithms on real-life datasets.
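To make the GUM objective concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the details of either algorithm, so the refinement loop here is only an illustrative alternating heuristic in the spirit of "iteratively refining k groups". The function names `gum_objective` and `alternating_refinement`, the random initialization, and the toy matrix are all assumptions introduced for illustration.

```python
import numpy as np

def gum_objective(U, groups, items):
    """Total utility: each user contributes the utility of the l items
    selected for that user's group."""
    U = np.asarray(U)
    items = np.asarray(items)
    return float(sum(U[u, items[groups[u]]].sum() for u in range(U.shape[0])))

def alternating_refinement(U, k, l, n_iter=20, seed=0):
    """Illustrative heuristic (an assumption, not the paper's k-max):
    alternate between assigning users to groups and re-picking each
    group's l items. It may stop at a local optimum."""
    U = np.asarray(U)
    rng = np.random.default_rng(seed)
    n_users, n_items = U.shape
    # Hypothetical initialization: l random distinct items per group.
    items = np.stack([rng.choice(n_items, size=l, replace=False) for _ in range(k)])
    groups = None
    for _ in range(n_iter):
        # Assignment step: each user joins the group whose current
        # item set gives that user the highest summed utility.
        scores = U[:, items].sum(axis=2)          # shape (n_users, k)
        new_groups = scores.argmax(axis=1)
        if groups is not None and np.array_equal(new_groups, groups):
            break                                  # no change: converged
        groups = new_groups
        # Update step: each non-empty group takes the l items with the
        # largest total utility over its current members.
        for g in range(k):
            members = np.flatnonzero(groups == g)
            if members.size:
                col_sums = U[members].sum(axis=0)
                items[g] = np.argsort(-col_sums, kind="stable")[:l]
    return groups, items

# Toy matrix (made up): users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
U = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 3, 5],
              [0, 0, 5, 3]])
print(gum_objective(U, [0, 0, 1, 1], [[0, 1], [2, 3]]))  # 34.0
groups, items = alternating_refinement(U, k=2, l=2)
print(groups, items, gum_objective(U, groups, items))
```

The hand-specified grouping in the first `print` shows what a summary looks like: two groups, each described by its two highest-utility items. The refinement loop carries no approximation guarantee; the \((1-\frac{1}{e})\) bound stated in the abstract applies to the authors' Greedy algorithm, which is not reproduced here.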

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

