MiSoSouP

Published:2020-08-21 Issue:5 Volume:14 Page:1-31
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Riondato Matteo¹, Vandin Fabio²

Affiliation:

1. Amherst College, East Drive, Amherst, MA

2. Università di Padova, Via G. Gradenigo, Padova, Italy

Abstract

We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different popular interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures as functions of averages, which makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key concept from statistical learning theory, relates to the sample size needed to obtain a high-quality approximation of the most interesting subgroups. We prove an upper bound on the pseudodimension of the problem at hand, which depends on characteristic quantities of the dataset and of the language of patterns of interest. This upper bound then leads to small sample sizes. Our evaluation on real datasets shows that MiSoSouP outperforms state-of-the-art algorithms offering the same guarantees, and it vastly speeds up the discovery of subgroups w.r.t. analyzing the whole dataset.
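
The idea of writing an interestingness measure as a function of averages can be illustrated with a minimal sketch. It is not the MiSoSouP algorithm. It assumes a binary target and uses a WRAcc-style unusualness score, which equals avg(cover · target) − avg(cover) · avg(target), so each term can be estimated from a uniform random sample. The toy dataset, the `is_smoker` subgroup, and the sample size of 5,000 are all hypothetical; in particular, the sample size is arbitrary and is not derived from the pseudodimension bound described in the abstract.

```python
import random

def unusualness(rows, in_subgroup):
    """WRAcc-style unusualness computed from three plain averages over rows.

    rows: list of (attributes, target) pairs, with target in {0, 1}.
    in_subgroup: predicate on attributes (a hypothetical subgroup description).
    """
    n = len(rows)
    avg_cover = sum(1 for a, _ in rows if in_subgroup(a)) / n   # generality
    avg_target = sum(t for _, t in rows) / n                    # overall target mean
    avg_both = sum(t for a, t in rows if in_subgroup(a)) / n    # covered and positive
    return avg_both - avg_cover * avg_target

def make_toy_dataset(n, seed=0):
    """Synthetic data (assumption): 30% smokers, target rate 0.6 vs 0.2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        p = 0.6 if smoker else 0.2
        target = 1 if rng.random() < p else 0
        rows.append(({"smoker": smoker}, target))
    return rows

def is_smoker(attributes):
    return attributes["smoker"]

dataset = make_toy_dataset(100_000)
# Uniform sample without replacement; the size here is arbitrary.
sample = random.Random(1).sample(dataset, 5_000)

exact = unusualness(dataset, is_smoker)
approx = unusualness(sample, is_smoker)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because the score depends on the data only through averages, the value computed on the sample should be close to the value computed on the whole dataset, while touching only a fraction of the rows.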

Funder

University of Padova

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3385653

Reference: 37 articles.

1. Subgroup discovery

2. Anytime discovery of a diverse set of patterns with Monte Carlo tree search

3. Decoupling

Cited by 5 articles.

1. SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher Bounds;ACM Transactions on Knowledge Discovery from Data;2023-12-09

2. Estimation and update of betweenness centrality with progressive algorithm and shortest paths approximation;Scientific Reports;2023-10-10

3. Efficient Centrality Maximization with Rademacher Averages;Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2023-08-04

4. VC-dimension and Rademacher Averages of Subgraphs, with Applications to Graph Mining;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04

5. MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining;ACM Transactions on Knowledge Discovery from Data;2022-07-30