Abstract
We demonstrate EDA4Sum, a framework for generating guided multi-step data summarization pipelines for very large datasets. Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed as a one-shot process whose purpose is to find the single best summary. EDA4Sum instead leverages Exploratory Data Analysis (EDA) to produce connected summaries over multiple steps, with the goal of maximizing their cumulative utility. A useful summary contains k individually uniform sets that are collectively diverse, so that together they are representative of the input data. EDA4Sum accommodates datasets with different characteristics by letting users tune the weights of uniformity, diversity and novelty when generating multi-step summaries. We demonstrate the superiority of multi-step EDA summarization over single-step summarization for very large data, and the need to provide guidance to domain experts, by interacting with the VLDB'22 participants, who will act as data analysts. The application is available at https://bit.ly/eda4sum_application.
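The abstract describes utility as a tunable combination of uniformity, diversity and novelty, accumulated over steps. The sketch below illustrates one way such a weighted score could be computed; it is not EDA4Sum's implementation. The function names, the concrete measures (spread-based uniformity, gap between set means as diversity, fraction of unseen sets as novelty) and the representation of each set as a list of numbers are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def uniformity(group):
    """Hypothetical measure: 1.0 when all values are equal, lower as spread grows."""
    return 1.0 / (1.0 + pstdev(group))


def diversity(groups):
    """Hypothetical measure: mean absolute gap between the means of each pair of sets."""
    pairs = list(combinations(groups, 2))
    if not pairs:
        return 0.0
    return mean(abs(mean(a) - mean(b)) for a, b in pairs)


def novelty(groups, seen):
    """Hypothetical measure: fraction of sets that did not appear in earlier steps."""
    return sum(tuple(g) not in seen for g in groups) / len(groups)


def step_utility(groups, seen, w_uniformity, w_diversity, w_novelty):
    """Weighted sum of the three measures for one summary of k sets."""
    avg_uniformity = mean(uniformity(g) for g in groups)
    return (w_uniformity * avg_uniformity
            + w_diversity * diversity(groups)
            + w_novelty * novelty(groups, seen))


def cumulative_utility(steps, weights):
    """Sum step utilities over a multi-step pipeline, tracking already-seen sets."""
    seen = set()
    total = 0.0
    for groups in steps:
        total += step_utility(groups, seen, *weights)
        seen.update(tuple(g) for g in groups)
    return total


# Two-step pipeline, each summary holding k = 2 sets of ratings (made-up values).
steps = [
    [[1, 1], [5, 5]],
    [[1, 1], [3, 3]],
]
print(cumulative_utility(steps, weights=(1.0, 1.0, 1.0)))
```

Raising the novelty weight in this sketch penalizes pipelines that keep returning the same sets, which is the kind of trade-off the tunable weights are meant to expose.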
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development