Dscaler

Published:2016-10 Issue:14 Volume:9 Page:1671-1682
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Zhang J. W.¹, Tay Y. C.¹

Affiliation:

1. National University of Singapore

Abstract

The Dataset Scaling Problem (DSP) defined in previous work states: Given an empirical set of relational tables D and a scale factor s, generate a database state D̃ that is similar to D but s times its size. A DSP solution is useful for application development (s < 1), scalability testing (s > 1) and anonymization (s = 1). Current solutions assume all table sizes scale by the same ratio s. However, a real database tends to have tables that grow at different rates. This paper therefore considers non-uniform scaling (nuDSP), a DSP generalization where, instead of a single scale factor s, tables can scale by different factors. Dscaler is the first solution for nuDSP. It follows previous work in achieving similarity by reproducing correlation among the primary and foreign keys. However, it introduces the concept of a correlation database that captures fine-grained, per-tuple correlation. Experiments with well-known real and synthetic datasets D show that Dscaler produces D̃ with greater similarity to D than state-of-the-art techniques. Here, similarity is measured by number of tuples, frequency distribution of foreign key references, and multi-join aggregate queries.
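
One of the similarity measures named in the abstract, the frequency distribution of foreign key references, can be illustrated with a small sketch. This is not Dscaler's algorithm and not the paper's evaluation code: the toy `users`/`orders` tables, the function names, and the use of total variation distance to compare two distributions are all assumptions made here for illustration.

```python
from collections import Counter

def fk_degree_distribution(parent_keys, fk_values):
    """Map k -> fraction of parent tuples referenced exactly k times."""
    refs = Counter(fk_values)
    degrees = Counter(refs.get(pk, 0) for pk in parent_keys)
    total = sum(degrees.values())
    return {k: count / total for k, count in degrees.items()}

def distribution_distance(d1, d2):
    """Total variation distance (an assumed metric, not the paper's)."""
    keys = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in keys)

# Hypothetical original database D: 4 users, 6 orders referencing them.
users = [1, 2, 3, 4]
orders_fk = [1, 1, 2, 2, 2, 3]

# Uniform scaling (s = 2 for both tables): 8 users, 12 orders.
uniform_users = list(range(1, 9))
uniform_orders_fk = [1, 1, 2, 2, 2, 3, 5, 5, 6, 6, 6, 7]

# Non-uniform scaling: users grow 2x, orders only 1.5x (9 orders).
nonuniform_users = list(range(1, 9))
nonuniform_orders_fk = [1, 1, 2, 2, 2, 3, 5, 6, 7]

original = fk_degree_distribution(users, orders_fk)
uniform = fk_degree_distribution(uniform_users, uniform_orders_fk)
nonuniform = fk_degree_distribution(nonuniform_users, nonuniform_orders_fk)

uniform_distance = distribution_distance(original, uniform)
nonuniform_distance = distribution_distance(original, nonuniform)
print(uniform_distance, nonuniform_distance)
```

In this toy case the uniformly scaled data can reproduce the original distribution exactly, while the non-uniformly scaled data cannot, because the average number of orders per user changes when the two tables grow at different rates. That tension is one way to see why nuDSP is harder than DSP.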

Publisher

VLDB Endowment

Link

https://dl.acm.org/doi/pdf/10.14778/3007328.3007333

Cited by 7 articles.

1. Synthetic Data Generation for Enterprise DBMS;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04

2. A Scalable Query-Aware Enormous Database Generator for Database Evaluation;IEEE Transactions on Knowledge and Data Engineering;2022

3. Feature Extraction Method Based on Social Network Analysis;Applied Artificial Intelligence;2019-04-23

4. A Collaborative Framework for Similarity Enforcement in Synthetic Scaling of Relational Datasets;2019 IEEE 35th International Conference on Data Engineering (ICDE);2019-04

5. A collaborative framework for tweaking properties in a synthetic dataset;Proceedings of the VLDB Endowment;2018-08