Dscaler

Author:

Zhang J. W.¹, Tay Y. C.¹

Affiliation:

1. National University of Singapore

Abstract

The Dataset Scaling Problem (DSP) defined in previous work states: Given an empirical set of relational tables D and a scale factor s, generate a database state D̃ that is similar to D but s times its size. A DSP solution is useful for application development (s < 1), scalability testing (s > 1) and anonymization (s = 1). Current solutions assume all table sizes scale by the same ratio s. However, a real database tends to have tables that grow at different rates. This paper therefore considers non-uniform scaling (nuDSP), a DSP generalization where, instead of a single scale factor s, tables can scale by different factors. Dscaler is the first solution for nuDSP. It follows previous work in achieving similarity by reproducing correlation among the primary and foreign keys. However, it introduces the concept of a correlation database that captures fine-grained, per-tuple correlation. Experiments with well-known real and synthetic datasets D show that Dscaler produces D̃ with greater similarity to D than state-of-the-art techniques. Here, similarity is measured by number of tuples, frequency distribution of foreign key references, and multi-join aggregate queries.
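Two of the similarity measures named in the abstract can be illustrated with a small sketch. The code below is not Dscaler's algorithm; it only shows, on toy data, what per-table scale factors and a frequency distribution of foreign key references might look like. The function names, table names, and sample values are all hypothetical.

```python
from collections import Counter


def fk_reference_distribution(parent_keys, child_fk_values):
    """Map k -> number of parent tuples referenced exactly k times.

    Hypothetical helper: one plausible reading of "frequency distribution
    of foreign key references", not the paper's definition.
    """
    refs = Counter(child_fk_values)
    return Counter(refs.get(pk, 0) for pk in parent_keys)


def table_scale_factors(original_sizes, scaled_sizes):
    """Per-table ratio of scaled size to original size (non-uniform scaling)."""
    return {t: scaled_sizes[t] / original_sizes[t] for t in original_sizes}


# Toy data (assumed): four users, six orders referencing them by user id.
users = [1, 2, 3, 4]
orders_user_id = [1, 1, 2, 4, 4, 4]

dist = fk_reference_distribution(users, orders_user_id)

# In nuDSP each table may have its own factor; here users doubles, orders triples.
factors = table_scale_factors(
    {"users": 4, "orders": 6},
    {"users": 8, "orders": 18},
)
```

Comparing such a distribution for D against the same distribution for the scaled database is one simple way to check whether key correlations were preserved, under the assumptions stated above.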

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Cited by 7 articles.

1. Synthetic Data Generation for Enterprise DBMS;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04

2. A Scalable Query-Aware Enormous Database Generator for Database Evaluation;IEEE Transactions on Knowledge and Data Engineering;2022

3. Feature Extraction Method Based on Social Network Analysis;Applied Artificial Intelligence;2019-04-23

4. A Collaborative Framework for Similarity Enforcement in Synthetic Scaling of Relational Datasets;2019 IEEE 35th International Conference on Data Engineering (ICDE);2019-04

5. A collaborative framework for tweaking properties in a synthetic dataset;Proceedings of the VLDB Endowment;2018-08
