MUDD

Published:2004-01 Issue:1 Volume:29 Page:104-109
ISSN:0163-5948
Container-title:ACM SIGSOFT Software Engineering Notes
language:en
Short-container-title:SIGSOFT Softw. Eng. Notes

Author:

Stephens John M.¹, Poess Meikel²

Affiliation:

1. Gradient Systems, Redwood City, CA

2. Oracle Corporation, Redwood Shores, CA

Abstract

Today's business intelligence systems consist of hundreds of processors with disk subsystems able to handle multiple gigabytes of I/O bandwidth. These systems usually contain terabytes of data. Evaluating database system performance of such systems often requires generating synthetic data with well-defined statistical properties. To simulate different scenarios, it is important to vary statistical properties, including the row counts of tables. Foremost, in order to analyze large-scale systems, data generators need to be able to produce hundreds of terabytes of data in a timely fashion. In this paper we present MUDD, a multi-dimensional data generator. Originally designed for TPC-DS, a decision support benchmark being developed by the TPC, MUDD is able to generate up to 100 terabytes of flat file data in hours, utilizing modern multiprocessor architectures, including clusters. Its novel design separates data generation algorithms from data distribution definitions, enabling users to adjust their workload to individual needs and different scenarios.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/974043.974060

References: 10 articles.

1. Bitton D., DeWitt D., Turbyfill C. Source code for Wisconsin Database Generator, distributed on the "Wisconsin Benchmark Tape". Computer Science, U. Wisconsin, Madison, WI, 1984.

2. Datatect, "the universal test data generation tool". http://www.quest.com/.

3. Quickly generating billion-record synthetic databases

4. OLAP Council. APB-1 OLAP Benchmark Specification, Release II. http://www.olapcouncil.org/research/bmarkco.htm, 1998.

Cited by 20 articles.

1. Data Generation Based on Domain Ontology;Proceedings of the 31st International Conference on Information Systems Development;2023-10-05

2. Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data;J INF PROCESS SYST;2023

3. Beyond TPC-DS, a benchmark for Big Data OLAP systems (BDOLAP-Bench);Future Generation Computer Systems;2022-07

4. Analysis of Benchmark Development Times in the Transaction Processing Performance Council and Ideas on How to Reduce It with a Domain Independent Benchmark Evolution Model;Lecture Notes in Computer Science;2021

5. SmartBench;Proceedings of the VLDB Endowment;2020-08