Affiliation:
1. EMC Corp.
2. Data Computing Division
Abstract
Modern data warehouses store exceedingly large amounts of data, generally considered the crown jewels of an enterprise. The amount of data maintained in such data warehouses increases significantly over time, often at a continuous pace, e.g., by gathering additional data or retaining data for longer periods to derive additional business value, but occasionally also precipitously, e.g., when consolidating disparate data warehouses and data marts into a single database. Having to expand a data warehouse holding hundreds of terabytes of data by a substantial portion, e.g., 100% or more, is a complex and disruptive maintenance operation, as it typically involves some form of dumping and reloading of data, which requires substantial downtime.
In this paper we describe the methodology and mechanisms we developed in Greenplum Database to expand large-scale data warehouses in an online fashion, i.e., without noticeable downtime. At the core of our approach is a set of robust and transactionally consistent primitives that enable efficient data movement. Special emphasis was put on usability and control that lets an administrator tailor the expansion process to specific operational characteristics via priorities and schedules.
We present a number of experiments to quantify the impact of an ongoing expansion on query workloads.
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
5 articles.
1. Towards a Shared-Storage-Based Serverless Database Achieving Seamless Scale-Up and Read Scale-Out;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
2. Remus: Efficient Live Migration for Distributed Databases with Snapshot Isolation;Proceedings of the 2022 International Conference on Management of Data;2022-06-10
3. Skyline Query Processing over Encrypted Data;Proceedings of the First International Workshop on Privacy and Secuirty of Big Data - PSBD '14;2014
4. Big Graph Analytics;Proceedings of the 17th International Workshop on Data Warehousing and OLAP - DOLAP '14;2014
5. Healthcare Trajectory Mining by Combining Multidimensional Component and Itemsets;New Frontiers in Mining Complex Patterns;2013