Affiliation
1. Computer Sciences Department, University of Wisconsin-Madison
Abstract
Today, large amounts of data are stored on tertiary storage media such as magnetic tapes and optical disks. DBMSs typically operate only on magnetic disks, since they know how to control disks and how to optimize accesses to them. Tertiary devices present a problem for DBMSs because they have dismountable media and very different operational characteristics from magnetic disks. For instance, most tape drives offer very high capacity at low cost but are accessed sequentially, involve lengthy latencies, and deliver lower bandwidth. Typically, the scope of a DBMS's query optimizer does not include tertiary devices, and the DBMS might not even know how to control and operate on tertiary-resident data. In a three-level hierarchy of storage devices (main memory, disk, tape), the typical solution is to elevate tape-resident data to disk, thus bringing the data under the DBMS's control, and then to perform the required operations on disk. This requires additional disk space and may not give the lowest possible response time. With this challenge in mind, we studied the trade-offs between memory and disk requirements and the execution time of a join, using two well-known join methods. The conventional, disk-based Nested Block Join and Hybrid Hash Join were modified to operate directly on tapes. An experimental implementation of the modified algorithms gave us more insight into how the algorithms perform in practice. Our performance analysis shows that a DBMS that operates on tertiary storage will benefit from special algorithms that work directly on tape-resident data and that account for and exploit the mismatch between disk and tape characteristics.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture, Software
Cited by
3 articles.
1. Massive Storage Systems;Journal of Computer Science and Technology;2006-09
2. High Performance Virtual Backup and Archive System;Computational Science – ICCS 2006;2006
3. Scheduling Queries for Tape-Resident Data;Euro-Par 2000 Parallel Processing;2000