Affiliation:
1. University of Texas at San Antonio
Abstract
Declustering techniques reduce query response times through parallel I/O by distributing data among parallel disks. Recently, replication-based approaches were proposed to further reduce the response time. Efficient retrieval of replicated data from multiple disks is a challenging problem. Existing retrieval techniques are designed for storage arrays with identical disks that have no initial load or network delay. In this article, we consider the generalized retrieval problem of replicated data, in which the disks in the system might be heterogeneous, the disks may have initial load, and the storage arrays might be located at different sites. We first formulate the generalized retrieval problem using a Linear Programming (LP) model and solve it with mixed integer programming techniques. Next, we formulate the generalized retrieval problem as a maximum flow problem, which can be solved more efficiently. We prove that the retrieval schedule returned by the maximum flow technique yields the optimal response time, and this result matches the LP solution. We also propose a low-complexity online algorithm for the generalized retrieval problem that does not guarantee an optimal result. The performance of the proposed and state-of-the-art retrieval strategies is investigated using various replication schemes, query types, query loads, disk specifications, network delays, and initial loads.
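The maximum flow idea in the abstract can be illustrated with a minimal sketch. This is not the article's algorithm, whose details are not given here. It assumes a standard reduction: for a candidate response time, each disk can serve a limited number of buckets, and a flow network tests whether every bucket can be assigned to one of its replicas. The function names, the `(cost, load, delay)` tuple layout, the integer time units, and the binary search over candidate times are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp on a dict-of-dicts residual graph; returns the flow value."""
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        # Push flow and update the residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] = capacity[v].get(u, 0) + bottleneck
            v = u
        flow += bottleneck


def feasible(replicas, disks, deadline):
    """True if every bucket can be retrieved within `deadline` time units."""
    cap = defaultdict(dict)
    for bucket, copies in replicas.items():
        cap["src"][("b", bucket)] = 1
        for d in copies:
            cap[("b", bucket)][("d", d)] = 1
    for d, (cost, load, delay) in disks.items():
        # Number of buckets disk d can finish before the deadline.
        cap[("d", d)]["sink"] = max(0, (deadline - load - delay) // cost)
    return max_flow(cap, "src", "sink") == len(replicas)


def optimal_response_time(replicas, disks):
    """Smallest deadline for which a full assignment exists (assumed model)."""
    n = len(replicas)
    if n == 0:
        return 0
    # The optimum is the finish time of some disk serving k buckets.
    candidates = sorted({load + delay + k * cost
                         for cost, load, delay in disks.values()
                         for k in range(1, n + 1)})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:  # feasibility is monotone in the deadline
        mid = (lo + hi) // 2
        if feasible(replicas, disks, candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


# Hypothetical example: disk 0 is fast and idle; disk 1 is slow, loaded, remote.
disks = {0: (2, 0, 0), 1: (5, 3, 1)}  # (cost per bucket, initial load, delay)
replicas = {"a": [0, 1], "b": [0, 1], "c": [0], "d": [1]}
print(optimal_response_time(replicas, disks))
```

In this example bucket `d` is stored only on disk 1, so the response time cannot be lower than that disk's load, delay, and one retrieval combined. The sketch assumes every bucket has at least one replica on a listed disk.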
Funder
Division of Computer and Network Systems
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Reference
63 articles.
1. Adaptec. 2010. Adaptec high-performance hybrid arrays (HPHAs). http://www.adaptec.com/nr/rdonlyres/a1c72763-e3b9-45f7-b871-a490c29a9b11/0/hpha5_fb.pdf. PMC-Sierra Inc.
2. Equivalent Disk Allocations
3. On the parallel implementation of Goldberg's maximum flow algorithm
Cited by
4 articles.
1. Big Data Aware Virtual Machine Placement in Cloud Data Centers;Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies;2017-12-05
2. Exploiting Replication for Energy Efficiency of Heterogeneous Storage Systems;2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS);2016-09
3. Multithreaded Maximum Flow Based Optimal Replica Selection Algorithm for Heterogeneous Storage Architectures;IEEE Transactions on Computers;2016-05-01
4. Dynamic Data Layout Optimization for High Performance Parallel I/O;INT C HIGH PERFORM;2016