Affiliation:
1. Univ. of Minnesota, Minneapolis
2. Univ. of Washington, Seattle
Abstract
Cartesian product files have recently been shown to exhibit attractive properties for partial match queries. This paper considers the file allocation problem for Cartesian product files, which can be stated as follows: Given a k-attribute Cartesian product file and an m-disk system, allocate buckets among the m disks in such a way that, for all possible partial match queries, the concurrency of disk accesses is maximized. The Disk Modulo (DM) allocation method is described first, and it is shown to be strict optimal under many conditions commonly occurring in practice, including all possible partial match queries when the number of disks is 2 or 3. It is also shown that although it has good performance, the DM allocation method is not strict optimal for all possible partial match queries when the number of disks is greater than 3. The General Disk Modulo (GDM) allocation method is then described, and a sufficient but not necessary condition for strict optimality of the GDM method for all partial match queries and any number of disks is then derived. Simulation studies comparing the DM and random allocation methods in terms of the average number of disk accesses, in response to various classes of partial match queries, show the former to be significantly more effective even when the number of disks is greater than 3, that is, even in cases where the DM method is not strict optimal. The results that have been derived formally and shown by simulation can be used for more effective design of optimal file systems for partial match queries. When considering multiple-disk systems with independent access paths, it is important to ensure that similar records are clustered into the same or similar buckets, while similar buckets should be dispersed uniformly among the disks.
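The abstract does not spell out the allocation rules, so the following is only an illustrative sketch under stated assumptions. It assumes the commonly cited forms: DM places bucket (i1, ..., ik) on disk (i1 + ... + ik) mod m, and GDM on disk (a1*i1 + ... + ak*ik) mod m for chosen integer coefficients. It also assumes "strict optimal" means that a query touching q buckets never needs more than ceil(q/m) accesses on any one disk. All function names, and the encoding of a partial match query as a tuple with `None` for unspecified attributes, are hypothetical.

```python
from collections import Counter
from itertools import product
from math import ceil


def dm_disk(bucket, m):
    """Disk Modulo (assumed form): sum of the bucket coordinates mod m."""
    return sum(bucket) % m


def gdm_disk(bucket, coeffs, m):
    """General Disk Modulo (assumed form): weighted coordinate sum mod m."""
    return sum(a * i for a, i in zip(coeffs, bucket)) % m


def query_buckets(domain_sizes, query):
    """Buckets matched by a partial match query.

    query holds a fixed value for each specified attribute and None
    for each unspecified one (an encoding chosen for this sketch).
    """
    ranges = [range(n) if q is None else (q,)
              for n, q in zip(domain_sizes, query)]
    return list(product(*ranges))


def max_disk_load(buckets, alloc):
    """Largest number of the given buckets that land on a single disk."""
    return max(Counter(alloc(b) for b in buckets).values())


def is_strict_optimal(domain_sizes, m, alloc):
    """Brute-force check over every partial match query."""
    choices = [[None] + list(range(n)) for n in domain_sizes]
    for query in product(*choices):
        buckets = query_buckets(domain_sizes, query)
        if max_disk_load(buckets, alloc) > ceil(len(buckets) / m):
            return False
    return True
```

For instance, under these assumptions, `is_strict_optimal((3, 3, 3), 3, lambda b: dm_disk(b, 3))` holds, whereas a 2 x 2 file on 4 disks fails under DM because buckets (0, 1) and (1, 0) share a disk; the GDM coefficients (1, 2) spread those four buckets over four distinct disks. The exhaustive check is exponential in the number of attributes, so it is only suitable for small experiments.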
Publisher
Association for Computing Machinery (ACM)
Cited by
107 articles.
1. Generalized Optimal Response Time Retrieval of Replicated Data from Storage Arrays;ACM Transactions on Storage;2013-07
2. A hierarchical organization approach of multi-dimensional remote sensing data for lightweight Web Map Services;Earth Science Informatics;2012-03
3. Equivalent Disk Allocations;IEEE Transactions on Parallel and Distributed Systems;2012-03
4. Geospatial Web Service for Remote Sensing Data Visualization;2011 IEEE International Conference on Advanced Information Networking and Applications;2011-03
5. Parallel M-tree Based on Declustering Metric Objects using K-medoids Clustering;2010 Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science;2010-08