Affiliation:
1. Xiamen University
2. University of Nebraska-Lincoln
3. National University of Defense Technology
Abstract
Data deduplication has been demonstrated to be an effective technique in reducing the total data transferred over the network and the storage space in cloud backup, archiving, and primary storage systems, such as VM (virtual machine) platforms. However, the performance of restore operations from a deduplicated backup can be significantly lower than that without deduplication. The main reason lies in the fact that a file or block is split into multiple small data chunks that are often located in different disks after deduplication, which can cause a subsequent read operation to invoke many disk IOs involving multiple disks and thus degrade the read performance significantly. While this problem has been by and large ignored in the literature thus far, we argue that the time is ripe for us to pay significant attention to it in light of the emerging cloud storage applications and the increasing popularity of the VM platform in the cloud. This is because, in a cloud storage or VM environment, a simple read request on the client side may translate into a restore operation if the data to be read or a VM suspended by the user was previously deduplicated when written to the cloud or the VM storage server, a likely scenario considering the network bandwidth and storage capacity concerns in such an environment.
To address this problem, in this article, we propose SAR, an SSD (solid-state drive)-Assisted Read scheme, that effectively exploits the high random-read performance properties of SSDs and the unique data-sharing characteristic of deduplication-based storage systems by storing in SSDs the unique data chunks with high reference count, small size, and nonsequential characteristics. In this way, many read requests to HDDs are replaced by read requests to SSDs, thus significantly improving the read performance of the deduplication-based storage systems in the cloud. The extensive trace-driven and VM restore evaluations on the prototype implementation of SAR show that SAR outperforms the traditional deduplication-based and flash-based cache schemes significantly, in terms of the average response times.
Funder
National Science Foundation
Huawei Technologies
Ministry of Education of the People's Republic of China
State Education Ministry
Division of Information and Intelligent Systems
National Natural Science Foundation of China
Division of Computer and Network Systems
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Reference41 articles.
1. FAWN
2. Armbrust M. Fox A. Griffith R. Joseph A. D. Katz R. H. Konwinski A. Lee G. Patterson D. A. Rabkin A. Stoica I. and Zaharia M. 2009. Above the clouds: A Berkeley view of cloud computing. Tech. rep. USB/EECS-2009-28 University of California Berkeley. Armbrust M. Fox A. Griffith R. Joseph A. D. Katz R. H. Konwinski A. Lee G. Patterson D. A. Rabkin A. Stoica I. and Zaharia M. 2009. Above the clouds: A Berkeley view of cloud computing. Tech. rep. USB/EECS-2009-28 University of California Berkeley.
3. Providing High Reliability in a Minimum Redundancy Archival Storage System
4. Gordon
Cited by
53 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献