Improving duplicate elimination in storage systems

Author:

Deepak R. Bobbarjung 1, Suresh Jagannathan 1, Cezary Dubnicki 2

Affiliation:

1. Purdue University, West Lafayette, IN

2. NEC Laboratories America, Princeton, NJ

Abstract

Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source, by not writing data that has already been stored, not only reduces storage overheads but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems. Intelligent object partitioning techniques identify data that is new when objects are updated, and transfer only these chunks to a storage server. In this article, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably, fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff, and other existing object partitioning schemes, using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25%, and bandwidth utilization on average by 40%, over comparable techniques.
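The abstract does not detail fingerdiff's algorithm, but the family of object partitioning techniques it builds on is content-defined chunking: cut points are chosen from the data's own bytes (via a rolling hash over a sliding window), so an edit in the middle of an object only perturbs the chunks near the edit, and unchanged chunks deduplicate against the store. The sketch below is a minimal illustration of that idea, not the paper's method; it uses a simple rolling-sum hash in place of the Rabin fingerprints typically used in practice, and all names and parameters (`WINDOW`, `MASK`, the SHA-1 chunk keys) are illustrative assumptions.

```python
import hashlib

WINDOW = 16            # sliding-window size in bytes (illustrative)
MASK = (1 << 10) - 1   # 10-bit mask -> expected chunk size around 1 KiB

def chunk_boundaries(data: bytes):
    """Return content-defined cut points.

    Uses a rolling sum of the last WINDOW bytes as a stand-in for a
    Rabin fingerprint; a boundary is declared when the low bits of the
    hash hit a fixed pattern, so cut points depend only on local content.
    """
    h = 0
    cuts = []
    for i, b in enumerate(data):
        h = (h + b) & 0xFFFFFFFF
        if i >= WINDOW:
            h = (h - data[i - WINDOW]) & 0xFFFFFFFF  # drop byte leaving window
            if (h & MASK) == MASK:
                cuts.append(i + 1)
    return cuts

def dedup_write(data: bytes, store: dict) -> int:
    """Split data at content-defined boundaries and store only unseen chunks.

    Returns the number of bytes actually written (i.e. not deduplicated).
    """
    new_bytes = 0
    prev = 0
    for cut in chunk_boundaries(data) + [len(data)]:
        chunk = data[prev:cut]
        prev = cut
        if not chunk:
            continue
        key = hashlib.sha1(chunk).hexdigest()  # chunk identity by content hash
        if key not in store:
            store[key] = chunk
            new_bytes += len(chunk)
    return new_bytes
```

Because boundaries depend only on the bytes inside the window, inserting a few bytes into a large object shifts the chunks after the edit without changing their contents, so a second `dedup_write` transmits only the chunk(s) touching the edit. Fingerdiff's contribution, per the abstract, is to choose the partitioning strategy adaptively per object based on similarity with previously stored objects, rather than using one fixed scheme as above.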

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture

References (31 articles)

1. Compactly encoding unstructured inputs with differential compression

2. Berlekamp, E. R. 1968. Algebraic Coding Theory. McGraw-Hill, New York.

3. Blomer, J., Kalfane, M., Karp, R., Karpinski, M., Luby, M., and Zuckerman, D. 1995. An XOR-based erasure-resilient coding scheme. Tech. Rep., International Computer Science Institute, Berkeley, California.

Cited by 75 articles.

1. FASTSync: A FAST Delta Sync Scheme for Encrypted Cloud Storage in High-bandwidth Network Environments;ACM Transactions on Storage;2023-10-03

2. A Detailed Review of Data Deduplication Approaches in the Cloud and Key Challenges;2023 4th International Conference on Smart Electronics and Communication (ICOSEC);2023-09-20

3. Double Sliding Window Chunking Algorithm for Data Deduplication in Ocean Observation;IEEE Access;2023

4. An efficient enhanced prefix hash tree model for optimizing the storage and image deduplication in cloud;Concurrency and Computation: Practice and Experience;2022-08-05

5. Improving Restore Performance of Deduplication Systems via a Greedy Rewriting Scheme;2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS);2021-12
