Affiliation:
1. The University of British Columbia, Microsoft Research
2. Microsoft Research
Abstract
We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. We also studied file fragmentation, finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.
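To make the whole-file versus block-level distinction concrete, here is a minimal sketch of how the two space savings can be compared. It is not the study's methodology. It assumes that all file contents fit in memory as `bytes`, uses SHA-256 as the content fingerprint, and splits files into fixed-size 4 KiB blocks. The function names and the block size are hypothetical. Many block-level deduplication systems use content-defined (variable-size) chunking instead, and this sketch does not implement that.

```python
import hashlib


def whole_file_savings(files: list[bytes]) -> float:
    """Fraction of bytes saved by storing each distinct file content once."""
    total = sum(len(data) for data in files)
    unique: dict[bytes, int] = {}  # SHA-256 digest -> size of that content
    for data in files:
        unique[hashlib.sha256(data).digest()] = len(data)
    stored = sum(unique.values())
    return 1 - stored / total if total else 0.0


def block_savings(files: list[bytes], block_size: int = 4096) -> float:
    """Fraction of bytes saved by storing each distinct fixed-size block once.

    Assumption: fixed-size blocks, a simplification; the final block of a
    file may be shorter than block_size.
    """
    total = sum(len(data) for data in files)
    unique: dict[bytes, int] = {}  # SHA-256 digest -> size of that block
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    stored = sum(unique.values())
    return 1 - stored / total if total else 0.0


if __name__ == "__main__":
    # Toy data: two identical files plus one file that shares only its
    # first block with them.
    files = [b"A" * 8192, b"A" * 8192, b"A" * 4096 + b"B" * 4096]
    w, b = whole_file_savings(files), block_savings(files)
    print(f"whole-file: {w:.3f}, block: {b:.3f}")  # whole-file: 0.333, block: 0.667
```

In this toy input, whole-file deduplication recovers only the exact duplicate file. Block-level deduplication also finds the block that is shared inside a file that differs elsewhere, which is why block-level savings are an upper bound on whole-file savings for the same data.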
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Cited by
225 articles.