Affiliation:
1. Asparna Research Center, Israel
Abstract
Differencing compressed archives is a common task in file management and synchronization, with applications that include source code distribution, application updates, and document synchronization. General-purpose binary differencing tools can create and apply patches to compressed archives, but they do not consider the archive’s internal structure or the file lifecycle, and therefore miss opportunities to save space based on that structure and its metadata. To address this gap, we develop a content-aware, format-independent theory of differencing for compressed archives and propose a canonical form and digest for them. Building on these, we present Donag, a content-aware differencing and patching algorithm that exploits the internal structure of versioned archives to produce smaller patches than general-purpose binary differencing tools. Donag uses the VCDiff and BSDiff engines internally. We compare Donag’s patches with those produced by bsdiff, xdelta3, and Delta++ on three classes of compressed archives: open-source code repositories, large and small applications, and office productivity documents (DOCX, XLSX, PPTX). Donag’s patches are typically 10% to 89% smaller than those of bsdiff, xdelta3, and Delta++, with reasonable memory overhead and throughput on commodity hardware. In the worst case, Donag’s patches are negligibly larger.
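The idea of looking inside an archive rather than at its raw bytes can be illustrated with a small sketch. The code below is not Donag and does not reproduce the paper's canonical form or digest; it is a minimal illustration that assumes ZIP input and uses hypothetical helper names (`make_zip`, `canonical_digest`, `classify_entries`). It hashes the sorted, decompressed entries so that entry order, compression method, and timestamps do not affect the digest, and it classifies entries so that only changed ones would need to be handed to a delta engine.

```python
import hashlib
import io
import zipfile


def make_zip(entries: dict, compression: int = zipfile.ZIP_DEFLATED) -> bytes:
    """Hypothetical helper: build an in-memory ZIP from {name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


def _read_entries(archive_bytes: bytes) -> dict:
    """Return {entry name: decompressed content} for a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {name: zf.read(name) for name in set(zf.namelist())}


def canonical_digest(archive_bytes: bytes) -> str:
    """Illustrative digest over sorted (name, decompressed content) pairs.

    Assumption: this stands in for a canonical form; it deliberately
    ignores entry order, compression settings, and timestamps.
    """
    h = hashlib.sha256()
    entries = _read_entries(archive_bytes)
    for name in sorted(entries):
        name_bytes = name.encode("utf-8")
        data = entries[name]
        # Length prefixes keep (name, content) boundaries unambiguous.
        h.update(len(name_bytes).to_bytes(8, "big"))
        h.update(name_bytes)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def classify_entries(old_bytes: bytes, new_bytes: bytes) -> dict:
    """Sort entry names into added, removed, changed, and unchanged."""
    old = _read_entries(old_bytes)
    new = _read_entries(new_bytes)
    common = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in common if old[n] != new[n]),
        "unchanged": sorted(n for n in common if old[n] == new[n]),
    }
```

In this sketch, two archives with the same files stored in a different order or with a different compression method share a digest even though their raw bytes differ. A patch generator built along these lines would run a delta engine only on the entries reported as changed, and store added entries as-is.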
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Cited by
2 articles.
1. Exploiting Multiple Similarity Spaces for Efficient and Flexible Incremental Update of Mobile Apps;IEEE INFOCOM 2024 - IEEE Conference on Computer Communications;2024-05-20
2. Automotive OTA Upgrade Scheme Based on Optimal Difference Algorithm;2023 3rd International Conference on Robotics, Automation and Intelligent Control (ICRAIC);2023-11-24