MetaStore: Analyzing Deep Learning Meta-Data at Scale

Published:2024-02 Issue:6 Volume:17 Page:1446-1459
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Zhang Huayi¹, Yan Binwei², Cao Lei³, Madden Samuel⁴, Rundensteiner Elke⁵

Affiliation:

1. WPI, Data Science, Worcester, MA

2. MIT, Cambridge, MA

3. U of Arizona, CS; MIT, CSAIL, Cambridge, MA

4. MIT, CSAIL, Cambridge, MA

5. WPI, Computer Science, Worcester, MA

Abstract

The process of training deep learning models produces a huge amount of meta-data, including but not limited to losses, hidden feature embeddings, and gradients. Model diagnosis tools have been developed to analyze losses and feature embeddings with the aim of improving the performance of these models. However, gradients, despite carrying rich information that is potentially relevant for model interpretation and data debugging, have yet to be fully explored due to their size and complexity. Each single gradient is as large as the number of parameters of the neural network, often in the tens of millions. This makes it extremely challenging to efficiently collect, store, and analyze large numbers of gradients in these models. In this work, we develop MetaStore to fill this gap. MetaStore leverages our observation that storing certain compact intermediate results produced in the backpropagation process, namely the prefix and suffix gradients, is sufficient for the exact restoration of the original gradient. These prefix and suffix gradients are much more compact than the original gradients, thus allowing us to address the gradient collection and storage challenges. Furthermore, MetaStore features a rich set of analytics operators that allow users to analyze the gradients for data debugging or model interpretation. Rather than first restoring the original gradients and then running analytics on top of this decompressed view, MetaStore directly executes these operators on the compact prefix and suffix structures, making gradient-based analytics efficient and scalable. Our experiments on popular deep learning models such as VGG, BERT, and ResNet and on benchmark image and text datasets demonstrate that MetaStore outperforms strong baseline methods by 4x to 678x in storage cost and by 2x to 1000x in running time.
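
The idea of computing on compact factors instead of full gradients can be illustrated with a standard identity for a single fully connected layer y = W x: the per-example weight gradient is the outer product of the gradient with respect to the layer output and the layer input. The sketch below is not taken from the paper. The function names are hypothetical, and mapping these two vectors onto the paper's "suffix" and "prefix" gradients is an assumption made only for illustration. It shows that an inner product between two gradients, one example of a gradient analytic, can be computed from the factors alone without materializing either gradient.

```python
# Illustrative sketch only. All names here are hypothetical, and treating the
# output gradient / layer input as the paper's suffix / prefix is an assumption.

def outer(d, x):
    """Materialize the weight gradient of y = W x for one example: d x^T."""
    return [[di * xj for xj in x] for di in d]

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def full_inner_product(g1, g2):
    """Inner product of two materialized gradients (sum of elementwise products)."""
    return sum(a * b for r1, r2 in zip(g1, g2) for a, b in zip(r1, r2))

def factored_inner_product(f1, f2):
    """Same quantity from the compact factors: (d1 . d2) * (x1 . x2)."""
    (d1, x1), (d2, x2) = f1, f2
    return dot(d1, d2) * dot(x1, x2)

# Two examples for a layer with 3 outputs and 4 inputs.
d1, x1 = [1.0, -2.0, 0.5], [0.0, 1.0, 2.0, -1.0]
d2, x2 = [0.5, 1.0, -1.0], [1.0, 1.0, 0.0, 2.0]

via_full = full_inner_product(outer(d1, x1), outer(d2, x2))
via_factors = factored_inner_product((d1, x1), (d2, x2))

# Each full gradient holds 3 * 4 = 12 values; each factor pair holds 3 + 4 = 7.
full_size = len(d1) * len(x1)
factored_size = len(d1) + len(x1)
```

For realistic layer widths the gap between the product and the sum of the two dimensions grows quickly, which is the intuition behind storing factors rather than full gradients.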

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.14778/3648160.3648182
