Affiliation:
1. Inria, LaBRI, University of Bordeaux, Talence, France
Abstract
The memory requirements of deep learning training can prevent the user from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and automatic differentiation (AD) to execute a backpropagation graph with a bounded memory requirement, at the cost of extra recomputations. The case of a single homogeneous chain, i.e. a network whose stages are all identical and form a chain, is well understood, and optimal solutions have been proposed in the AD literature. The networks encountered in practice in the context of deep learning are much more diverse, both in shape and in heterogeneity. In this work, we define the class of backpropagation graphs, and we extend the set of graphs for which a solution minimizing the total number of recomputations can be computed in polynomial time. In particular, we consider join graphs, which correspond to models such as siamese or cross-modal networks.
This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
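Illustrative sketch (not part of the article): the snippet below, assuming PyTorch and its torch.utils.checkpoint.checkpoint_sequential utility, shows the memory/recomputation trade-off described in the abstract on a single homogeneous chain. Only the activations at segment boundaries are kept during the forward pass; the activations inside each segment are recomputed once during backpropagation.

# Minimal sketch of checkpointing on a homogeneous chain (assumes PyTorch).
# This is an illustration of the general technique, not the article's algorithm.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

L = 64          # number of identical stages in the chain
segments = 8    # number of checkpointed segments (memory/recompute trade-off)

# A homogeneous chain: every stage is identical.
chain = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                        for _ in range(L)])

x = torch.randn(32, 512, requires_grad=True)

# Standard execution would store all L intermediate activations:
#   y = chain(x)
# Checkpointed execution stores only the `segments` boundary activations and
# re-runs each segment's forward pass once during backpropagation.
y = checkpoint_sequential(chain, segments, x)
y.sum().backward()

With this kind of strategy, peak activation memory drops from the order of L stage outputs to roughly L/segments + segments, at the price of one extra forward pass per segment; choosing segments close to the square root of L balances the two terms, which is the intuition behind the classical checkpointing results for homogeneous chains mentioned in the abstract.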
Funder
Agence Nationale de la Recherche
Subject
General Physics and Astronomy, General Engineering, General Mathematics