Abstract
Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep understanding, improvement, and parallel execution. This paper develops a simple, generalized AD algorithm calculated from a simple, natural specification. The general algorithm is then specialized by varying the representation of derivatives. In particular, applying well-known constructions to a naive representation yields two RAD algorithms that are far simpler than previously known. In contrast to commonly used RAD implementations, the algorithms defined here involve no graphs, tapes, variables, partial derivatives, or mutation. They are inherently parallel-friendly, correct by construction, and usable directly from an existing programming language with no need for new data types or programming style, thanks to use of an AD-agnostic compiler plugin.
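As a rough illustration of AD without graphs, tapes, or mutation, the sketch below represents a differentiable function as an ordinary function that returns both its value and its derivative, with the derivative given as a linear map. This is not the paper's own code: the names `compose`, `sin_d`, and `square_d` are hypothetical, and the sketch covers only the chain rule over this naive representation, not the reverse-mode specializations the abstract mentions.

```python
import math

# Assumption for illustration: a "differentiable function" takes x and
# returns (f(x), df), where df is the derivative at x as a linear map.

def compose(g, f):
    """Chain rule: the derivative of g . f at x is Dg(f(x)) . Df(x)."""
    def h(x):
        y, df = f(x)   # value and derivative of f at x
        z, dg = g(y)   # value and derivative of g at f(x)
        return z, lambda dx: dg(df(dx))
    return h

def sin_d(x):
    # sin with its derivative: dx |-> cos(x) * dx
    return math.sin(x), lambda dx: math.cos(x) * dx

def square_d(x):
    # x^2 with its derivative: dx |-> 2 * x * dx
    return x * x, lambda dx: 2 * x * dx

# sin(x^2), built purely by composition; no graph or tape is recorded.
sin_of_square = compose(sin_d, square_d)
value, deriv = sin_of_square(3.0)
```

Here `value` is sin(9) and `deriv(1.0)` is 6·cos(9), the derivative of sin(x²) at x = 3. Everything is an immutable closure, so independent compositions can be evaluated in any order.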
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality, Software
Cited by
51 articles.
1. δ is for Dialectica;Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science;2024-07-08
2. Physics-informed neural networks in groundwater flow modeling: Advantages and future directions;Groundwater for Sustainable Development;2024-05
3. Distributions for Compositionally Differentiating Parametric Discontinuities;Proceedings of the ACM on Programming Languages;2024-04-29
4. A Tensor Algebra Compiler for Sparse Differentiation;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02
5. Algebraic Dynamical Systems in Machine Learning;Applied Categorical Structures;2024-01-18