Affiliation:
1. Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540 USA
Abstract
Just storing the Hessian H (the matrix of second derivatives ∂²E/∂w_i∂w_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator R_v{f(w)} = (∂/∂r) f(w + rv)|_{r=0}, note that R_v{∇_w} = Hv and R_v{w} = v, and then apply R_v{·} to the equations used to compute ∇_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
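As a rough illustration (not part of the paper's text), the identity R_v{∇_w} = Hv can be realized with modern automatic differentiation as a forward-mode directional derivative of the gradient. The sketch below assumes a toy loss function and uses jax.jvp composed with jax.grad; the function and variable names are illustrative only.

import jax
import jax.numpy as jnp

def hvp(loss, w, v):
    # Push the tangent v through grad(loss): this is R_v applied to the
    # gradient computation, yielding H v without forming the full Hessian.
    return jax.jvp(jax.grad(loss), (w,), (v,))[1]

# Toy "error" E(w) standing in for a network's loss.
def loss(w):
    return jnp.sum(jnp.tanh(w) ** 2)

w = jnp.array([0.5, -1.0, 2.0])   # weights
v = jnp.array([1.0, 0.0, -1.0])   # arbitrary vector
print(hvp(loss, w, v))            # exact Hessian-vector product H v

The cost is roughly one extra gradient-like pass, matching the paper's claim, and the result can feed iterative methods (e.g. conjugate-gradient or Lanczos-style routines) that only require Hessian-vector products.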
Subject
Cognitive Neuroscience, Arts and Humanities (miscellaneous)
Cited by
209 articles.