Exact learning dynamics of deep linear networks with prior knowledge

Published:2023-11-01 Issue:11 Volume:2023 Page:114004
ISSN:1742-5468
Container-title:Journal of Statistical Mechanics: Theory and Experiment
Short-container-title:J. Stat. Mech.

Author:

Dominé Clémentine C J, Braun Lukas, Fitzgerald James E, Saxe Andrew M

Abstract

Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu’s matrix Riccati solution (Fukumizu 1998 Gen 1 1E–03). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. The expressions reveal a class of task-independent initialisations that radically alter learning dynamics from slow non-linear dynamics to fast exponential trajectories while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.

Publisher

IOP Publishing

Subject

Statistics, Probability and Uncertainty; Statistics and Probability; Statistical and Nonlinear Physics

Link

https://iopscience.iop.org/article/10.1088/1742-5468/ad01b8/pdf

References (61 articles)

1. Theory of deep learning (in preparation);Arora,2020

2. A convergence analysis of gradient descent for deep linear neural networks;Arora,2018b

3. On the optimization of deep networks: implicit acceleration by overparameterization;Arora,2018a

4. Implicit regularization in deep matrix factorization;Arora,2019a

5. On exact computation with an infinitely wide neural net;Arora,2019b