Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Published: 2022-06-28 Issue: 6 Volume: 36 Page: 6577-6586
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Evci Utku, Ioannou Yani, Keskin Cem, Dauphin Yann

Abstract

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow; rather, their success lies in re-learning the pruning solution they are derived from. However, this comes at the cost of learning novel solutions.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 5 articles.

1. Finding core labels for maximizing generalization of graph neural networks; Neural Networks; 2024-12

2. Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation; Proceedings of the 17th ACM International Conference on Web Search and Data Mining; 2024-03-04

3. Neural Network Pruning for Real-Time Polyp Segmentation; Medical Image Understanding and Analysis; 2023-12-02

4. PTCP: Alleviate Layer Collapse in Pruning at Initialization via Parameter Threshold Compensation and Preservation; Communications in Computer and Information Science; 2023-11-27

5. Dimensionality reduced training by pruning and freezing parts of a deep neural network: a survey; Artificial Intelligence Review; 2023-05-01