Efficient and effective training of sparse recurrent neural networks

Published:2021-01-26 Issue:15 Volume:33 Page:9625-9636
ISSN:0941-0643
Container-title:Neural Computing and Applications
language:en
Short-container-title:Neural Comput & Applic

Author:

Liu Shiwei, Ni’mah Iftitahu, Menkovski Vlado, Mocanu Decebal Constantin, Pechenizkiy Mykola

Abstract

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in various applications. However, RNNs tend to be memory-bandwidth limited in practice and require long training and inference times. These problems are at odds with training and deploying RNNs on resource-limited devices, where the memory and floating-point operations (FLOPs) budgets are strictly constrained. To address this, conventional model compression techniques usually focus on reducing inference costs and operate on a costly pre-trained model. Recently, dynamic sparse training has been proposed to accelerate the training process by training sparse neural networks directly from scratch. However, previous sparse training techniques are mainly designed for convolutional neural networks and multi-layer perceptrons. In this paper, we introduce a method to train intrinsically sparse RNN models with a fixed number of parameters and FLOPs during training. We demonstrate state-of-the-art sparse performance with long short-term memory and recurrent highway networks on widely used tasks: language modeling and text classification. We use these results to argue that, contrary to the general belief that training a sparse neural network from scratch leads to worse performance than dense networks, sparse training with adaptive connectivity can usually achieve better performance than dense models for RNNs.
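As a rough illustration of what "adaptive connectivity" with "a fixed number of parameters" can mean, the sketch below performs one generic prune-and-regrow step on a flat weight list. It is not the paper's algorithm: the function name `prune_and_regrow`, the magnitude-based pruning criterion, the uniform random regrowth, and the zero initialization of new connections are all assumptions chosen for illustration.

```python
import random


def prune_and_regrow(weights, mask, fraction, rng):
    """One connectivity update: drop weak connections, regrow as many elsewhere.

    weights and mask are flat lists of equal length; mask[i] is True when
    connection i is active. Returns new (weights, mask) lists.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    # Number of connections to move, capped by the free positions available.
    k = min(int(fraction * len(active)), len(inactive))

    # Prune (assumed criterion): the k active connections of smallest magnitude.
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Regrow (assumed criterion): k positions drawn uniformly at random from
    # those that were inactive before pruning.
    grown = rng.sample(inactive, k)

    new_weights = list(weights)
    new_mask = list(mask)
    for i in pruned:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = True
        new_weights[i] = 0.0  # new connections start at zero (assumption)
    return new_weights, new_mask


rng = random.Random(0)
n = 20
mask = [i < 5 for i in range(n)]  # 5 of 20 connections active
weights = [rng.uniform(-1, 1) if m else 0.0 for m in mask]
new_weights, new_mask = prune_and_regrow(weights, mask, 0.4, rng)
print(sum(mask), sum(new_mask))  # the active-connection count is unchanged
```

Because every pruned connection is matched by one regrown connection, the number of active parameters stays constant across updates, while the positions of the nonzero weights are free to change.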

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Software

Link

https://link.springer.com/content/pdf/10.1007/s00521-021-05727-y.pdf

References: 66 articles.

1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283

2. Tessera K, Hooker S, Rosman B (2021) Keep the gradients flowing: using gradient flow to study sparse network optimization. https://openreview.net/forum?id=HI0j7omXTaG

3. Liu S, Mocanu DC, Pei Y, Pechenizkiy M (2021) Selfish sparse RNN training. In: Submitted to international conference on learning representations. https://openreview.net/forum?id=5wmNjjvGOXh

4. Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Lawrence ZC, Parikh D (2015) VQA: visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 2425–2433

5. Aquino G, Rubio JDJ, Pacheco J, Gutierrez GJ, Ochoa G, Balcazar R, Cruz DR, Garcia E, Novoa JF, Zacarias A (2020) Novel nonlinear hypothesis for the delta parallel robot modeling. IEEE Access 8:46324–46334

Cited by 18 articles.

1. Learning-powered migration of social digital twins at the network edge;Computer Communications;2024-10

2. Learn & drop: fast learning of CNNs based on layer dropping;Neural Computing and Applications;2024-03-28

3. An Automatic Process of Online Handwriting Recognition and Its Challenges;Lecture Notes in Networks and Systems;2024

4. Exploring Neural Network Structure through Sparse Recurrent Neural Networks: A Recasting and Distillation of Neural Network Hyperparameters;2023 International Conference on Machine Learning and Applications (ICMLA);2023-12-15

5. Hybrid Deep Learning Model Based on Sparse Recurrent Architecture;Journal of Circuits, Systems and Computers;2023-11-04