Structured Matrices and Their Application in Neural Networks: A Survey

Published: 2023-07-26 Issue: 3 Volume: 41 Page: 697-722
ISSN: 0288-3635
Container-title: New Generation Computing
Language: en
Short-container-title: New Gener. Comput.

Author:

Kissel Matthias, Diepold Klaus

Abstract

Modern neural network architectures are becoming larger and deeper, with increasing computational resources needed for training and inference. One approach toward handling this increased resource consumption is to use structured weight matrices. By exploiting structures in weight matrices, the computational complexity for propagating information through the network can be reduced. However, choosing the right structure is not trivial, especially since there are many different matrix structures and structure classes. In this paper, we give an overview of the four main matrix structure classes, namely semiseparable matrices, matrices of low displacement rank, hierarchical matrices and products of sparse matrices. We recapitulate the definitions of each structure class, present special structure subclasses, and provide references to research papers in which the structures are used in the domain of neural networks. We present two benchmarks comparing the classes. First, we benchmark the error for approximating different test matrices. Second, we compare the prediction performance of neural networks in which the weight matrix of the last layer is replaced by structured matrices. After presenting the benchmark results, we discuss open research questions related to the use of structured matrices in neural networks and highlight future research directions.

Funder

Technische Universität München

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Hardware and Architecture, Theoretical Computer Science, Software

Link

https://link.springer.com/content/pdf/10.1007/s00354-023-00226-1.pdf

References: 95 articles.

Cited by 2 articles.

1. On Block g-Circulant Matrices with Discrete Cosine and Sine Transforms for Transformer-Based Translation Machine;Mathematics;2024-05-29

2. What is the gradient of a scalar function defined on a subspace of square matrices?;Indian Journal of Pure and Applied Mathematics;2024-04-24