Real block-circulant matrices and DCT-DST algorithm for transformer neural network

Published: 2023-12-12 Volume: 9
ISSN: 2297-4687
Container-title: Frontiers in Applied Mathematics and Statistics
Short-container-title: Front. Appl. Math. Stat.

Author:

Asriani Euis, Muchtadi-Alamsyah Intan, Purwarianti Ayu

Abstract

In the encoding and decoding processes of transformer neural networks, a weight matrix-vector multiplication occurs in each multi-head attention and feed-forward sublayer. Choosing an appropriate weight matrix structure and multiplication algorithm can improve transformer performance, especially for machine translation tasks. In this study, we investigate the use of real block-circulant matrices in a transformer, together with an alternative to the commonly used fast Fourier transform (FFT) algorithm, namely the discrete cosine transform–discrete sine transform (DCT-DST) algorithm. We explore three transformer models that combine real block-circulant matrices with different algorithms. We start by generating two orthogonal matrices, U and Q. The columns of U are formed from combinations of the real and imaginary parts of the eigenvectors of the real block-circulant matrix, whereas Q is defined such that the matrix product QU takes the form of a DCT-DST matrix. The final step is defining the Schur form of the real block-circulant matrix. We find that matrix-vector multiplication using the DCT-DST algorithm can be defined via the Kronecker product of the DCT-DST matrix and an orthogonal matrix whose order equals the dimension of the circulant matrix that spans the real block-circulant matrix. In the experiments, the dense-real block-circulant DCT-DST model with the largest matrix dimension reduced the number of model parameters by up to 41%. The same model with a matrix dimension of 128 achieved a BLEU score of 26.47, higher than the other two models at the same matrix dimension.
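
As a rough illustration of the ideas named in the abstract, the NumPy sketch below shows two things: the FFT-based product of a real block-circulant matrix with a vector (the baseline the paper seeks an alternative to), and a real orthogonal basis built from the real and imaginary parts of the Fourier eigenvectors of a single circulant block, which brings that block to a real block-diagonal (Schur-like) form. It is not the authors' DCT-DST algorithm: the matrix Q and the DCT-DST matrix are not reproduced here, and all function names, the block layout (a p x q grid of n x n circulant blocks stored by first column), and the sizes are assumptions made for the example.

```python
import numpy as np

def circulant(c):
    """Dense n x n circulant matrix with first column c: C[i, j] = c[(i - j) mod n]."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def block_circulant_dense(blocks):
    """Assemble the dense (p*n) x (q*n) matrix from blocks of shape (p, q, n).

    Only p*q*n values are stored, versus (p*n)*(q*n) for an unstructured matrix.
    """
    p, q, _ = blocks.shape
    return np.block([[circulant(blocks[i, j]) for j in range(q)] for i in range(p)])

def block_circulant_matvec_fft(blocks, x):
    """Multiply the block-circulant matrix by x using real FFTs (circular convolution)."""
    p, q, n = blocks.shape
    X = np.fft.rfft(x.reshape(q, n), axis=1)   # spectrum of each input segment
    B = np.fft.rfft(blocks, axis=2)            # spectrum of each block's first column
    Y = np.einsum("pqk,qk->pk", B, X)          # sum over column blocks per frequency
    return np.fft.irfft(Y, n=n, axis=1).reshape(p * n)

def real_fourier_basis(n):
    """Real orthogonal matrix whose columns are scaled cos/sin parts of Fourier eigenvectors."""
    j = np.arange(n)
    cols = [np.ones(n) / np.sqrt(n)]
    for k in range(1, (n - 1) // 2 + 1):
        cols.append(np.sqrt(2.0 / n) * np.cos(2 * np.pi * j * k / n))
        cols.append(np.sqrt(2.0 / n) * np.sin(2 * np.pi * j * k / n))
    if n % 2 == 0:
        cols.append(np.cos(np.pi * j) / np.sqrt(n))  # alternating +1/-1 vector
    return np.stack(cols, axis=1)

def block_diagonal_mask(n):
    """Boolean mask of the 1x1 and 2x2 diagonal blocks expected in U^T C U."""
    mask = np.zeros((n, n), dtype=bool)
    mask[0, 0] = True
    for k in range(1, (n - 1) // 2 + 1):
        mask[2 * k - 1:2 * k + 1, 2 * k - 1:2 * k + 1] = True
    if n % 2 == 0:
        mask[n - 1, n - 1] = True
    return mask

rng = np.random.default_rng(0)
p, q, n = 2, 3, 8                      # illustrative sizes, not taken from the paper
blocks = rng.standard_normal((p, q, n))
x = rng.standard_normal(q * n)

y_dense = block_circulant_dense(blocks) @ x
y_fft = block_circulant_matvec_fft(blocks, x)
print("FFT product matches dense product:", np.allclose(y_dense, y_fft))

U = real_fourier_basis(n)
S = U.T @ circulant(blocks[0, 0]) @ U  # real block-diagonal form of one circulant block
mask = block_diagonal_mask(n)
print("U is orthogonal:", np.allclose(U.T @ U, np.eye(n)))
print("U^T C U is block diagonal:", np.allclose(S[~mask], 0.0))
```

The second half works entirely in real arithmetic, which is the motivation the abstract gives for replacing the complex FFT with a real transform; how the paper turns this basis into a DCT-DST matrix is left to the full text.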

Publisher

Frontiers Media SA

Subject

Applied Mathematics, Statistics and Probability

Cited by 1 article.

1. On Block g-Circulant Matrices with Discrete Cosine and Sine Transforms for Transformer-Based Translation Machine; Mathematics; 2024-05-29