An Efficient Transformer Decoder with Compressed Sub-layers

Published:2021-05-18 Issue:15 Volume:35 Page:13315-13323
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Li Yanyang,Lin Ye,Xiao Tong,Zhu Jingbo

Abstract

The large attention-based encoder-decoder network (Transformer) has recently become prevalent due to its effectiveness, but the high computational complexity of its decoder makes it inefficient. By examining the mathematical formulation of the decoder, we show that under some mild conditions, the architecture can be simplified by compressing its sub-layers, the basic building blocks of the Transformer, to achieve higher parallelism. We therefore propose the Compressed Attention Network, whose decoder layer consists of only one sub-layer instead of three. Extensive experiments on 14 WMT machine translation tasks show that our model is 1.42x faster with performance on par with a strong baseline. This strong baseline is already 2x faster than the widely used standard baseline without loss in performance.
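
The abstract gives two speedup figures in a chain, so the implied speedup over the standard baseline can be worked out with a short calculation. The sketch below is only an illustration: it assumes both ratios measure the same quantity under comparable settings and therefore compose by multiplication, which the abstract does not state explicitly. The variable names are hypothetical.

```python
# Speedup figures quoted in the abstract.
strong_vs_standard = 2.0   # strong baseline relative to the standard baseline
can_vs_strong = 1.42       # Compressed Attention Network relative to the strong baseline

# Assumption (not stated in the abstract): the two ratios measure the same
# quantity on comparable setups, so they compose by multiplication.
can_vs_standard = strong_vs_standard * can_vs_strong

print(f"Implied speedup over the standard baseline: {can_vs_standard:.2f}x")
```

Under that assumption, the proposed model would be roughly 2.84x faster than the standard baseline.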

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 10 articles.

1. A K-Shape Clustering Based Transformer-Decoder Model for Predicting Multi-Step Potentials of Urban Mobility Field;IEEE Transactions on Intelligent Transportation Systems;2024-08

2. EDT: An EEG-based attention model for feature learning and depression recognition;Biomedical Signal Processing and Control;2024-07

3. Next generation of computer vision for plant disease monitoring in precision agriculture: A contemporary survey, taxonomy, experiments, and future direction;Information Sciences;2024-04

4. Multi-Timescale Load Forecasting Based on OWA Optimization of VMD Combined with Informer;2023 2nd Asia Power and Electrical Technology Conference (APET);2023-12-28

5. PET: Parameter-efficient Knowledge Distillation on Transformer;PLOS ONE;2023-07-06