1. Attention is all you need;vaswani;Advances in neural information processing systems,2017
2. Efficient content-based sparse attention with routing transformers;roy,2020
3. Informer: Beyond efficient transformer for long sequence time-series forecasting;zhou;Proc of the Association for the Advancement of Artificial Intelligence,2021
4. Transformers are RNNs: Fast autoregressive transformers with linear attention;katharopoulos;Proceedings of the 37th International Conference on Machine Learning,2020
5. Rethinking attention with performers;choromanski;International Conference on Learning Representations,2021