1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems (2017). https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
2. Kitaev, N., Kaiser, Ł., Levskaya, A.: Reformer: The Efficient Transformer. International Conference on Learning Representations (2020). https://openreview.net/forum?id=rkgNKkHtvB
3. Wu, Z., Liu, Z., Lin, J., Lin, Y., Han, S.: Lite Transformer with Long-Short Range Attention. International Conference on Learning Representations (2020). https://openreview.net/forum?id=ByeMPlHKPH
4. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. International Conference on Learning Representations (2021). https://openreview.net/forum?id=YicbFdNTTy
5. Tolstikhin, I.O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., Yung, J., Steiner, A., Keysers, D., Uszkoreit, J., Lucic, M., Dosovitskiy, A.: MLP-Mixer: An all-MLP Architecture for Vision. Advances in Neural Information Processing Systems (2021). https://openreview.net/forum?id=EI2KOXKdnP