1. Non-local Neural Networks
2. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
3. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
4. Bottleneck Transformers for Visual Recognition. Srinivas, ..., Pieter Abbeel, and Ashish Vaswani, 2021.
5. Stand-Alone Self-Attention in Vision Models. Ramachandran et al. Advances in Neural Information Processing Systems, 2019.