1. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy;arXiv preprint,2020
2. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows;Liu;IEEE/CVF International Conference on Computer Vision,2021
3. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel;J Mach Learn Res,2020
4. Faster R-CNN: Towards real-time object detection with region proposal networks;Ren;Advances in Neural Information Processing Systems,2015
5. SSD: Single shot MultiBox detector;Liu;European Conference on Computer Vision,2016