1. An image is worth 16×16 words: Transformers for image recognition at scale;Dosovitskiy;arXiv preprint,2020
2. End-to-end object detection with transformers;Carion;Computer Vision - ECCV 2020: 16th European Conference,2020
3. Scene Graph Generation from Objects, Phrases and Region Captions
4. Faster R-CNN: Towards real-time object detection with region proposal networks;Ren;Advances in Neural Information Processing Systems,2015
5. Attention is all you need;Vaswani;Advances in Neural Information Processing Systems,2017