1. Ahn, D., Kim, S., Hong, H., Ko, B.C., 2023. STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 3330–3339.
2. Bao, 2021. BEiT: BERT pre-training of image transformers.
3. Bozic, 2021. TransformerFusion: Monocular RGB scene reconstruction using transformers. Adv. Neural Inf. Process. Syst.
4. Brown, 2020. Language models are few-shot learners. Adv. Neural Inf. Process. Syst.
5. Cai, 2019. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell.