FlexiViT: One Model for All Patch Sizes

Published: 2023-06
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors:

Beyer Lucas¹, Izmailov Pavel¹, Kolesnikov Alexander², Caron Mathilde², Kornblith Simon², Zhai Xiaohua², Minderer Matthias², Tschannen Michael², Alabdulmohsin Ibrahim², Pavetic Filip²

Affiliations:

1. Google Research

2. Google Research, Brain Team

Publisher:

IEEE

Link:

http://xplorestaging.ieee.org/ielx7/10203037/10203050/10205121.pdf?arnumber=10205121

References: 67 articles

1. The Cityscapes Dataset for Semantic Urban Scene Understanding

2. Not All Images are Worth 16×16 Words: Dynamic Transformers for Efficient Image Recognition; Wang; Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021

3. On the Efficacy of Knowledge Distillation

4. Going deeper with Image Transformers

5. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale; Dosovitskiy; International Conference on Learning Representations (ICLR)

Cited by: 14 articles

1. STFormer: Spatio-temporal Former for Hand–Object Interaction Recognition from Egocentric RGB Video; Electronics Letters; 2024-09

2. Enhancing Skin Cancer Diagnosis Using Swin Transformer with Hybrid Shifted Window-Based Multi-head Self-attention and SwiGLU-Based MLP; Journal of Imaging Informatics in Medicine; 2024-06-05

3. Variable Temporal Length Training for Action Recognition CNNs; Sensors; 2024-05-25

4. Cross-modal Attention Network for Retinal Disease Classification Based on Multi-modal Images; Biomedical Optics Express; 2024-05-14

5. From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14