Authors:
Chen Mengzhao, Lin Mingbao, Li Ke, Shen Yunhang, Wu Yongjian, Chao Fei, Ji Rongrong
Abstract
Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerable redundancy arises in the spatial dimension of an input image, leading to massive computational costs. Therefore, in this paper we propose a coarse-to-fine vision transformer (CF-ViT) to relieve the computational burden while retaining performance. Our proposed CF-ViT is motivated by two important observations in modern ViT models: (1) coarse-grained patch splitting can locate the informative regions of an input image, and (2) most images can be well recognized by a ViT model from a short token sequence. Therefore, our CF-ViT performs network inference in a two-stage manner. At the coarse inference stage, an input image is split into a short patch sequence for a computationally economical classification. If the image is not well recognized, the informative patches are identified and re-split at a finer granularity. Extensive experiments demonstrate the efficacy of our CF-ViT. For example, without any compromise in performance, CF-ViT reduces the FLOPs of LV-ViT by 53% and improves throughput by 2.01x. Code of this project is at https://github.com/ChenMnZ/CF-V
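The two-stage inference described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `vit` callable, the confidence threshold, the patch sizes, and the attention-score-based region selection are all assumptions made for illustration.

```python
# Minimal sketch of coarse-to-fine (two-stage) ViT inference.
# Hypothetical interface: vit(image, patch_size=..., region_index=...) returns
# (logits, patch_scores), where patch_scores ranks patch informativeness
# (e.g., class-token attention). All constants below are assumed values.
import torch
import torch.nn.functional as F

THRESHOLD = 0.7      # assumed early-exit confidence threshold
COARSE_PATCH = 32    # coarse stage: fewer, larger patches -> short token sequence
FINE_PATCH = 16      # fine stage: standard finer patches on selected regions


@torch.no_grad()
def coarse_to_fine_inference(vit, image):
    """Classify a single image; refine only if the coarse prediction is uncertain."""
    # Stage 1: coarse-grained split, computationally economical classification.
    logits, patch_scores = vit(image, patch_size=COARSE_PATCH)
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)

    # Early exit: most images are recognized from the coarse sequence alone.
    if conf.item() >= THRESHOLD:
        return pred

    # Stage 2: keep the most informative coarse patches (assumed: top half),
    # re-split them at finer granularity, and classify again.
    k = patch_scores.shape[-1] // 2
    top_idx = patch_scores.topk(k, dim=-1).indices
    logits_fine, _ = vit(image, patch_size=FINE_PATCH, region_index=top_idx)
    return logits_fine.argmax(dim=-1)
```

The early exit is what yields the reported FLOPs and throughput savings: the fine-grained pass runs only on the subset of images (and patches) the coarse pass cannot confidently resolve.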
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
14 articles.