A Comprehensive Survey of Transformers for Computer Vision

Published:2023-04-25 Issue:5 Volume:7 Page:287
ISSN:2504-446X
Container-title:Drones
language:en
Short-container-title:Drones

Author:

Jamil Sonain¹, Jalil Piran Md.², Kwon Oh-Jin¹

Affiliation:

1. Department of Electronics Engineering, Sejong University, Seoul 05006, Republic of Korea

2. Department of Computer Engineering, Sejong University, Seoul 05006, Republic of Korea

Abstract

As a special type of transformer, vision transformers (ViTs) can be used for various computer vision (CV) applications. Convolutional neural networks (CNNs) have several potential problems that can be resolved with ViTs. For image coding tasks such as compression, super-resolution, segmentation, and denoising, different variants of ViTs are used. In our survey, we identified the many CV applications to which ViTs are applicable. The CV applications reviewed included image classification, object detection, image segmentation, image compression, image super-resolution, image denoising, anomaly detection, and drone imagery. We reviewed the state of the art, compiled a list of available models, and discussed the pros and cons of each model.

Publisher

MDPI AG

Subject

Artificial Intelligence, Computer Science Applications, Aerospace Engineering, Information Systems, Control and Systems Engineering

Link

https://www.mdpi.com/2504-446X/7/5/287/pdf

Reference: 175 articles.

1. Heo, B., Yun, S., Han, D., Chun, S., Choe, J., and Oh, S.J. (2021). Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE.

2. Tenney, I., Das, D., and Pavlick, E. (2019). BERT rediscovers the classical NLP pipeline. arXiv.

3. GPT-3: Its nature, scope, limits, and consequences;Floridi;Minds Mach.,2020

4. Imagenet classification with deep convolutional neural networks;Krizhevsky;Commun. ACM,2017

5. Jamil, S., Rahman, M., Ullah, A., Badnava, S., Forsat, M., and Mirjavadi, S.S. (2020). Malicious UAV detection using integrated audio and visual features for public safety applications. Sensors, 20.

Cited by 21 articles.

1. A Linear time shrinking-SL(t)-ViT approach for brain tumor identification and categorization;IETE Journal of Research;2024-08-28

2. Backbones-review: Feature extractor networks for deep learning and deep reinforcement learning approaches in computer vision;Computer Science Review;2024-08

3. Using transformers for multimodal emotion recognition: Taxonomies and state of the art review;Engineering Applications of Artificial Intelligence;2024-07

4. Introducing PneumNet—A Groundbreaking Dual Version Deep Learning Model for Pneumonia Disease Detection;International Journal of Imaging Systems and Technology;2024-06-19

5. Sensor fusion with multi-modal ground sensor network for endangered animal protection in large areas;Signal Processing, Sensor/Information Fusion, and Target Recognition XXXIII;2024-06-07