Object detection using convolutional neural networks and transformer-based models: a review

Published: 2023-11-20 Issue: 1 Volume: 10
ISSN: 2314-7172
Container-title: Journal of Electrical Systems and Information Technology
Language: en
Short-container-title: Journal of Electrical Systems and Inf Technol

Author:

Shah Shrishti, Tembhurne Jitendra

Abstract

Transformer models are evolving rapidly in standard natural language processing tasks, and their application in computer vision (CV) is growing quickly as well. Transformers are either replacing convolutional networks or being used in conjunction with them. This paper aims to differentiate the design of models built on convolutional neural networks (CNNs) from that of transformer-based models, particularly in the domain of object detection. CNNs are designed to capture local spatial patterns through convolutional layers, which makes them well suited to tasks that involve understanding visual hierarchies and features. Transformers, by contrast, bring a new paradigm to CV by leveraging self-attention mechanisms, which allow them to capture both local and global context in images. We address several aspects of the object detection task: a basic understanding of the architectures, a comparative study, the application of attention models, and the rapid growth and efficiency of these methods. The main emphasis of this work is to offer a basic understanding of architectures for the object detection task and to motivate their adoption in computer vision tasks. In addition, this paper highlights the evolution of transformer-based models in object detection and their growing importance in the field of computer vision, and identifies open research directions in the same field.

Publisher

Springer Science and Business Media LLC

Subject

General Earth and Planetary Sciences, General Engineering, General Environmental Science

Link

https://link.springer.com/content/pdf/10.1186/s43067-023-00123-z.pdf

References: 70 articles.

1. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

2. Girshick R (2015) Fast R-CNN. arXiv preprint arXiv:1504.08083

3. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, vol 28

4. Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, vol 29

5. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, Cham, pp 21–37

Cited by 3 articles.

1. AI-powered trustable and explainable fall detection system using transfer learning;Image and Vision Computing;2024-09

2. Identifying rice field weeds from unmanned aerial vehicle remote sensing imagery using deep learning;Plant Methods;2024-07-16

3. Improving Object Detection Accuracy with Self-Training Based on Bi-Directional Pseudo Label Recovery;Electronics;2024-06-07