A survey: object detection methods from CNN to transformer

Published: 2022-10-21
ISSN: 1380-7501
Container-title: Multimedia Tools and Applications
Language: en
Short-container-title: Multimed Tools Appl

Author:

Arkin Ershat, Yadikar Nurbiya, Xu Xuebin, Aysa Alimjan, Ubul Kurban

Abstract

Object detection is one of the most important problems in computer vision. After AlexNet was proposed, methods based on Convolutional Neural Networks (CNNs) became mainstream in the computer vision field, and much research on neural networks and on variations of algorithm structures has appeared. Achieving fast and accurate detection requires stepping outside the existing CNN framework, which is highly challenging. The Transformer's relatively mature theoretical support and technological development in Natural Language Processing have brought it to researchers' attention, and Transformer-based methods have been shown to work for computer vision tasks and to exceed existing CNN methods in some of them. To help more researchers understand the development of object detection methods, existing methods, different frameworks, challenging problems, and development trends, this paper introduces the classic CNN-based object detection methods and discusses their highlights, advantages, and disadvantages. Drawing on a large body of literature, the paper compares different CNN detection methods and Transformer detection methods. Thirteen detection methods that have had a broad impact on the field and are the most mainstream and promising are selected and compared under fair conditions. The comparative data give us confidence in the development of the Transformer and in the convergence between different methods. The paper also presents recent innovative approaches to using the Transformer in computer vision tasks. Finally, the challenges, opportunities, and future prospects of the field are summarized.

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Hardware and Architecture, Media Technology, Software

Link

https://link.springer.com/content/pdf/10.1007/s11042-022-13801-3.pdf

Cited by 54 articles.

1. CFSPT: A lightweight cross-machine model for compound fault diagnosis of machine-level motors;Information Fusion;2024-11

2. Vision transformer promotes cancer diagnosis: A comprehensive review;Expert Systems with Applications;2024-10

3. Lite-YOLOv8: a more lightweight algorithm for Tubercle Bacilli detection;Medical & Biological Engineering & Computing;2024-09-12

4. Cigarette Detection in Images Based on YOLOv8;Sakarya University Journal of Computer and Information Sciences;2024-08-31

5. H2P×PKD: Progressive Training Pipeline with Knowledge Distillation for Lightweight Backbones in Pedestrian Detection;2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR);2024-08-15