IntelPVT: intelligent patch-based pyramid vision transformers for object detection and classification
-
Published:2023-10-27
Issue:
Volume:
Page:
-
ISSN:1868-8071
-
Container-title:International Journal of Machine Learning and Cybernetics
-
language:en
-
Short-container-title:Int. J. Mach. Learn. & Cyber.
Author:
Nimma Divya, Zhou Zhaoxian
Abstract
Since the advent of Transformers and, subsequently, Vision Transformers (ViTs), researchers have achieved enormous success in computer vision and object detection. However, the conventional mechanism of splitting images into fixed-size patches poses a serious challenge in this arena, as it can discard useful information during object detection and classification. To overcome this challenge, we propose an innovative intelligent patching mechanism and integrate it seamlessly into the conventional patch-based ViT framework. The proposed method uses patches of flexible sizes to capture and retain essential semantic content from input images, thereby improving performance over conventional methods. Our method was evaluated on three renowned datasets, Microsoft Common Objects in Context (MS COCO 2017), Pascal VOC (Visual Object Classes Challenge), and Cityscapes, for object detection and classification. The experimental results showed promising improvements on specific metrics, particularly at higher confidence thresholds, making the method a notable performer in object detection and classification tasks.
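To make the contrast in the abstract concrete, the following is a minimal NumPy sketch: `fixed_patches` is the standard ViT tokenization into uniform tiles, while `flexible_patches` illustrates the general idea of extracting patches at several scales and resampling them to a common token size. The scale set `patch_sizes`, the nearest-neighbour resize, and all function names are assumptions for illustration only, not the paper's actual IntelPVT mechanism.

```python
import numpy as np

def fixed_patches(image, patch=16):
    """Standard ViT scheme: split an H x W x C image into non-overlapping
    patch x patch tiles and flatten each tile into one token vector."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = (image.reshape(h // patch, patch, w // patch, patch, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(-1, patch * patch * c))
    return tokens

def flexible_patches(image, patch_sizes=(8, 16, 32), target=16):
    """Illustrative multi-scale patching (an assumption, not the paper's
    method): tile the image at several patch sizes and resample every tile
    to target x target so all scales yield tokens of one dimensionality."""
    h, w, c = image.shape
    tokens = []
    for p in patch_sizes:
        for y in range(0, h - p + 1, p):
            for x in range(0, w - p + 1, p):
                tile = image[y:y + p, x:x + p]
                # nearest-neighbour resample of the p x p tile to target x target
                idx = (np.arange(target) * p) // target
                tile = tile[idx][:, idx]
                tokens.append(tile.reshape(-1))
    return np.stack(tokens)

img = np.random.rand(64, 64, 3)
print(fixed_patches(img).shape)     # (16, 768): 4x4 tiles of 16*16*3 values
print(flexible_patches(img).shape)  # (84, 768): 64 + 16 + 4 multi-scale tokens
```

Both paths emit tokens of equal length, so either tokenization could feed the same transformer encoder; the multi-scale variant simply trades a fixed grid for tokens that cover objects at several spatial extents.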
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence,Computer Vision and Pattern Recognition,Software
Cited by
1 article.