CRTED: Few-Shot Object Detection via Correlation-RPN and Transformer Encoder–Decoder
Published: 2024-05-10
Journal: Electronics
Volume: 13
Issue: 10
Page: 1856
ISSN: 2079-9292
Language: en
Author:
Chen Jinlong¹, Xu Kejian¹, Ning Yi², Jiang Lianyuan¹, Xu Zhi¹
Affiliation:
1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541000, China
2. School of Continuing Education, Guilin University of Electronic Technology, Guilin 541000, China
Abstract
Few-shot object detection (FSOD) aims to address the challenge that conventional object detection requires a substantial number of annotations for training, which is very labor-intensive. However, existing few-shot methods either achieve high precision at the cost of exhaustive fine-tuning or adapt poorly to novel classes. We presume the major reason is that the valuable correlation features among different categories are insufficiently exploited, which hinders the generalization of knowledge from base to novel categories for object detection. In this paper, we propose few-shot object detection via Correlation-RPN and transformer encoder–decoder (CRTED), a novel training network that learns object-relevant features of inter-class correlation and intra-class compactness while suppressing object-agnostic background features, using only a limited number of annotated samples. We also introduce a four-way tuple-contrast training strategy to more effectively drive the training of our object detector. Experiments on two few-shot benchmarks (Pascal VOC, MS-COCO) demonstrate that our proposed CRTED, without further fine-tuning, can achieve performance comparable to current state-of-the-art fine-tuned works. The code and pre-trained models will be released.
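The abstract describes, at a high level, a Correlation-RPN that matches few-shot support features against query features. Since the code and pre-trained models have not yet been released, the following is only a minimal, hypothetical sketch (in PyTorch) of the depth-wise support-query correlation that correlation-based RPNs in few-shot detection commonly compute; the function name, tensor shapes, and fusion choice are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch, NOT the authors' released code: depth-wise
# correlation of a pooled support-class prototype against query feature
# maps, the kind of response map a Correlation-RPN could score proposals on.
import torch
import torch.nn.functional as F

def depthwise_correlation(query_feat, support_feat):
    """Correlate each query feature map with its support prototype.

    query_feat:   (B, C, H, W) backbone features of the query image
    support_feat: (B, C, h, w) pooled features of a few-shot support crop
    returns:      (B, C, H, W) per-channel correlation response (odd h, w)
    """
    b, c, h, w = support_feat.shape
    # Treat the support features as depth-wise convolution kernels.
    kernel = support_feat.reshape(b * c, 1, h, w)
    # Fold the batch into channels (groups = b * c) so that each query
    # image is correlated only with its own support prototype.
    q = query_feat.reshape(1, b * c, query_feat.shape[-2], query_feat.shape[-1])
    out = F.conv2d(q, kernel, groups=b * c, padding=(h // 2, w // 2))
    return out.reshape(b, c, out.shape[-2], out.shape[-1])

# Example: a 3x3 support prototype correlated over 32x32 query features.
query = torch.randn(2, 256, 32, 32)
support = torch.randn(2, 256, 3, 3)
print(depthwise_correlation(query, support).shape)  # torch.Size([2, 256, 32, 32])

Folding the batch into the channel dimension turns B independent per-image correlations into a single grouped convolution, which is the usual trick for batching this operation efficiently.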
Funder
Guangxi Science and Technology Development Project; Guilin Science and Technology Plan Project