Author:
Li Fei, Yan Hongping, Shi Linsu
Abstract
Deep neural networks have achieved remarkable success in object detection. However, network structures still need to be continually evolved and finely tuned to achieve better performance. This need is driven by the continuing demand for high performance in complex scenes, where the objects to be detected appear at multiple scales and are scattered across the image. To this end, this paper proposes a network structure called Multi-Scale Coupled Attention (MSCA), built within a self-attention learning framework that uses importance assessment. Architecturally, it consists of a Multi-Scale Coupled Channel Attention (MSCCA) module and a Multi-Scale Coupled Spatial Attention (MSCSA) module. Specifically, the MSCCA module performs self-attention learning linearly over the multi-scale channels. In parallel, the MSCSA module performs self-attention learning nonlinearly over the multi-scale spatial grids. The MSCCA and MSCSA modules can be connected in sequence and used as a plugin to build end-to-end learning models for object detection. Finally, the proposed network is compared on two public datasets with 13 classical or state-of-the-art models: Faster R-CNN, Cascade R-CNN, RetinaNet, SSD, PP-YOLO, YOLO v3, YOLO v5, YOLO v7, YOLOX, DETR, Conditional DETR, UP-DETR and FP-DETR. The comparative experimental results with numerical scores, the ablation study, and the performance behaviour all demonstrate the effectiveness of the proposed model.
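The abstract does not give the internals of MSCCA or MSCSA, so the following is only a generic illustration of the "channel attention followed by spatial attention, applied as a plugin" pattern it describes, not the authors' method. Every function name, the use of global average pooling, and the sigmoid and tanh gates are assumptions made for this sketch; the cross-scale coupling that gives MSCA its name is not modelled here, and each scale is processed independently.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    """Assumed stand-in for a channel-attention step.

    fmap is a list of C channels, each an H x W nested list of floats.
    Each channel is rescaled by a gate computed from its global average
    (a linear descriptor of the channel).
    """
    out = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out

def spatial_attention(fmap):
    """Assumed stand-in for a spatial-attention step.

    Each grid position is rescaled by a gate computed from a nonlinear
    function (tanh, then sigmoid) of the mean across channels there.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [
        [sigmoid(math.tanh(sum(fmap[c][i][j] for c in range(C)) / C))
         for j in range(W)]
        for i in range(H)
    ]
    return [
        [[fmap[c][i][j] * gates[i][j] for j in range(W)] for i in range(H)]
        for c in range(C)
    ]

def attention_plugin(fmap):
    """Channel step then spatial step, connected in sequence."""
    return spatial_attention(channel_attention(fmap))

def apply_to_pyramid(pyramid):
    """Apply the plugin to every scale of a hypothetical feature pyramid."""
    return [attention_plugin(fmap) for fmap in pyramid]

# Two scales: a 2-channel 4x4 map and a 2-channel 2x2 map, filled with ones.
pyramid = [
    [[[1.0] * 4 for _ in range(4)] for _ in range(2)],
    [[[1.0] * 2 for _ in range(2)] for _ in range(2)],
]
refined = apply_to_pyramid(pyramid)
```

Because both steps only rescale values and keep the input shape, a block of this kind can be dropped between a backbone and a detection head without changing the surrounding layers, which is the sense in which the abstract calls the module a plugin.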
Publisher
Springer Science and Business Media LLC
Cited by
1 article.