Authors:
Qian Chengwu, Qian Jiangbo, Wang Chong, Ye Xulun, Zhong Caiming
Abstract
In object detection, real scenes often contain heavy occlusion, which easily degrades detector accuracy. Most current detectors use a convolutional neural network (CNN) as the backbone, but CNNs are not robust when objects are occluded, and the missing object pixels make conventional convolution ineffective at extracting features, which lowers detection accuracy. To address these two problems, we propose VFN (A Vision Enhancement and Feature Fusion Multiscale Detection Network). VFN first builds a multiscale backbone from different stages of the Swin Transformer, then applies a vision enhancement module based on dilated convolution to enhance the vision of feature points at each scale and compensate for missing pixels. Finally, a feature guidance module enhances the features at each scale by fusing them with one another. On both the PASCAL VOC and CrowdHuman datasets, VFN achieves higher overall accuracy than the compared methods and is also better at finding occluded objects, demonstrating the effectiveness of our method. The code is available at https://github.com/qcw666/vfn.
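To illustrate why dilated convolution can help when object pixels are missing, the following is a minimal one-dimensional sketch in plain Python. It is not the authors' vision enhancement module (their implementation is in the repository linked above); the function names `effective_kernel_size` and `dilated_conv1d`, the toy feature row, and the all-ones kernel are assumptions made only for this illustration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D dilated correlation over a list of numbers."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` positions apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


# Hypothetical feature row with an "occluded" gap (zeros) in the middle.
row = [1, 1, 0, 0, 0, 1, 1]
kernel = [1, 1, 1]

dense = dilated_conv1d(row, kernel, dilation=1)  # taps are adjacent
wide = dilated_conv1d(row, kernel, dilation=2)   # taps skip one position

print(effective_kernel_size(3, 1), dense)
print(effective_kernel_size(3, 2), wide)
```

With adjacent taps, the window centered on the gap sees only zeros and produces no response. With a dilation of 2, the same three weights span five positions, so every window reaches at least one visible value on either side of the gap.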
Publisher
Springer Science and Business Media LLC