INSANet: INtra-INter Spectral Attention Network for Effective Feature Fusion of Multispectral Pedestrian Detection
Authors:
Lee Sangin 1, Kim Taejoo 2, Shin Jeongmin 2, Kim Namil 3, Choi Yukyung 2
Affiliations:
1. Department of Software, Sejong University, Seoul 05006, Republic of Korea
2. Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
3. NAVER LABS, Seongnam 13561, Republic of Korea
Abstract
Pedestrian detection is a critical task for safety-critical systems, but detecting pedestrians is challenging in low-light and adverse weather conditions. Thermal images can improve robustness by providing information complementary to RGB images. Previous studies have shown that multi-modal feature fusion using convolution operations can be effective, but such methods rely solely on local feature correlations, which can degrade detection performance. To address this issue, we propose a novel attention-based fusion network, referred to as INSANet (INtra-INter Spectral Attention Network), that captures global intra- and inter-spectral information. It consists of intra- and inter-spectral attention blocks that allow the model to learn mutual spectral relationships. Additionally, we identified an imbalance in the multispectral dataset caused by several factors and designed an augmentation strategy that mitigates the concentrated distribution of pedestrians and enables the model to learn diverse pedestrian locations. Extensive experiments demonstrate the effectiveness of the proposed methods, which achieve state-of-the-art performance on the KAIST and LLVIP datasets. Finally, we conduct a regional performance evaluation to demonstrate the effectiveness of the proposed network in various regions.
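The fusion scheme described in the abstract can be illustrated with a minimal PyTorch sketch: intra-spectral self-attention applied within each spectrum's flattened feature map, followed by inter-spectral cross-attention in which one spectrum's tokens query the other's. The class name SpectralAttentionBlock, the pre-norm single-layer structure, and all dimensions below are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class SpectralAttentionBlock(nn.Module):
    """Sketch: intra-spectral self-attention, then inter-spectral
    cross-attention toward the other spectrum (illustrative only)."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.intra_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # Intra-spectral: global self-attention within one spectrum,
        # capturing long-range correlations a convolution would miss.
        h = self.norm1(x)
        x = x + self.intra_attn(h, h, h, need_weights=False)[0]
        # Inter-spectral: queries from this spectrum attend to the other
        # spectrum's tokens, modeling mutual spectral relationships.
        h = self.norm2(x)
        x = x + self.inter_attn(h, other, other, need_weights=False)[0]
        return x

# Usage: fuse RGB and thermal backbone features of shape (B, C, H, W)
# by flattening them into (B, H*W, C) token sequences.
B, C, H, W = 2, 256, 20, 16
rgb = torch.randn(B, C, H, W).flatten(2).transpose(1, 2)
thermal = torch.randn(B, C, H, W).flatten(2).transpose(1, 2)
block = SpectralAttentionBlock(dim=C)
fused_rgb = block(rgb, thermal)      # RGB stream enriched with thermal context
fused_thermal = block(thermal, rgb)  # thermal stream enriched with RGB context
print(fused_rgb.shape)               # torch.Size([2, 320, 256])

Stacking several such blocks and passing both fused streams to a detection head approximates the role the paper assigns to its attention blocks; the shared-weight, bidirectional use shown here is one possible design choice, not necessarily the paper's.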
Funder
Ministry of Science and ICT
Cited by: 1 article.