Semantic-Aware Fusion Network Based on Super-Resolution
Authors:
Xu Lingfeng 1, Zou Qiang 1,2,3
Affiliations:
1. School of Microelectronics, Tianjin University, Tianjin 300072, China
2. Tianjin International Joint Research Center for Internet of Things, Tianjin 300072, China
3. Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin University, Tianjin 300072, China
Abstract
The aim of infrared and visible image fusion is to generate a fused image that not only contains salient targets and rich texture details but also facilitates high-level vision tasks. However, owing to the hardware limitations of digital cameras and other imaging devices, existing datasets contain many low-resolution images, which typically suffer from a loss of detail and structural information. At the same time, existing fusion algorithms focus too heavily on the visual quality of the fused image while ignoring the requirements of high-level vision tasks. To address these challenges, this paper unites a super-resolution network, a fusion network, and a segmentation network into a super-resolution-based semantic-aware fusion network. First, we design a super-resolution network built on a multi-branch hybrid attention module (MHAM) to enhance the quality and detail of the source images, enabling the fusion network to integrate their features more accurately. Then, a comprehensive information extraction module (STDC) is designed within the fusion network to strengthen its ability to extract fine-grained complementary information from the source images. Finally, the fusion network and the segmentation network are trained jointly, so that a semantic loss feeds semantic information back into the fusion network and effectively improves the performance of the fused images on high-level vision tasks. Extensive experiments show that our method outperforms other state-of-the-art image fusion methods: the fused images not only offer excellent visual perception but also help improve the performance of high-level vision tasks.
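To make the joint training scheme concrete, the sketch below illustrates how a super-resolution, fusion, and segmentation pipeline can be trained end to end with a combined fusion and semantic loss. This is a minimal PyTorch sketch under stated assumptions: the module names (`sr_net`, `fusion_net`, `seg_net`), the `fusion_criterion` signature, and the weighting factor `lam` are hypothetical placeholders for illustration, not the paper's released implementation.

```python
# Hypothetical sketch of the joint training step; module names and the
# loss weighting are assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class JointModel(nn.Module):
    def __init__(self, sr_net, fusion_net, seg_net):
        super().__init__()
        self.sr_net = sr_net          # super-resolution network (MHAM-based in the paper)
        self.fusion_net = fusion_net  # fusion network with the STDC extraction module
        self.seg_net = seg_net        # segmentation network providing semantic supervision

    def forward(self, ir_lr, vis_lr):
        # 1) Enhance both low-resolution source images before fusion.
        ir_hr = self.sr_net(ir_lr)
        vis_hr = self.sr_net(vis_lr)
        # 2) Fuse the enhanced infrared and visible images.
        fused = self.fusion_net(ir_hr, vis_hr)
        # 3) Segment the fused image so a semantic loss can flow back.
        seg_logits = self.seg_net(fused)
        return fused, ir_hr, vis_hr, seg_logits

def training_step(model, batch, optimizer, fusion_criterion, lam=0.1):
    ir_lr, vis_lr, seg_labels = batch
    fused, ir_hr, vis_hr, seg_logits = model(ir_lr, vis_lr)
    # The fusion loss preserves salient targets and texture from both sources;
    # the cross-entropy term is the semantic loss guiding the fusion network.
    loss_fusion = fusion_criterion(fused, ir_hr, vis_hr)
    loss_semantic = nn.functional.cross_entropy(seg_logits, seg_labels)
    loss = loss_fusion + lam * loss_semantic
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point this sketch captures is that gradients from the segmentation loss flow through the fused image into the fusion network, which is how semantic information is "guided back" during joint training.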