MFST: Multi-Modal Feature Self-Adaptive Transformer for Infrared and Visible Image Fusion

Published: 2022-07-05 Issue: 13 Volume: 14 Page: 3233
ISSN: 2072-4292
Container-title: Remote Sensing
Language: en
Short-container-title: Remote Sensing

Author:

Liu Xiangzeng, Gao Haojie, Miao Qiguang, Xi Yue, Ai Yunfeng, Gao Dingguo

Abstract

Infrared and visible image fusion combines the thermal radiation information and detailed texture of two source images into a single informative fused image. Deep learning methods have recently been widely applied to this task; however, they usually fuse the multiple extracted features with a single fusion strategy, which ignores differences in how these features are represented and causes information loss during fusion. To address this issue, we propose a novel method, the multi-modal feature self-adaptive transformer (MFST), which preserves more of the significant information in the source images. First, multi-modal features are extracted from the input images by a convolutional neural network (CNN). Then, these features are fused by focal transformer blocks, which are trained with an adaptive fusion strategy that accounts for the characteristics of the different features. Finally, the fused features are combined with saliency information from the infrared image to obtain the fused image. The proposed fusion framework is evaluated on the TNO, LLVIP, and FLIR datasets, which cover a variety of scenes. Experimental results demonstrate that our method outperforms several state-of-the-art methods in both subjective and objective evaluation.
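
The fuse-then-apply-saliency idea in the abstract can be illustrated with a toy sketch. This is not the MFST implementation: the CNN feature extractor and the focal transformer blocks are omitted, raw pixel intensities (assumed to lie in [0, 1]) stand in for features, a per-pixel softmax weighting stands in for the learned adaptive fusion, and the saliency measure (deviation from the image mean) is an assumption chosen for simplicity. All function names are hypothetical.

```python
import math


def saliency_map(image):
    """Toy saliency (an assumption, not the paper's definition):
    absolute deviation of each pixel from the image mean, scaled to [0, 1]."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    deviation = [[abs(p - mean) for p in row] for row in image]
    peak = max(max(row) for row in deviation)
    if peak == 0:
        # A constant image has no salient region.
        return [[0.0 for _ in row] for row in image]
    return [[d / peak for d in row] for row in deviation]


def adaptive_fuse(feat_ir, feat_vis):
    """Stand-in for a learned fusion: at each pixel, weight the two inputs
    by a softmax over their magnitudes, so the stronger response dominates."""
    fused = []
    for row_ir, row_vis in zip(feat_ir, feat_vis):
        fused_row = []
        for a, b in zip(row_ir, row_vis):
            w_a, w_b = math.exp(abs(a)), math.exp(abs(b))
            fused_row.append((w_a * a + w_b * b) / (w_a + w_b))
        fused.append(fused_row)
    return fused


def fuse_with_saliency(ir, vis):
    """Blend the infrared image back into the fused result where it is salient."""
    fused = adaptive_fuse(ir, vis)
    sal = saliency_map(ir)
    return [
        [s * i + (1 - s) * f for s, i, f in zip(row_s, row_i, row_f)]
        for row_s, row_i, row_f in zip(sal, ir, fused)
    ]


if __name__ == "__main__":
    ir = [[0.0, 0.0], [0.0, 1.0]]    # one hot (thermally salient) pixel
    vis = [[0.5, 0.5], [0.5, 0.5]]   # flat visible-light texture
    for row in fuse_with_saliency(ir, vis):
        print(row)
```

In this toy input the hot infrared pixel is fully salient, so it passes through to the output unchanged, while the remaining pixels are a mix of the softmax-weighted fusion and the infrared values.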

Funder

Ministry of Science and Technology of the People's Republic of China

Ministry of Education of the People's Republic of China

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Link

https://www.mdpi.com/2072-4292/14/13/3233/pdf
