Affiliations:
1. College of Electronics and Information Engineering, Sichuan University, Chengdu, China
2. Samsung Electronics, 129 Samseong-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, South Korea
Abstract
Infrared and visible image fusion technology holds a pivotal position in smart cities for cloud and fog computing, particularly in security systems. By fusing infrared and visible image information, this technology enhances the precision of target identification, tracking, and monitoring, bolstering overall system security. However, existing deep learning-based methods rely heavily on convolutional operations, which excel at extracting local features but have limited receptive fields, hampering the capture of global information. To overcome this difficulty, we introduce GRDATFusion, a novel end-to-end network comprising three key modules: a transformer module, a gradient residual dense module, and an attention residual module. The gradient residual dense module extracts local complementary features, leveraging a densely connected network to retain information that might otherwise be lost. The attention residual module focuses on crucial details of the input images, while the transformer module captures global information and models long-range dependencies. Experiments on public datasets show that GRDATFusion outperforms state-of-the-art algorithms in qualitative and quantitative assessments. Ablation studies validate the advantages of our design, and efficiency comparisons demonstrate its computational efficiency. Our method therefore gives security systems in smart cities shorter delays and satisfies their real-time requirements.
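The abstract's gradient residual idea (preserving edge detail alongside intensity when fusing infrared and visible inputs) can be illustrated with a minimal NumPy sketch. This is not the paper's GRDATFusion network: the function names, the Sobel-based gradient term, and the fixed fusion weights below are all illustrative assumptions, standing in for the learned convolutional and residual layers described in the abstract.

```python
import numpy as np

def sobel_gradient_magnitude(img):
    """Approximate per-pixel gradient magnitude with 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gx = np.empty((h, w))
    gy = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.sqrt(gx ** 2 + gy ** 2)

def gradient_residual_fuse(ir, vis, detail_weight=0.1):
    """Toy fusion rule (assumption, not the paper's method):
    average the two intensity maps, then add the stronger per-pixel
    gradient magnitude as a residual detail term, mimicking how a
    gradient residual path reinjects edge information."""
    base = 0.5 * (ir.astype(float) + vis.astype(float))
    detail = np.maximum(sobel_gradient_magnitude(ir),
                        sobel_gradient_magnitude(vis))
    return base + detail_weight * detail
```

In the actual network, the hand-crafted Sobel term would be replaced by learned convolutions, and the fixed `detail_weight` by attention-derived weights; the sketch only conveys why a residual gradient path helps retain fine structure that plain averaging would blur.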