CAE-Net: Cross-Modal Attention Enhancement Network for RGB-T Salient Object Detection
Published: 2023-02-14
Issue: 4
Volume: 12
Page: 953
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Authors:
Lv Chengtao ¹, Wan Bin ¹, Zhou Xiaofei ¹, Sun Yaoqi ¹·², Hu Ji ¹·², Zhang Jiyong ¹, Yan Chenggang ¹
Affiliations:
1. School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
2. Lishui Institute of Hangzhou Dianzi University, Lishui 323000, China
Abstract
RGB salient object detection (SOD) performs poorly in low-contrast and complex-background scenes. Fortunately, a thermal infrared image captures the heat distribution of a scene as information complementary to the RGB image, so RGB-T SOD has recently attracted increasing attention. Many researchers have worked to advance RGB-T SOD, but several problems remain. For example, defective samples and interfering information in the RGB or thermal image hinder the model from learning proper saliency features, while noisy low-level features lead to incomplete salient objects or false-positive detections. To address these problems, we design a cross-modal attention enhancement network (CAE-Net). First, we design a cross-modal fusion (CMF) module to fuse cross-modal features, in which a cross-attention unit (CAU) enhances the two modal features and channel attention dynamically weighs and fuses them. Then, we design a joint-modality decoder (JMD) to fuse cross-level features, where the low-level features are purified by higher-level features and multi-scale features are fully integrated. In addition, we add two single-modality decoder (SMD) branches to preserve more modality-specific information. Finally, we employ a multi-stream fusion (MSF) module to fuse the features of the three decoders. Comprehensive experiments on three RGB-T datasets show that our CAE-Net is comparable to the other methods.
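The channel-attention fusion step that the abstract describes for the CMF module can be illustrated roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the learned channel-attention network (and the cross-attention unit) is replaced here by a fixed sigmoid gate over pooled channel statistics, purely to show the "dynamically weigh and fuse the two modal features" idea.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(f_rgb, f_t):
    """Toy channel-attention fusion of RGB and thermal feature maps.

    f_rgb, f_t: arrays of shape (C, H, W) from the two modal streams.
    A real CMF module would learn the attention weights end-to-end;
    here a sigmoid of the global-average-pooled descriptor stands in
    for the learned gating, as a hypothetical illustration.
    """
    c = f_rgb.shape[0]
    # 1. Global average pooling: one descriptor value per channel, per modality.
    desc = np.concatenate([f_rgb.mean(axis=(1, 2)), f_t.mean(axis=(1, 2))])
    # 2. Sigmoid gating produces per-channel weights in (0, 1).
    w = sigmoid(desc)
    w_rgb, w_t = w[:c], w[c:]
    # 3. Re-weight each modality channel-wise and sum to fuse.
    return w_rgb[:, None, None] * f_rgb + w_t[:, None, None] * f_t
```

For example, fusing an all-ones RGB feature map with an all-zeros thermal map yields a tensor of the same shape whose values come entirely from the sigmoid-weighted RGB branch.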
Funders:
National Key Research and Development Program of China; Fundamental Research Funds for the Provincial Universities of Zhejiang; National Natural Science Foundation of China; "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province; Zhejiang Province Nature Science Foundation of China; Hangzhou Dianzi University (HDU) and the China Electronics Corporation DATA (CECDATA) Joint Research Center of Big Data Technologies; 111 Project
Subjects:
Electrical and Electronic Engineering; Computer Networks and Communications; Hardware and Architecture; Signal Processing; Control and Systems Engineering
References: 65 articles.
Cited by: 1 article.