S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification

Published:2022-07-20 Issue:14 Volume:22 Page:5433
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Wu Hongjun, Xu Cheng, Liu Hongzhe

Abstract

Multi-label aerial scene image classification is a long-standing and challenging research problem in remote sensing. Because land cover objects usually co-exist in an aerial scene image, modeling label dependencies is a compelling way to improve performance. Previous methods generally model the label dependencies among all the categories in the target dataset directly. However, most of the semantic features extracted from an image relate to the objects actually present, so the dependencies among nonexistent categories cannot be evaluated effectively. These redundant label dependencies may introduce noise and degrade classification performance. To solve this problem, we propose S-MAT, a Semantic-driven Masked Attention Transformer for multi-label aerial scene image classification. S-MAT adopts a Masked Attention Transformer (MAT) to capture the correlations among the label embeddings constructed by a Semantic Disentanglement Module (SDM). Moreover, the proposed masked attention in MAT can filter out the redundant dependencies and enhance the robustness of the model. As a result, the proposed method can explicitly and accurately capture the label dependencies. Our method achieves CF1s of 89.21%, 90.90%, and 88.31% on three multi-label aerial scene image classification benchmark datasets: UC-Merced Multi-label, AID Multi-label, and MLRSNet, respectively. In addition, extensive ablation studies and empirical analysis demonstrate the effectiveness of the essential components of our method under different factors.
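
The abstract does not spell out how the masked attention is computed, so the following is only a minimal sketch of the general idea: self-attention over per-category label embeddings in which categories judged absent from the image are excluded as attention keys. The function name `masked_label_attention`, the `presence_scores` input, and the `threshold` value are all hypothetical, and the sketch omits the learned query/key/value projections and multiple heads that a real transformer layer would use.

```python
import math


def masked_label_attention(label_embeddings, presence_scores, threshold=0.5):
    """Single-head self-attention over label embeddings with masked keys.

    label_embeddings: list of C vectors (one per category), each of length d.
    presence_scores:  hypothetical per-category confidence in [0, 1] that the
                      category appears in the image (an assumption, not taken
                      from the paper).
    threshold:        hypothetical cut-off; categories scoring below it are
                      masked out as keys.
    Returns (outputs, weights), where weights[i][j] is the attention that
    category i pays to category j.
    """
    d = len(label_embeddings[0])
    keep = [score >= threshold for score in presence_scores]
    if not any(keep):
        raise ValueError("every category is masked; nothing to attend to")

    outputs, weights = [], []
    for query in label_embeddings:
        # Scaled dot-product logits; masked keys get -inf so softmax gives 0.
        logits = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            if kept else float("-inf")
            for key, kept in zip(label_embeddings, keep)
        ]
        peak = max(logits)  # finite, because at least one key is kept
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Weighted sum of the (unprojected) embeddings acting as values.
        outputs.append([
            sum(w * value[j] for w, value in zip(row, label_embeddings))
            for j in range(d)
        ])
    return outputs, weights


# Toy input: three categories, the second one judged absent.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.9, 0.1, 0.8]
outputs, weights = masked_label_attention(embeddings, scores)
```

In this toy input the second category falls below the threshold, so no category attends to it and its column in `weights` is zero. Whether S-MAT derives its mask from predicted scores, from the SDM output, or in some other way cannot be determined from the abstract alone.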

Funder

National Natural Science Foundation of China

Beijing Key Science and Technology Project

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/14/5433/pdf

References: 58 articles.

1. Local feature representation based on linear filtering with feature pooling and divisive normalization for remote sensing image classification

2. Remote Sensing Image Scene Classification Based on Global–Local Dual-Branch Structure Model

3. Triplet-Metric-Guided Multi-Scale Attention for Remote Sensing Image Scene Classification with a Convolutional Neural Network

4. Deep residual learning for image recognition; He; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016

5. MobileNetV2: Inverted residuals and linear bottlenecks; Sandler; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018

Cited by 7 articles.

1. Label-Driven Graph Convolutional Network for Multilabel Remote Sensing Image Classification; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2024

2. Adjacent-Atrous Mechanism for Expanding Global Receptive Fields: An End-to-End Network for Multiattribute Scene Analysis in Remote Sensing Imagery; IEEE Transactions on Geoscience and Remote Sensing; 2024

3. Optimizing Multimodal Scene Recognition through Mutual Information-Based Feature Selection in Deep Learning Models; Applied Sciences; 2023-10-29

4. Cross-modality semantic guidance for multi-label image classification; Intelligent Data Analysis; 2023-09-14

5. Joint learning networks of low-level and high-level features for multi-label ship recognition in complex backgrounds; Applied Intelligence; 2023-07-23