On the Co-Selection of Vision Transformer Features and Images for Very High-Resolution Image Scene Classification

Published: 2022-11-17 Issue: 22 Volume: 14 Page: 5817
ISSN: 2072-4292
Container-title: Remote Sensing
Language: en
Short-container-title:Remote Sensing

Author:

Chaib Souleyman, Mansouri Dou El Kefel, Omara Ibrahim, Hagag Ahmed, Dhelim Sahraoui, Bensaber Djamel Amar

Abstract

Recent developments in remote sensing technology have allowed us to observe the Earth with very high-resolution (VHR) images. VHR imagery scene classification is a challenging problem in the field of remote sensing. Vision transformer (ViT) models have achieved breakthrough results in image recognition tasks. However, transformer–encoder layers encode different levels of features: the last layer represents semantic information, whereas the earlier layers contain more detailed data but ignore the semantic information of an image scene. In this paper, a new deep framework is proposed for VHR scene understanding by exploring the strengths of ViT features in a simple and effective way. First, pre-trained ViT models are used to extract informative features from the original VHR image scene, where the transformer–encoder layers are used to generate the feature descriptors of the input images. Second, we merge the obtained features as one signal data set. Third, some extracted ViT features do not describe the image scenes well, particularly in scenes such as agriculture, meadows, and beaches, which could negatively affect the performance of the classification model. To deal with this challenge, we propose a new algorithm for feature and image selection. This makes it possible to eliminate the less important features and images, as well as those that are abnormal; based on similarity preservation over the whole data set, we select the most informative features and important images and drop the irrelevant images that degrade the classification accuracy. The proposed method was tested on three VHR benchmarks. The experimental results demonstrate that the proposed method outperforms other state-of-the-art methods.

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Link

https://www.mdpi.com/2072-4292/14/22/5817/pdf

Cited by 8 articles.

1. SingleS2R: Single sample driven Sim-to-Real transfer for Multi-Source Visual-Tactile Information Understanding using multi-scale vision transformers;Information Fusion;2024-08

2. Local feature acquisition and global context understanding network for very high-resolution land cover classification;Scientific Reports;2024-06-01

3. Local feature matching from detector-based to detector-free: a survey;Applied Intelligence;2024-03

4. Remote sensing traffic scene retrieval based on learning control algorithm for robot multimodal sensing information fusion and human-machine interaction and collaboration;Frontiers in Neurorobotics;2023-10-11

5. A critical survey of GEOBIA methods for forest image detection and classification;Geocarto International;2023-09-13