MSCAC: A Multi-Scale Swin–CNN Framework for Progressive Remote Sensing Scene Classification
-
Published:2024-07-29
Issue:3
Volume:4
Page:462-480
-
ISSN:2673-7086
-
Container-title:Geographies
-
language:en
-
Short-container-title:Geographies
Author:
Solomon A. Arun1, Agnes S. Akila2ORCID
Affiliation:
1. Department of Civil Engineering, GMR Institute of Technology, Rajam 532127, India 2. Department of Computer Science and Engineering, GMR Institute of Technology, Rajam 532127, India
Abstract
Recent advancements in deep learning have significantly improved the performance of remote sensing scene classification, a critical task in remote sensing applications. This study presents a new aerial scene classification model, the Multi-Scale Swin–CNN Aerial Classifier (MSCAC), which employs the Swin Transformer, an advanced architecture that has demonstrated exceptional performance in a range of computer vision applications. The Swin Transformer leverages shifted window mechanisms to efficiently model long-range dependencies and local features in images, making it particularly suitable for the complex and varied textures in aerial imagery. The model is designed to capture intricate spatial hierarchies and diverse scene characteristics at multiple scales. A framework is developed that integrates the Swin Transformer with a multi-scale strategy, enabling the extraction of robust features from aerial images of different resolutions and contexts. This approach allows the model to effectively learn from both global structures and fine-grained details, which is crucial for accurate scene classification. The model’s performance is evaluated on several benchmark datasets, including UC-Merced, WHU-RS19, RSSCN7, and AID, where it demonstrates a superior or comparable accuracy to state-of-the-art models. The MSCAC model’s adaptability to varying amounts of training data and its ability to improve with increased data make it a promising tool for real-world remote sensing applications. This study highlights the potential of integrating advanced deep-learning architectures like the Swin Transformer into aerial scene classification, paving the way for more sophisticated and accurate remote sensing systems. The findings suggest that the proposed model has significant potential for various remote sensing applications, including land cover mapping, urban planning, and environmental monitoring.
Reference45 articles.
1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30, Available online: https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html. 2. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv. 3. Bazi, Y., Bashmal, L., Al Rahhal, M.M., Al Dayil, R., and Al Ajlan, N. (2021). Vision transformers for remote sensing image classification. Remote Sens., 13. 4. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11–17). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada. 5. Color indexing;Swain;Int. J. Comput. Vis.,1991
|
|