Abstract
Deep learning methods have been widely studied for polarimetric synthetic aperture radar (PolSAR) land cover classification. However, their performance is limited by the scarcity of labeled PolSAR samples and by the small receptive field of the models. In this paper, a vision Transformer (ViT)-based classification method is proposed. Built on self-attention blocks, the ViT structure can extract features from the global range of an image. This strong feature representation capability is equivalent to a flexible receptive field, which makes the model suitable for classifying PolSAR images at different resolutions. In addition, because labeled data are scarce, the masked autoencoder (MAE) method is used to pre-train the proposed model on unlabeled data. Experiments are carried out on the Flevoland dataset acquired by NASA/JPL AIRSAR and the Hainan dataset acquired by the Aerial Remote Sensing System of the Chinese Academy of Sciences. The experimental results on both datasets demonstrate the superiority of the proposed method.
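The pre-training idea mentioned in the abstract can be illustrated with a minimal NumPy sketch of masked-autoencoder-style input preparation: an image is split into patch tokens, a random subset is hidden, and only the visible tokens would be passed to a ViT encoder. This is not the paper's implementation. The function names `patchify` and `random_masking`, the 9-channel input, the 8x8 patch size, and the 75% mask ratio are all assumptions chosen for illustration; the abstract does not state these values.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) array into non-overlapping patch tokens.

    Returns an array of shape (N, patch * patch * C), with patches
    ordered row by row. H and W must be divisible by `patch`.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def random_masking(tokens, mask_ratio, rng):
    """Hide a random subset of tokens, as in masked-autoencoder pre-training.

    Returns the visible tokens, their indices (sorted), and a boolean mask
    over all tokens where True marks a hidden token to be reconstructed.
    """
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return tokens[keep_idx], keep_idx, mask


# Hypothetical input: a 32x32 crop with 9 real-valued channels
# (an assumed PolSAR feature layout, not taken from the paper).
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 9))

tokens = patchify(image, patch=8)
visible, keep_idx, mask = random_masking(tokens, mask_ratio=0.75, rng=rng)

print(tokens.shape)   # (16, 576)
print(visible.shape)  # (4, 576)
```

In a full pipeline, an encoder would process only `visible`, and a lightweight decoder would be trained to reconstruct the tokens where `mask` is True, so no class labels are needed during pre-training.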
Subject
General Earth and Planetary Sciences
Cited by
22 articles.