Abstract
For hyperspectral images (HSIs), which are 3-D data cubes, multi-head self-attention over both the spatial and spectral domains is a preferable means of learning discriminative features, provided that the burden in model optimization and computation is low. In this paper, we design a dual multi-head contextual self-attention (DMuCA) network for HSI classification with as few parameters as possible and a low computation cost. To effectively capture rich contextual dependencies from both domains, we decouple the spatial and spectral contextual attention into two sub-blocks, SaMCA and SeMCA, in which depth-wise convolution is employed to contextualize the input keys in the pure dimension. Thereafter, multi-head local attention is implemented as group processing, with the keys alternately concatenated with the queries. In particular, in the SeMCA block, we group the spatial pixels by even sampling and create multi-head channel attention on each sampled set, which reduces the number of training parameters and avoids an increase in storage. In addition, the static contextual keys are fused with the dynamic attentional features in each block to strengthen the representation capacity of the model. Finally, the outputs of the decoupled sub-blocks are weighted and summed for 3-D attention perception of the HSI. The DMuCA module is then plugged into a ResNet to perform HSI classification. Extensive experiments demonstrate that the proposed DMuCA outperforms several state-of-the-art attention mechanisms with the same backbone.
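The grouping idea described for the SeMCA block can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the depth-wise convolution, the multiple heads, and the key-query concatenation, and only shows how pixels might be split into evenly sampled sets with a channel-to-channel attention computed inside each set. All function names (`even_sampling_groups`, `channel_attention`, `grouped_channel_attention`) and the choice of a softmax over averaged channel products are assumptions made for illustration. Plain Python lists are used to keep the example dependency-free.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def even_sampling_groups(num_pixels, num_groups):
    """Assumed grouping rule: group g takes pixel indices g, g+G, g+2G, ..."""
    return [list(range(g, num_pixels, num_groups)) for g in range(num_groups)]


def channel_attention(pixels):
    """Channel attention on one sampled set (illustrative, single head).

    pixels: list of N spectral vectors, each of length C.
    The C x C affinity is the softmax of averaged channel products.
    """
    n, c = len(pixels), len(pixels[0])
    scores = [
        [sum(p[i] * p[j] for p in pixels) / n for j in range(c)]
        for i in range(c)
    ]
    attn = [softmax(r) for r in scores]
    # Each output channel is an attention-weighted mix of the input channels.
    return [
        [sum(attn[i][j] * p[j] for j in range(c)) for i in range(c)]
        for p in pixels
    ]


def grouped_channel_attention(pixels, num_groups):
    """Apply channel attention independently inside each sampled set."""
    out = [None] * len(pixels)
    for idx in even_sampling_groups(len(pixels), num_groups):
        if not idx:  # more groups than pixels: nothing to process
            continue
        attended = channel_attention([pixels[i] for i in idx])
        for i, row in zip(idx, attended):
            out[i] = row
    return out


if __name__ == "__main__":
    # Four flattened pixels with three spectral channels each (toy data).
    toy = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.2], [2.0, 2.0, 2.0], [0.0, 1.0, 0.0]]
    for row in grouped_channel_attention(toy, num_groups=2):
        print(row)
```

In this sketch each sampled set builds only a C x C affinity matrix, so the attention matrices grow with the number of channels and groups rather than with the square of the number of pixels. That is one plausible reading of how the grouping keeps storage from growing.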
Funder
National Natural Science Foundation of China
Natural Science Basic Research Plan in Shaanxi Province of China
Subject
General Earth and Planetary Sciences
Cited by
13 articles.