Abstract
Abstract
Objective. Existing registration networks based on cross-attention design usually divide the image pairs to be registered into patches for input. The division and merging operations of a series of patches are difficult to maintain the topology of the deformation field and reduce the interpretability of the network. Therefore, our goal is to develop a new network architecture based on a cross-attention mechanism combined with a multi-resolution strategy to improve the accuracy and interpretability of medical image registration. Approach. We propose a new deformable image registration network NCNet based on neighborhood cross-attention combined with multi-resolution strategy. The network structure mainly consists of a multi-resolution feature encoder, a multi-head neighborhood cross-attention module and a registration decoder. The hierarchical feature extraction capability of our encoder is improved by introducing large kernel parallel convolution blocks; the cross-attention module based on neighborhood calculation is used to reduce the impact on the topology of the deformation field and double normalization is used to reduce its computational complexity. Main result. We performed atlas-based registration and inter-subject registration tasks on the public 3D brain magnetic resonance imaging datasets LPBA40 and IXI respectively. Compared with the popular VoxelMorph method, our method improves the average DSC value by 7.9% and 3.6% on LPBA40 and IXI. Compared with the popular TransMorph method, our method improves the average DSC value by 4.9% and 1.3% on LPBA40 and IXI. Significance. We proved the advantages of the neighborhood attention calculation method compared to the window attention calculation method based on partitioning patches, and analyzed the impact of the pyramid feature encoder and double normalization on network performance. This has made a valuable contribution to promoting the further development of medical image registration methods.