Abstract
Recent single-pixel imaging (SPI) methods based on deep neural networks, e.g., convolutional neural networks (CNNs) and vision Transformers (ViTs), have achieved remarkable performance. Nonetheless, existing methods struggle to model object images well from single-pixel measurements, which exhibit long-range dependencies: CNNs are constrained by their local receptive fields, and ViTs suffer from the quadratic complexity of the attention mechanism. Inspired by Mamba, a state space model (SSM) architecture known for handling long sequences and global contextual information with high computational efficiency, we propose a hybrid CNN-Mamba network for SPI, named CMSPI. CMSPI combines the local feature extraction capability of convolutional layers with the ability of SSMs to capture long-range dependencies efficiently, and its complementary split-concat structure, depthwise separable convolutions, and residual connections enhance the learning capacity of the network. In addition, CMSPI adopts a two-step training strategy that improves reconstruction performance and is hardware-friendly. Simulations and real experiments demonstrate that CMSPI achieves higher imaging quality, lower memory consumption, and lower computational cost than state-of-the-art SPI methods.
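The efficiency argument above can be illustrated with a minimal sketch. This is not CMSPI's implementation: it assumes the standard bucket-detector measurement model (one inner product between each illumination pattern and the object per reading) and replaces Mamba's selective, input-dependent scan with a simplified diagonal, time-invariant SSM discretized by zero-order hold. The function names `single_pixel_measurements` and `diagonal_ssm_scan` and all parameter values are hypothetical. The point is that one recurrent pass costs O(L·N) for a length-L sequence and N state channels, whereas full self-attention computes O(L²) pairwise scores.

```python
import math
from typing import List, Sequence


def single_pixel_measurements(patterns: Sequence[Sequence[float]],
                              image: Sequence[float]) -> List[float]:
    """Hypothetical bucket-detector model: reading i is <P_i, x>."""
    return [sum(p * x for p, x in zip(pattern, image)) for pattern in patterns]


def diagonal_ssm_scan(u: Sequence[float], a: Sequence[float],
                      b: Sequence[float], c: Sequence[float],
                      dt: float) -> List[float]:
    """Simplified diagonal SSM (not Mamba's selective scan).

    Continuous dynamics per state channel n:
        h_n'(t) = a_n * h_n(t) + b_n * u(t),  y(t) = sum_n c_n * h_n(t)
    Zero-order-hold discretization:
        a_bar = exp(dt * a_n),  b_bar = (a_bar - 1) / a_n * b_n
    One pass over the sequence costs O(L * N).
    """
    if any(a_n >= 0 for a_n in a):
        raise ValueError("every a_n must be negative for a stable SSM")
    a_bar = [math.exp(dt * a_n) for a_n in a]
    b_bar = [(ab - 1.0) / a_n * b_n for ab, a_n, b_n in zip(a_bar, a, b)]
    h = [0.0] * len(a)  # hidden state, one entry per channel
    out = []
    for u_t in u:
        # Recurrent update: each step depends only on the previous state.
        h = [ab * h_n + bb * u_t for ab, h_n, bb in zip(a_bar, h, b_bar)]
        out.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))  # linear readout
    return out


if __name__ == "__main__":
    # Toy 4-pixel object and three binary patterns (illustrative values).
    image = [0.2, 0.4, 0.6, 0.8]
    patterns = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
    y = single_pixel_measurements(patterns, image)
    # Scan the measurement sequence with a two-channel SSM.
    print(diagonal_ssm_scan(y, a=[-1.0, -0.5], b=[1.0, 1.0],
                            c=[0.5, 0.5], dt=0.1))
```

Because the state is carried forward step by step, memory does not grow with sequence length beyond the output itself; Mamba additionally makes the discretization step and the input and output projections depend on the input, which this sketch omits.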
Funder
Natural Science Foundation of Shandong Province
National Natural Science Foundation of China
National Key Research and Development Program of China