Author:
Yao Xiaoling,Hu Lihua,Zhang Jifu
Abstract
AbstractDigitalization of ancient architectures is one of the effective means for the preservation of heritage structures, with 3D reconstruction based on computer vision being a key component of such digitalization techniques. However, Chinese ancient architectures are located in mountainous areas, and existing 3D reconstruction methods fall short in restoring the local structures of these architectures. This paper proposes a self-attention-guided unsupervised single image-based depth estimation method, providing innovative technical support for the reconstruction of local structures in Chinese ancient architectures. First, an attention module is constructed based on features extracted from architectural images learned by the encoder, and then embedded into the encoder-decoder to capture the interdependencies across local features. Second, a disparity map is generated using the loss constraint network, including reconstruction matching, smoothness of the disparity, and left-right disparity consistency. Third, an unsupervised architecture based on binocular image pairs is constructed to remove any potential adverse effects due to unknown scale or estimated pose errors. Finally, with the known baseline distance and camera focal length, the disparity map is converted into the depth map to perform the end-to-end depth estimation from a single image. Experiments on the our architecture dataset validates our method, and it performs well also well on KITTI.
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC