A Coarse-to-Fine Transformer-Based Network for 3D Reconstruction from Non-Overlapping Multi-View Images
Published:2024-03-03
Issue:5
Volume:16
Page:901
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing
Author:
Shan Yue 1, Xiao Jun 1, Liu Lupeng 1, Wang Yunbiao 1, Yu Dongbo 1, Zhang Wenniu 1
Affiliation:
1. School of Artificial Intelligence, University of Chinese Academy of Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China
Abstract
Reconstructing 3D structures from non-overlapping multi-view images is a crucial yet challenging task in 3D computer vision: without overlapping regions between views, it is difficult to establish feature correspondences and infer depth. Previous methods, whether they generate the surface mesh or the volume of an object, struggle to ensure both accurate detailed topology and an intact overall structure. In this paper, we introduce a novel coarse-to-fine Transformer-based reconstruction network that generates precise point clouds from multiple input images taken at sparse, non-overlapping viewpoints. Specifically, we first employ a general point cloud generation architecture, enhanced by an adaptive centroid constraint, to produce a coarse point cloud of the object. Subsequently, a Transformer-based refinement module applies a deformation to each point. We design an attention-based encoder that encodes both image projection features and point cloud geometric features, along with a decoder that calculates deformation residuals. Experiments on ShapeNet demonstrate that our proposed method outperforms other competing methods.
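The refinement stage described in the abstract (project each coarse point into the views, combine image and geometric features, attend over the points, decode a per-point deformation residual) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pinhole camera model, the nearest-neighbour feature lookup, the single attention head, the random weights, and every function and parameter name below are assumptions made for illustration, and the adaptive centroid constraint of the coarse stage is not modeled.

```python
import numpy as np

def project_points(points, rotation, focal=1.0, depth_offset=2.0):
    """Project (N, 3) points to (N, 2) image coordinates for one view.
    The pinhole model and depth_offset are illustrative assumptions."""
    cam = points @ rotation.T            # rotate points into the view's frame
    z = cam[:, 2:3] + depth_offset       # shift so points lie in front of the camera
    return focal * cam[:, :2] / z

def sample_features(feature_map, uv):
    """Nearest-neighbour lookup of per-point features from an (H, W, C) map."""
    h, w, _ = feature_map.shape
    # Map coordinates in [-1, 1] to pixel indices, clamped to the image bounds.
    cols = np.clip(np.round((uv[:, 0] + 1) / 2 * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.round((uv[:, 1] + 1) / 2 * (h - 1)).astype(int), 0, h - 1)
    return feature_map[rows, cols]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the point tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v

def refine(coarse, feature_maps, rotations, params):
    """Return (refined points, residuals) for an (N, 3) coarse point cloud."""
    # Image projection features, averaged over the input views.
    img_feat = np.mean(
        [sample_features(fm, project_points(coarse, r))
         for fm, r in zip(feature_maps, rotations)],
        axis=0,
    )
    # Each token combines geometric (xyz) and image features.
    tokens = np.concatenate([coarse, img_feat], axis=1)
    encoded = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    residual = encoded @ params["w_out"]  # linear decoder: one 3D offset per point
    return coarse + residual, residual

# Toy usage with random data and untrained weights (hypothetical sizes).
rng = np.random.default_rng(0)
num_points, channels, d_model = 128, 8, 16
coarse = rng.uniform(-0.5, 0.5, size=(num_points, 3))
feature_maps = [rng.normal(size=(32, 32, channels)) for _ in range(3)]
rotations = [np.eye(3) for _ in range(3)]
d_in = 3 + channels
params = {
    "wq": 0.1 * rng.normal(size=(d_in, d_model)),
    "wk": 0.1 * rng.normal(size=(d_in, d_model)),
    "wv": 0.1 * rng.normal(size=(d_in, d_model)),
    "w_out": 0.1 * rng.normal(size=(d_model, 3)),
}
refined, residual = refine(coarse, feature_maps, rotations, params)
print(refined.shape)  # (128, 3)
```

Predicting a residual rather than absolute coordinates means the refinement can only move each coarse point, so a decoder that outputs zeros leaves the coarse point cloud unchanged.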
Funder
National Natural Science Foundation of China
Beijing Natural Science Foundation
China Postdoctoral Science Foundation
State Key Laboratory of Robotics and Systems
Fundamental Research Funds for the Central Universities