A Coarse-to-Fine Transformer-Based Network for 3D Reconstruction from Non-Overlapping Multi-View Images

Published: 2024-03-03 Volume: 16 Issue: 5 Page: 901
ISSN: 2072-4292
Journal: Remote Sensing
Language: en

Author:

Shan Yue¹, Xiao Jun¹, Liu Lupeng¹, Wang Yunbiao¹, Yu Dongbo¹, Zhang Wenniu¹

Affiliation:

1. School of Artificial Intelligence, University of Chinese Academy of Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China

Abstract

Reconstructing 3D structures from non-overlapping multi-view images is a crucial and difficult task in 3D computer vision: because the views share little or no overlap, it is hard to establish feature correspondences between them and infer depth. Previous methods, whether they generate a surface mesh or a volume of the object, struggle to ensure both the accuracy of detailed topology and the integrity of the overall structure. In this paper, we introduce a novel coarse-to-fine Transformer-based reconstruction network that generates precise point clouds from multiple input images captured at sparse, non-overlapping viewpoints. Specifically, we first employ a general point cloud generation architecture, enhanced with an adaptive centroid constraint, to produce a coarse point cloud of the object. A Transformer-based refinement module then deforms each point. We design an attention-based encoder that encodes both image projection features and point cloud geometric features, together with a decoder that computes deformation residuals. Experiments on ShapeNet demonstrate that our proposed method outperforms competing methods.
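The abstract describes the refinement stage only at a high level. The following is a minimal, untrained sketch of the general pattern it names: project each coarse point into every view to fetch an image feature, attach the point's own geometry, mix the per-point tokens with attention, and decode a per-point residual that is added to the coarse coordinates. It is not the authors' network. The y-axis camera rotation, pinhole intrinsics, nearest-pixel sampling, averaging across views, single attention head, linear decoder, random weights, and all function names (`to_camera`, `project_point`, `sample_feature`, `self_attention`, `refine_points`) are hypothetical choices made for illustration.

```python
import math
import random

# --- Hypothetical camera model (an assumption, not from the paper) ---
def to_camera(p, angle, dist=2.0):
    """Rotate a world point about the y-axis by `angle`, then place it `dist` units in front of the camera."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z + dist)

def project_point(p, f=2.0, cx=2.0, cy=2.0):
    """Pinhole projection of a camera-frame point (z > 0) to pixel coordinates (u, v)."""
    x, y, z = p
    return f * x / z + cx, f * y / z + cy

def sample_feature(feat_map, u, v):
    """Nearest-pixel lookup in an H x W grid of feature vectors, clamped to the border."""
    h, w = len(feat_map), len(feat_map[0])
    row = min(max(int(round(v)), 0), h - 1)
    col = min(max(int(round(u)), 0), w - 1)
    return feat_map[row][col]

# --- Small linear-algebra helpers ---
def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention across all point tokens."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

def refine_points(points, views, params):
    """One refinement step: encode image + geometric features, decode per-point residuals."""
    tokens = []
    for p in points:
        # Image projection feature: sample each view at the point's projection, then average.
        feats = [sample_feature(fm, *project_point(to_camera(p, angle))) for fm, angle in views]
        img_feat = [sum(ch) / len(feats) for ch in zip(*feats)]
        # Geometric feature: here just the raw xyz coordinates.
        tokens.append(img_feat + list(p))
    encoded = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    # Decoder: a linear map from each encoded token to a 3D deformation residual.
    residuals = [matvec(params["wout"], e) for e in encoded]
    return [tuple(a + b for a, b in zip(p, r)) for p, r in zip(points, residuals)]

# --- Usage with random (untrained) weights ---
rng = random.Random(0)

def rand_matrix(rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

dim = 2 + 3  # 2-channel image feature + xyz
params = {"wq": rand_matrix(dim, dim), "wk": rand_matrix(dim, dim),
          "wv": rand_matrix(dim, dim), "wout": rand_matrix(3, dim)}
# Two hypothetical views, 90 degrees apart, each with a 4x4 map of 2-channel features.
views = [([[[r * 0.1, c * 0.1] for c in range(4)] for r in range(4)], 0.0),
         ([[[c * 0.1, r * 0.1] for c in range(4)] for r in range(4)], math.pi / 2)]
coarse = [(0.1, 0.2, 0.0), (-0.3, 0.1, 0.2), (0.0, -0.2, -0.1)]
fine = refine_points(coarse, views, params)
print(len(fine), len(fine[0]))  # 3 3
```

Because the decoder outputs offsets that are added to the input coordinates, a decoder with near-zero weights leaves the coarse cloud almost unchanged; this is one common motivation for predicting deformation residuals rather than absolute positions.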

Funder

National Natural Science Foundation of China

Beijing Natural Science Foundation

China Postdoctoral Science Foundation

State Key Laboratory of Robotics and Systems

Fundamental Research Funds for the Central Universities

Publisher

MDPI AG

Link

https://www.mdpi.com/2072-4292/16/5/901/pdf
