BSI-MVS: multi-view stereo network with bidirectional semantic information

Published: 2024-03-21
Volume: 14, Issue: 1
ISSN: 2045-2322
Journal: Scientific Reports (Sci Rep)
Language: English

Authors:

Jia Ruiming, Yu Jun, Hu Zhenghui, Yuan Fei

Abstract

The basic principle of multi-view stereo (MVS) is to perform 3D reconstruction by extracting depth information from multiple views. Most current state-of-the-art (SOTA) MVS networks are based on the Vision Transformer, which usually incurs high computational complexity. To reduce computational complexity and improve depth-map accuracy, we propose an MVS network with Bidirectional Semantic Information (BSI-MVS). First, we design a Multi-Level Spatial Pyramid module that generates feature maps at multiple levels to extract multi-scale information. Then, we propose a 2D Bidirectional-LSTM module that captures bidirectional semantic information at different time steps in the horizontal and vertical directions; this information contains abundant depth cues. Finally, cost volumes are built from the feature maps at each level to optimize the final depth map. We conduct experiments on the DTU and BlendedMVS datasets. The results show that, in terms of the overall metric, our network surpasses TransMVSNet, CasMVSNet, CVP-MVSNet, and AACVP-MVSNet by 17.84%, 36.42%, 14.96%, and 4.86%, respectively, with noticeable improvements in both objective metrics and visualizations.
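The abstract does not describe how the Multi-Level Spatial Pyramid module is built. As a rough illustration of what "feature maps at multiple levels" means, the minimal NumPy sketch below assumes plain 2×2 average pooling between levels; the function names `avg_pool2x` and `spatial_pyramid` and the `levels` parameter are hypothetical, and a trained network would more likely use learned (for example, strided convolutional) layers.

```python
import numpy as np


def avg_pool2x(feat: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 over a (C, H, W) array (H and W must be even)."""
    c, h, w = feat.shape
    # Split each spatial axis into (blocks, 2) and average inside each 2x2 block.
    return feat.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def spatial_pyramid(feat: np.ndarray, levels: int = 3) -> list:
    """Return [full, 1/2, 1/4, ...] resolution versions of a (C, H, W) feature map.

    Hypothetical stand-in for a multi-level feature pyramid: each level simply
    average-pools the previous one; no learned layers are involved.
    H and W must be divisible by 2 ** (levels - 1).
    """
    pyramid = [feat]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2x(pyramid[-1]))
    return pyramid
```

For a 4×4 single-channel map, three levels give maps of size 4×4, 2×2 and 1×1, and the coarsest level is the mean of all 16 input values.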
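The abstract only states that the 2D Bidirectional-LSTM module captures semantic information in the horizontal and vertical directions. One generic way to realize such a module, sketched below under that assumption, is to treat every row of a (C, H, W) feature map as a left-to-right sequence and every column as a top-to-bottom sequence, run an LSTM forward and backward along each, and concatenate the four hidden-state maps. The class `LSTMCell`, the functions `bidirectional` and `bilstm_2d`, the hidden size, and the randomly initialized (untrained) weights are all hypothetical; the paper's actual layer sizes and its way of fusing the two directions are not given here.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Minimal LSTM with random (untrained) placeholder weights."""

    def __init__(self, in_dim: int, hid_dim: int, rng: np.random.Generator):
        self.hid_dim = hid_dim
        # One stacked matrix producing the input, forget, candidate and output gates.
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def run(self, seq: np.ndarray) -> np.ndarray:
        """Process seq of shape (T, in_dim) in order; return hidden states (T, hid_dim)."""
        h = np.zeros(self.hid_dim)
        c = np.zeros(self.hid_dim)
        out = np.empty((seq.shape[0], self.hid_dim))
        for t, x in enumerate(seq):
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # emit hidden state
            out[t] = h
        return out


def bidirectional(fw: LSTMCell, bw: LSTMCell, seq: np.ndarray) -> np.ndarray:
    """Concatenate a forward pass and a reversed-order pass: (T, 2 * hid_dim)."""
    forward = fw.run(seq)
    backward = bw.run(seq[::-1])[::-1]  # run on the reversed sequence, then re-align
    return np.concatenate([forward, backward], axis=1)


def bilstm_2d(feat: np.ndarray, hid_dim: int = 8, seed: int = 0) -> np.ndarray:
    """Sweep a (C, H, W) map along rows and columns with bidirectional LSTMs.

    Returns (4 * hid_dim, H, W), channels ordered as
    [left->right, right->left, top->bottom, bottom->top].
    """
    rng = np.random.default_rng(seed)
    c, h, w = feat.shape
    cells = [LSTMCell(c, hid_dim, rng) for _ in range(4)]
    x = feat.transpose(1, 2, 0)  # (H, W, C)
    # Horizontal: each row x[r] is a length-W sequence -> (H, W, 2 * hid_dim).
    horiz = np.stack([bidirectional(cells[0], cells[1], x[r]) for r in range(h)], axis=0)
    # Vertical: each column x[:, col] is a length-H sequence -> (H, W, 2 * hid_dim).
    vert = np.stack([bidirectional(cells[2], cells[3], x[:, col]) for col in range(w)], axis=1)
    return np.concatenate([horiz, vert], axis=2).transpose(2, 0, 1)
```

Each output pixel thus mixes context from its entire row and column. A trainable version would more likely use `torch.nn.LSTM(input_size, hidden_size, bidirectional=True, batch_first=True)`, folding the rows (or columns) into the batch dimension, instead of the explicit Python loops used here for readability.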
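The abstract does not specify how BSI-MVS builds its cost volumes. For orientation, the sketch below follows the variance-based cost metric and soft-argmin depth regression used in MVSNet (Yao et al., 2018), not necessarily BSI-MVS's own design. It assumes the source-view features have already been warped onto D fronto-parallel depth planes of the reference camera (the homography warping step is omitted) and replaces MVSNet's learned 3D-CNN regularization with a plain mean over channels. The function names `variance_cost_volume` and `soft_argmin_depth` are hypothetical.

```python
import numpy as np


def variance_cost_volume(warped: np.ndarray) -> np.ndarray:
    """Per-element variance across views.

    warped: (N, C, D, H, W) features of N views (reference included), already
    warped onto D depth hypotheses of the reference camera.
    Returns (C, D, H, W): (1/N) * sum_i (V_i - mean(V)) ** 2.
    """
    return warped.var(axis=0)


def soft_argmin_depth(cost: np.ndarray, depth_values: np.ndarray) -> np.ndarray:
    """Regress an (H, W) depth map from a (D, H, W) cost (lower = better match)."""
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerically stable softmax
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)              # probability over depth planes
    return np.einsum("d,dhw->hw", depth_values, prob)    # expected depth per pixel


if __name__ == "__main__":
    n, c, d, h, w = 3, 4, 5, 2, 2
    depth_values = np.linspace(1.0, 5.0, d)
    # Synthetic features: views agree exactly on plane index 2 and diverge elsewhere.
    view = np.arange(n).reshape(n, 1, 1, 1, 1)
    plane = np.arange(d).reshape(1, 1, d, 1, 1)
    warped = 10.0 * view * (plane - 2) * np.ones((n, c, d, h, w))
    # Channel mean is a placeholder for MVSNet's learned 3D-CNN regularization.
    cost = variance_cost_volume(warped).mean(axis=0)
    print(soft_argmin_depth(cost, depth_values))
```

Because the synthetic views agree only on the third depth plane (depth 3.0), the variance there is zero, and the regressed depth is approximately 3.0 at every pixel. Averaging the probabilities rather than taking a hard argmin keeps the depth estimate differentiable, which is why this regression is common in learned MVS pipelines.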

Funder

Zhenghui Hu

Anhua Zheng

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-55612-6.pdf

References (31 articles)

1. Liu, J. et al. PlaneMVS: 3D plane reconstruction from multi-view stereo. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2022-June, 8655–8665 (2022).

2. Hirschmüller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30, 328–341 (2008).

3. Yao, Y., Luo, Z., Li, S., Fang, T. & Quan, L. MVSNet: Depth inference for unstructured multi-view stereo. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11212 LNCS, 785–801 (2018).

4. Wei, Z., Zhu, Q., Min, C., Chen, Y. & Wang, G. AA-RMVSNet: Adaptive aggregation recurrent multi-view stereo network. In Proceedings of the IEEE International Conference on Computer Vision, 6167–6176 (2021).

5. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 5999–6009 (2017).

Cited by 1 article.

1. A deep learning-based framework for efficient and accurate 3D real-scene reconstruction; International Journal of Information Technology; 2024-07-18