Affiliation:
1. Shandong Jiaotong University
2. Shandong Zhengyuan Yeda Environmental Technology Co., Ltd
Abstract
3D reconstruction, a core component of environmental perception, has been widely adopted across numerous domains in recent years. Within multi-view 3D reconstruction, MVSNet stands out: it pioneered the use of differentiable homography transformations inside a neural network, seamlessly integrating camera model transformations and enabling end-to-end multi-view 3D reconstruction. However, MVSNet's feature extraction is only average, and the correlation between its multi-scale cost volumes is insufficient, leaving room for improvement in 3D scene reconstruction. To address these problems, we propose an improved reconstruction algorithm based on the MVSNet architecture. To extract richer pixel-level detail from images, we replace the existing feature extraction module with a DE module built on a residual structure: ECA-Net and dilated convolution enlarge the receptive field without increasing the parameter count, and features are spliced and fused through the residual structure to retain the global information of the original image. In addition, an attention mechanism refines the regularization of the 3D cost volume, strengthening the integration of information across multi-scale feature volumes and thereby improving depth estimation accuracy. Evaluated on the DTU dataset, our network achieves a completeness (comp) of 0.411 mm and an overall score of 0.418 mm, outperforming both traditional methods and other deep-learning-based methods; the reconstructed point cloud models also show clear visual improvements. Experiments on the BlendedMVS dataset further demonstrate that our network generalizes well.
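To make the two ingredients of the proposed DE module concrete, the sketch below illustrates (a) a dilated convolution, which enlarges the receptive field without adding parameters (a 3×3 kernel at dilation 2 covers a 5×5 region), and (b) ECA-style channel attention, which pools each channel to a scalar, runs a small 1D convolution across channels, and rescales the feature map. This is a minimal NumPy illustration of the general techniques, not the authors' implementation; the function names, fixed averaging weights, and single-channel convolution are assumptions for clarity (in practice the weights are learned and the ops run in a deep-learning framework).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dilated_conv2d(x, k, d=2):
    """'Valid' 2D convolution of a single-channel map x with kernel k at dilation d.

    A (kh, kw) kernel at dilation d covers an effective (kh-1)*d+1 square,
    so the receptive field grows with d while the parameter count stays fixed.
    """
    kh, kw = k.shape
    oh = x.shape[0] - (kh - 1) * d
    ow = x.shape[1] - (kw - 1) * d
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap samples the input at a dilated offset.
            out += k[i, j] * x[i * d:i * d + oh, j * d:j * d + ow]
    return out

def eca_attention(feat, kernel=3):
    """ECA-style channel attention on a (C, H, W) feature map.

    Global average pooling gives one descriptor per channel; a 1D conv
    across channels (here with placeholder uniform weights, learned in
    practice) models local cross-channel interaction; a sigmoid yields
    per-channel weights that rescale the input.
    """
    c = feat.shape[0]
    desc = feat.mean(axis=(1, 2))                      # (C,) channel descriptors
    pad = kernel // 2
    padded = np.pad(desc, pad, mode="edge")
    w = np.full(kernel, 1.0 / kernel)                  # placeholder 1D conv weights
    conv = np.array([np.dot(padded[i:i + kernel], w) for i in range(c)])
    weights = sigmoid(conv)                            # per-channel weights in (0, 1)
    return feat * weights[:, None, None]
```

In the DE module as described in the abstract, outputs of such branches would then be concatenated and fused through a residual connection with the original features, preserving global image information.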
Publisher
Research Square Platform LLC