Abstract
AbstractMulti-view stereo remains a popular choice when recovering 3D geometry, despite performance varying dramatically according to the scene content. Moreover, typical pinhole camera assumptions fail in the presence of shallow depth of field inherent to macro-scale scenes; limiting application to larger scenes with diffuse reflectance. However, the presence of defocus blur can itself be considered a useful reconstruction cue, particularly in the presence of view-dependent materials. With this in mind, we explore the complimentary nature of stereo and defocus cues in the context of multi-view 3D reconstruction; and propose a complete pipeline for scene modelling from a finite aperature camera that encompasses image formation, camera calibration and reconstruction stages. As part of our evaluation, an ablation study reveals how each cue contributes to the higher performance observed over a range of complex materials and geometries. Though of lesser concern with large apertures, the effects of image noise are also considered. By introducing pre-trained deep feature extraction into our cost function, we show a step improvement over per-pixel comparisons; as well as verify the cross-domain applicability of networks using largely in-focus training data applied to defocused images. Finally, we compare to a number of modern multi-view stereo methods, and demonstrate how the use of both cues leads to a significant increase in performance across several synthetic and real datasets.
Funder
Engineering and Physical Sciences Research Council
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence,Computer Vision and Pattern Recognition,Software