An Implicit Neural Representation for the Image Stack: Depth, All in Focus, and High Dynamic Range

Authors:

Chao Wang¹, Ana Serrano², Xingang Pan³, Krzysztof Wolski¹, Bin Chen¹, Karol Myszkowski¹, Hans-Peter Seidel¹, Christian Theobalt¹, Thomas Leimkühler¹

Affiliation:

1. Max-Planck-Institut für Informatik, Germany

2. Universidad de Zaragoza, I3A, Spain

3. Max-Planck-Institut für Informatik, Germany & Nanyang Technological University, Singapore

Abstract

In everyday photography, physical limitations of camera sensors and lenses frequently lead to a variety of degradations in captured images, such as saturation or defocus blur. A common way to overcome these limitations is image stack fusion, which involves capturing multiple images with different focal distances or exposures. For instance, to obtain an all-in-focus image, a set of multi-focus images is captured; similarly, capturing multiple exposures allows for the reconstruction of high dynamic range. In this paper, we present a novel approach that combines neural fields with an expressive camera model to achieve a unified reconstruction of an all-in-focus, high-dynamic-range image from an image stack. Our approach is composed of a set of specialized implicit neural representations, each tailored to a specific sub-problem along our pipeline: we use neural implicits to predict flow to overcome misalignments arising from lens breathing; depth and all-in-focus images to account for depth of field; and tonemapping to deal with sensor responses and saturation, all trained using a physically inspired supervision structure with a differentiable thin lens model at its core. An important benefit of our approach is its ability to handle these tasks simultaneously or independently, providing flexible post-editing capabilities such as refocusing and exposure adjustment. By sampling the three primary factors in photography within our framework (focal distance, aperture, and exposure time), we conduct a thorough exploration to gain valuable insights into their significance and impact on overall reconstruction quality. Through extensive validation, we demonstrate that our method outperforms existing approaches in both depth-from-defocus and all-in-focus image reconstruction tasks. Moreover, our approach exhibits promising results along each of these three dimensions, showcasing its potential to enhance captured image quality and provide greater control in post-processing.
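The abstract names a differentiable thin lens model as the core of the supervision structure but does not spell it out on this page. As a rough illustration of the physics such a model typically builds on, here is a minimal sketch of the standard thin-lens circle-of-confusion relation; the function name and parameters are our own, not taken from the paper:

    import numpy as np

    def circle_of_confusion(depth, focus_dist, focal_len, aperture):
        """Diameter of the thin-lens blur circle on the sensor.

        All arguments in metres; `depth` may be a scalar or an array
        (e.g. an H x W depth map).
        """
        # A point at `depth` images to a disc whose diameter grows with
        # the pupil diameter and the distance from the focal plane.
        return (aperture * np.abs(depth - focus_dist) / depth
                * focal_len / (focus_dist - focal_len))

    # Example: 50 mm lens at f/2 (25 mm pupil), focused at 2 m;
    # a point at 4 m blurs to a disc of about 0.32 mm on the sensor.
    print(circle_of_confusion(depth=4.0, focus_dist=2.0,
                              focal_len=0.05, aperture=0.025))

In a differentiable pipeline, a quantity like this would typically drive a per-pixel blur kernel, so that gradients with respect to depth, focus distance, and aperture can flow back into the implicit networks.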

Funder

Spanish Agencia Estatal de Investigación

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design

