Abstract
Learning deformable 3D objects from 2D images is often an ill-posed problem. Existing methods rely on explicit supervision to establish multi-view correspondences, such as template shape models and keypoint annotations, which restricts their applicability to objects “in the wild”. A more natural way of establishing correspondences is by watching videos of objects moving around. In this paper, we present DOVE, a method that learns textured 3D models of deformable object categories from monocular videos available online, without keypoint, viewpoint or template shape supervision. By resolving symmetry-induced pose ambiguities and leveraging temporal correspondences in videos, the model automatically learns to factor out 3D shape, articulated pose and texture from each individual RGB frame, and is ready for single-image inference at test time. In the experiments, we show that existing methods fail to learn sensible 3D shapes without additional keypoint or template supervision, whereas our method produces temporally consistent 3D models, which can be animated and rendered from arbitrary viewpoints. Project page: https://dove3d.github.io/.
Funder
Facebook
Innovate UK
Department of Engineering Science, University of Oxford
Engineering and Physical Sciences Research Council
Clarendon Fund
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
5 articles
1. Towards Estimation of 3D Poses and Shapes of Animals from Oblique Drone Imagery. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2024-06-11.
2. Recent Trends in 3D Reconstruction of General Non-Rigid Scenes. Computer Graphics Forum, 2024-04-30.
3. SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes. 2024 International Conference on 3D Vision (3DV), 2024-03-18.
4. Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023-10-01.
5. PPR: Physically Plausible Reconstruction from Monocular Videos. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023-10-01.