Abstract
The state of the art in articulated hand tracking has been greatly advanced by hybrid methods that fit a generative hand model to depth data, leveraging both temporally and discriminatively predicted starting poses. In this paradigm, the generative model is used to define an energy function, and a local iterative optimization is performed from these starting poses in order to find a "good local minimum" (i.e., a local minimum close to the true pose). Performing this optimization quickly is key to exploring more starting poses, performing more iterations and, crucially, exploiting high frame rates that ensure that temporally predicted starting poses are in the basin of convergence of a good local minimum. At the same time, a detailed and accurate generative model tends to deepen the good local minima and widen their basins of convergence. Recent work, however, has largely had to trade off such a detailed hand model against one that facilitates such rapid optimization. We present a new implicit model of hand geometry that mostly avoids this compromise and leverage it to build an ultra-fast hybrid hand tracking system. Specifically, we construct an articulated signed distance function that, for any pose, yields a closed-form calculation of both the distance to the detailed surface geometry and the derivatives needed to perform gradient-based optimization. There is no need to introduce or update any explicit "correspondences", yielding a simple algorithm that maps well to parallel hardware such as GPUs. As a result, our system can run at extremely high frame rates (e.g., up to 1000 fps). Furthermore, we demonstrate how to detect, segment and optimize for two strongly interacting hands, recovering complex interactions at extremely high frame rates. In the absence of publicly available datasets of sufficiently high frame rate, we leverage a multiview capture system to create a new 180 fps dataset of one and two hands interacting together or with objects.
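The idea of a signed distance function that gives both the distance and its pose derivatives in closed form can be illustrated with a toy example. The sketch below is not the paper's hand model: it assumes a single rigid sphere whose "pose" is just its center, and uses plain gradient descent on a mean squared distance energy. All function names, the step size, and the synthetic data are hypothetical choices made for illustration. Note that no point-to-surface correspondences are computed at any stage.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Closed-form signed distance of each point to a sphere,
    plus the derivative of that distance with respect to the center."""
    offsets = points - center
    norms = np.maximum(np.linalg.norm(offsets, axis=1), 1e-12)
    dist = norms - radius                    # negative inside, positive outside
    grad_center = -offsets / norms[:, None]  # d(dist)/d(center), one row per point
    return dist, grad_center

def energy_and_grad(points, center, radius):
    """Energy = mean squared signed distance of the data to the surface."""
    dist, grad_center = sphere_sdf(points, center, radius)
    energy = np.mean(dist ** 2)
    grad = 2.0 * np.mean(dist[:, None] * grad_center, axis=0)  # chain rule
    return energy, grad

def fit_center(points, start, radius, step=0.5, iters=200):
    """Local iterative optimization from a starting pose (assumed settings)."""
    center = np.asarray(start, dtype=float).copy()
    for _ in range(iters):
        _, grad = energy_and_grad(points, center, radius)
        center -= step * grad
    return center

def demo():
    """Fit synthetic, noise-free surface samples from a nearby starting pose."""
    rng = np.random.default_rng(0)
    true_center = np.array([0.1, -0.2, 0.5])
    radius = 0.3
    dirs = rng.normal(size=(500, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    points = true_center + radius * dirs     # stand-in for depth data
    start = true_center + np.array([0.05, 0.05, -0.05])
    return fit_center(points, start, radius), true_center
```

Because every data point contributes independently to the energy and its gradient, the per-point work in `sphere_sdf` is the kind of computation that maps naturally onto parallel hardware; here NumPy vectorization stands in for that.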
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design
Cited by
67 articles.