Formulating facial mesh tracking as a differentiable optimization problem: a backpropagation-based solution
Published: 2024-07-19
Issue: 1
Volume: 2
ISSN: 2731-9008
Container-title: Visual Intelligence
Language: en
Short-container-title: Vis. Intell.
Author:
Siran Peng, Xiangyu Zhu, Dong Yi, Chen Qian, Zhen Lei
Abstract
Facial mesh tracking enables the production of topologically consistent 3D facial meshes from stereo video input captured by calibrated cameras. This technology is integral to many digital-human applications, such as personalized avatar creation, audio-driven 3D facial animation, and talking-face video generation. Currently, most facial mesh tracking methods are built on computer graphics techniques that involve complex procedures and often require human annotation within the pipeline; as a result, these approaches are difficult to implement and hard to generalize across scenarios. We propose a backpropagation-based solution, termed the BPMT, that formulates facial mesh tracking as a differentiable optimization problem. Our solution leverages visual clues extracted from the stereo input to estimate vertex-wise geometry and texture information. The BPMT is composed of two steps: automatic face analysis and mesh tracking. In the first step, a range of visual clues are automatically extracted from the input, including facial point clouds, multi-view 2D landmarks, 3D landmarks in the world coordinate system, motion fields, and image masks. The second step can be viewed as a differentiable optimization problem, with constraints comprising the stereo video input and the facial clues; the primary objective is to achieve topologically consistent 3D facial meshes across frames. The parameters to be optimized comprise the positions of free-form deformed vertices and a shared texture UV map, and a 3D morphable model (3DMM) is introduced as a form of regularization to improve the convergence of the optimization. Leveraging mature backpropagation software, we progressively register the facial meshes to the recorded subject, generating high-quality 3D faces with consistent topologies. The BPMT requires no manual labeling within the pipeline, making it suitable for producing large-scale stereo facial data.
Moreover, our method exhibits a high degree of flexibility and extensibility, positioning it as a promising platform for future research in the community.
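The optimization described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the BPMT optimizes free-form vertex positions and a texture UV map against stereo-derived clues via automatic differentiation, whereas the simplification below fits 3D vertex positions to observed landmarks with a 3DMM-like prior acting as a quadratic regularizer, and writes the gradients of that objective out by hand. All names (`fit_vertices`, `lam`) and the specific objective are illustrative assumptions.

```python
def fit_vertices(observed, prior, lam=0.1, lr=0.2, steps=200):
    """Minimize, per vertex v: ||v - observed||^2 + lam * ||v - prior||^2.

    `observed` stands in for geometry clues (e.g. triangulated landmarks);
    `prior` stands in for a 3DMM-regularized initial position. The paper
    uses backpropagation software; here the gradient is hand-derived.
    """
    verts = [list(p) for p in prior]  # initialize at the prior, as a 3DMM fit would
    for _ in range(steps):
        for v, obs, pri in zip(verts, observed, prior):
            for k in range(3):
                # d/dv_k of the data term plus the regularization term
                g = 2.0 * (v[k] - obs[k]) + 2.0 * lam * (v[k] - pri[k])
                v[k] -= lr * g  # gradient-descent step
    return verts

observed = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
prior = [(0.8, 0.1, 0.0), (0.1, 0.9, 0.1)]
fitted = fit_vertices(observed, prior)
```

For this quadratic objective the minimizer has the closed form (observed + lam * prior) / (1 + lam), so the regularizer pulls each vertex toward its prior by an amount controlled by `lam`; the gradient loop converges to exactly that point, which mirrors how the 3DMM regularization in the BPMT trades data fidelity against a plausible face shape.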
Funder:
Chinese National Natural Science Foundation Projects; Beijing Science and Technology Planning Project; Natural Science Foundation of Beijing Municipality; the Youth Innovation Promotion Association CAS; InnoHK program
Publisher:
Springer Science and Business Media LLC