3D hand pose and mesh estimation via a generic Topology-aware Transformer model

Author:

Yu Shaoqi,Wang Yintong,Chen Lili,Zhang Xiaolin,Li Jiamao

Abstract

In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation hold critical importance. However, inferring reasonable and accurate poses in severe self-occlusion and high self-similarity remains an inherent challenge. In order to alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer with depth image as input, incorporating prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and GCN, capturing long-range dependencies and local topological connections between joints. On top of that, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets, including Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of our proposed Graphformer Decoder and Topology aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer Decoder and Topology aware head, producing Mano parameters. The results on the HO-3D dataset, which contains various and challenging occlusions, show that our HandGCNFormer-Mesh achieves competitive results compared to previous state-of-the-art 3D hand mesh estimation methods.

Publisher

Frontiers Media SA

Reference61 articles.

1. GCST: graph convolutional skeleton transformer for action recognition;Bai;arXiv preprint arXiv:2109.02860,2021

2. “Active learning for bayesian 3d hand pose estimation,”;Caramalau;Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,2021

3. “End-to-end object detection with transformers,”;Carion,2020

4. Pose guided structured region ensemble network for cascaded hand pose estimation;Chen;Neurocomputing,2020

5. “Pose2Mesh: graph convolutional network for 3d human pose and mesh recovery from a 2d human pose,”;Choi,2020

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3