Rhythmic Gesticulator

Authors:

Ao Tenglong (1), Gao Qingzhe (2), Lou Yuke (1), Chen Baoquan (1), Liu Libin (1)

Affiliation:

1. Peking University, China

2. Shandong University and Peking University, China

Abstract

Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics. For the rhythm, our system contains a robust rhythm-based segmentation pipeline to ensure the temporal coherence between the vocalization and gestures explicitly. For the gesture semantics, we devise a mechanism to effectively disentangle both low- and high-level neural embeddings of speech and motion based on linguistic theory. The high-level embedding corresponds to semantics, while the low-level embedding relates to subtle variations. Lastly, we build correspondence between the hierarchical embeddings of the speech and the motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design

References: 74 articles.

1. Style‐Controllable Speech‐Driven Gesture Synthesis Using Normalising Flows

2. Deep motifs and motion signatures

3. Andreas Aristidou, Anastasios Yiannakidis, Kfir Aberman, Daniel Cohen-Or, Ariel Shamir, and Yiorgos Chrysanthou. 2022. Rhythm is a Dancer: Music-Driven Motion Synthesis with Global Structure. IEEE Transactions on Visualization and Computer Graphics (2022), 1--1.

4. Alexei Baevski, Steffen Schneider, and Michael Auli. 2020. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations. In International Conference on Learning Representations.

5. Multimodal Machine Learning: A Survey and Taxonomy

Cited by 19 articles.

1. UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

2. The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker;International Conference on Multimodal Interaction;2023-10-09

3. DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models;International Conference on Multimodal Interaction;2023-10-09

4. Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment;International Conference on Multimodal Interaction;2023-10-09

5. Large language models in textual analysis for gesture selection;International Conference on Multimodal Interaction;2023-10-09
