Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022

Authors:

Taras Kucherenko¹, Pieter Wolfert², Youngwoo Yoon³, Carla Viegas⁴, Teodor Nikolov⁵, Mihail Tsakov⁶, Gustav Eje Henter⁷

Affiliations:

1. SEED, Electronic Arts Inc, Stockholm, Sweden

2. Donders Institute for Brain, Cognition & Behaviour, Radboud Universiteit, Nijmegen, Netherlands and IDLab, Ghent University, Gent, Belgium

3. ETRI, Daejeon, Republic of Korea

4. Carnegie Mellon University, Pittsburgh, United States and Nova University of Lisbon, Lisboa, Portugal

5. Department of Computing Science, Umeå Universitet, Umeå, Sweden and Motorica AB, Stockholm, Sweden

6. Department of Computing Science, Umeå Universitet, Umeå, Sweden

7. Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden and Motorica AB, Stockholm, Sweden

Abstract

This article reports on the second GENEA Challenge, a benchmark for data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike comparisons across different research articles, differences in results here are due solely to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field. The evaluation results show that some synthetic gesture conditions were rated as significantly more human-like than 3D human motion capture; to the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion was found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall's tau rank correlation of around -0.5. Based on the challenge results, we formulate numerous recommendations for system building and evaluation.
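The Fréchet gesture distance mentioned in the abstract adapts the Fréchet inception distance (FID) to motion features: each set of feature vectors is modelled as a multivariate Gaussian, and the Fréchet (2-Wasserstein) distance between the two Gaussians is computed in closed form. Below is a minimal numpy sketch of that computation. It is a generic illustration of the metric's formula, not the challenge's exact implementation, and it assumes feature vectors have already been extracted by some motion-feature encoder.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Matrix square root of a symmetric positive semi-definite matrix,
    via eigendecomposition (avoids a scipy dependency)."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_gesture_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets
    (rows = samples, columns = feature dimensions), as in FID/FGD:

        ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 * (S_a^1/2 S_b S_a^1/2)^1/2)
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    # (S_a^1/2 S_b S_a^1/2) is symmetric PSD, so its root is well defined;
    # its trace equals Tr((S_a S_b)^1/2) in the usual FID formula.
    sqrt_a = _sqrtm_psd(sigma_a)
    covmean = _sqrtm_psd(sqrt_a @ sigma_b @ sqrt_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a + sigma_b - 2.0 * covmean))
```

A lower FGD means the synthetic feature distribution is closer to the natural one; the negative Kendall's tau reported in the abstract therefore indicates that lower FGD tends to accompany higher human-likeness ratings.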

Funders

Industrial Fundamental Technology Development Program

Flemish Research Foundation

Portuguese Foundation for Science and Technology

Knut and Alice Wallenberg Foundation

Publisher

Association for Computing Machinery (ACM)


Cited by 3 articles:

1. Speech-Driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference. IEEE Transactions on Visualization and Computer Graphics, October 2024.

2. DiT-Gesture: A Speech-Only Approach to Stylized Gesture Generation. Electronics, 27 April 2024.

3. Unified Speech and Gesture Synthesis Using Flow Matching. ICASSP 2024 – IEEE International Conference on Acoustics, Speech and Signal Processing, 14 April 2024.
