Abstract
Although synthetic training data has been shown to benefit tasks such as human pose estimation, its use for RGB human action recognition remains relatively unexplored. Our goal in this work is to answer the question of whether synthetic humans can improve the performance of human action recognition, with a particular focus on generalization to unseen viewpoints. We make use of recent advances in monocular 3D human body reconstruction from real action sequences to automatically render synthetic training videos for the action labels. We make the following contributions: (1) we investigate the extent of variations and augmentations that are beneficial to improving performance at new viewpoints, considering changes in body shape and clothing for individuals, as well as more action-relevant augmentations such as non-uniform frame sampling and interpolating between the motions of individuals performing the same action; (2) we introduce a new data generation methodology, SURREACT, that allows training of spatio-temporal CNNs for action classification; (3) we substantially improve the state-of-the-art action recognition performance on the NTU RGB+D and UESTC standard multi-view human action benchmarks; and (4) we extend the augmentation approach to in-the-wild videos from a subset of the Kinetics dataset to investigate the case when only one-shot training data is available, and demonstrate improvements in this case as well.
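The abstract does not specify the exact non-uniform frame sampling scheme; as one illustrative possibility, a common form of such temporal augmentation splits a clip into equal segments and draws one random frame index per segment, so the whole action is covered but frame spacing varies between training passes. The function name and parameters below are hypothetical, not taken from the paper:

```python
import random

def nonuniform_sample(num_frames: int, num_out: int, seed=None):
    """Pick `num_out` frame indices from a clip of `num_frames` frames
    by dividing the clip into equal segments and drawing one random
    index per segment: temporal coverage is preserved, but the gaps
    between sampled frames differ on every call."""
    rng = random.Random(seed)
    seg = num_frames / num_out  # segment length (may be fractional)
    return [min(int(i * seg + rng.uniform(0.0, seg)), num_frames - 1)
            for i in range(num_out)]

# Example: draw 8 indices from a 100-frame clip.
indices = nonuniform_sample(100, 8, seed=0)
```

Because one index is drawn per segment, the result is always non-decreasing and spans the clip, unlike purely random sampling, which can cluster all frames in one part of the action.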
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
References: 99 articles.
Cited by: 40 articles.