DGU-HAU: A Dataset for 3D Human Action Analysis on Utterances
Published: 2023-11-27
Volume: 12, Issue: 23, Page: 4793
ISSN: 2079-9292
Container-title: Electronics
Language: en
Author:
Park Jiho 1, Park Kwangryeol 2, Kim Dongho 3
Affiliation:
1. Department of Artificial Intelligence, Dongguk University, Seoul 04620, Republic of Korea
2. Department of Computer Science and Engineering, Dongguk University, Seoul 04620, Republic of Korea
3. Software Education Institute, Dongguk University, Seoul 04620, Republic of Korea
Abstract
Constructing diverse and complex multi-modal datasets is crucial for advancing human action analysis research: such datasets provide ground truth annotations for training deep learning networks and enable the development of models that are robust across real-world scenarios. Generating natural and contextually appropriate nonverbal gestures is essential for immersive and effective human–computer interaction in applications such as video games, embodied virtual assistants, and conversations within a metaverse. However, existing speech-related human datasets focus on style transfer, which makes them unsuitable for 3D human action analysis tasks such as human action recognition and generation. We therefore introduce DGU-HAU, a novel multi-modal dataset of 3D human actions on utterances that commonly occur in daily life. We validate the dataset using Action2Motion (A2M), a state-of-the-art 3D human action generation model.
Funder
Ministry of Education; MSIT (Ministry of Science and ICT), Korea, under the ITRC (IITP)
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
References: 29 articles.
1. Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., and Natsev, P. (2017). The Kinetics human action video dataset. arXiv.
2. Zhang (2012). Microsoft Kinect sensor and its effect. IEEE Multimed.
3. Han (2013). Enhanced computer vision with Microsoft Kinect sensor: A review. IEEE Trans. Cybern.
4. Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27–30). NTU RGB+D: A large scale dataset for 3D human activity analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.
5. Liu (2019). NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell.