Affiliation:
1. Tianjin Normal University
Abstract
Human motion prediction is a challenging task in human-centric computer vision that involves forecasting future poses from historical sequences. Despite recent progress in modeling the spatial-temporal relationships of motion sequences with complex structured graphs, few approaches provide an adaptive and lightweight representation for the varying graph structures of human motion. Inspired by the advantages of MLP-Mixer, a lightweight architecture designed for learning complex interactions in multi-dimensional data, we explore its potential as a backbone for motion prediction. To this end, we propose a novel MLP-Mixer-based adaptive spatial-temporal pattern learning framework (M\(^2\)AST). Our framework includes an adaptive spatial mixer that models the spatial relationships between joints, an adaptive temporal mixer that learns temporal smoothness, and a local dynamic mixer that captures fine-grained cross-dependencies between joints of adjacent poses. The resulting method achieves a compact representation of human motion dynamics by adaptively considering spatial-temporal dependencies from coarse to fine. Unlike a trivial spatial-temporal MLP-Mixer, our approach captures local and global spatial-temporal relationships simultaneously and more effectively.
We extensively evaluated our framework on three commonly used benchmarks (Human3.6M, AMASS, 3DPW), demonstrating comparable or better performance than existing state-of-the-art methods in both short-term and long-term prediction, despite having significantly fewer parameters. Overall, our framework provides a novel and efficient solution for human motion prediction with adaptive graph learning.
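To make the Mixer-style idea concrete, the sketch below shows a generic spatial-temporal mixing block in NumPy: one MLP mixes information across the joint axis and another across the time axis of a pose sequence. This is an illustrative toy with randomly initialized weights and hypothetical names (`mixer_block`, `mlp`), not the authors' M\(^2\)AST model, which additionally uses adaptive graph learning and a local dynamic mixer.

```python
import numpy as np

def mlp(x, w1, w2):
    """Two-layer MLP (ReLU) applied along the last axis of x."""
    return np.maximum(x @ w1, 0.0) @ w2

def mixer_block(x, rng, hidden=16):
    """One spatial-temporal mixing step on a pose sequence.

    x: (T, D) array -- T time steps, D joint coordinates.
    Spatial mixing runs an MLP over the joint axis; temporal mixing
    transposes so the same kind of MLP sees the time axis, echoing the
    Mixer idea of alternating which axis is treated as "tokens".
    """
    T, D = x.shape
    # spatial mixing over joints, with a residual connection
    ws1 = rng.standard_normal((D, hidden)) * 0.1
    ws2 = rng.standard_normal((hidden, D)) * 0.1
    x = x + mlp(x, ws1, ws2)
    # temporal mixing over time steps, also residual
    wt1 = rng.standard_normal((T, hidden)) * 0.1
    wt2 = rng.standard_normal((hidden, T)) * 0.1
    x = x + mlp(x.T, wt1, wt2).T
    return x

rng = np.random.default_rng(0)
seq = rng.standard_normal((10, 66))  # 10 past frames, 22 joints x 3 coords
out = mixer_block(seq, rng)
print(out.shape)  # (10, 66): same shape, joints and time both mixed
```

In a full prediction model, several such blocks would be stacked and a final projection would map the mixed history to the future frames.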
Publisher:
Research Square Platform LLC