A Lightweight Hierarchical Model with Frame-Level Joints Adaptive Graph Convolution for Skeleton-Based Action Recognition

Published: 2021-11-01
Volume: 2021
Pages: 1-13
ISSN: 1939-0122
Container-title: Security and Communication Networks
Language: en
Short-container-title: Security and Communication Networks

Author:

Jiang Yujian¹²³, Yang Xue¹²³, Liu Jingyu¹²³, Zhang Junming¹²³

Affiliation:

1. State Key Laboratory of Media Convergence of Communication, Communication University of China, Beijing 100024, China

2. Key Laboratory of Acoustic Visual Technology and Intelligent Control System, Communication University of China, Ministry of Culture and Tourism, Beijing 100024, China

3. Beijing Key Laboratory of Modern Entertainment Technology, Communication University of China, Beijing 100024, China

Abstract

In skeleton-based human action recognition, human behaviours are analysed through temporal and spatial changes in the human skeleton. Skeleton data are unaffected by clothing changes, lighting conditions, or complex backgrounds, which makes this approach robust and has aroused great interest. However, many existing studies use deep networks with large numbers of parameters to improve performance and thus lose the low computational cost that skeleton data would otherwise offer, so these models are difficult to deploy in real-life applications on low-cost embedded devices. To obtain a model with fewer parameters and higher accuracy, this study designed a lightweight frame-level joints adaptive graph convolutional network (FLAGCN) for skeleton-based action recognition. Compared with the classical 2s-AGCN model, the new model obtained a higher precision with 1/8 of the parameters and 1/9 of the floating-point operations (FLOPs). The proposed network features three main improvements. First, an early feature-fusion method replaces the multistream network and reduces the number of required parameters. Second, at the spatial level, two kinds of graph convolution capture different aspects of human action information: a frame-level graph convolution constructs a human topological structure for each data frame, whereas an adjacency graph convolution captures the characteristics of adjacent joints. Third, the model hierarchically extracts different levels of action sequence features, which keeps the model clear and easy to understand and reduces its depth and number of parameters. Extensive experiments on the NTU RGB + D 60 and 120 data sets show that this method requires few parameters, has a low computational cost, and runs fast. Its simple structure and training process also make it easy to deploy in real-time recognition systems based on low-cost embedded devices.
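
The two spatial branches named in the abstract can be illustrated with a toy sketch. This is not the authors' FLAGCN implementation: the abstract gives no formulas, so the softmax-over-dot-products rule for the per-frame topology, the summation of the two branches, the omission of learned weights, and every name below (`EDGES`, `normalized_adjacency`, `frame_adjacency`, `aggregate`, `spatial_block`) are assumptions chosen only to show the difference between a fixed adjacency shared by all frames and a topology rebuilt for each frame.

```python
import math

# Hypothetical toy skeleton: 3 joints in a chain (0 - 1 - 2).
EDGES = [(0, 1), (1, 2)]


def normalized_adjacency(num_joints, edges):
    """Fixed topology: row-normalised adjacency with self-loops, D^-1 (A + I)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def frame_adjacency(frame):
    """Per-frame topology (assumed rule): softmax over joint-feature dot products."""
    adj = []
    for fi in frame:
        scores = [sum(x * y for x, y in zip(fi, fj)) for fj in frame]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


def aggregate(adj, frame):
    """Graph aggregation A @ X for one frame (learned weights omitted)."""
    joints, channels = len(frame), len(frame[0])
    return [[sum(adj[i][j] * frame[j][c] for j in range(joints))
             for c in range(channels)]
            for i in range(joints)]


def spatial_block(sequence, edges):
    """Sum the fixed-adjacency branch and the per-frame branch (an assumption)."""
    fixed = normalized_adjacency(len(sequence[0]), edges)
    out = []
    for frame in sequence:
        a = aggregate(fixed, frame)                    # same graph every frame
        b = aggregate(frame_adjacency(frame), frame)   # graph rebuilt per frame
        out.append([[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])
    return out


# Two frames, three joints, two channels per joint (made-up coordinates).
sequence = [
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0]],
]
features = spatial_block(sequence, EDGES)
```

In this sketch the fixed branch only mixes joints that share a bone, while the per-frame branch can connect any pair of joints and changes whenever the pose changes; a real model would add learned weight matrices, temporal convolutions, and the fused input features the abstract mentions.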

Funder

Key Laboratory of Ministry of Culture and Tourism

Publisher

Hindawi Limited

Subject

Computer Networks and Communications, Information Systems

Link

http://downloads.hindawi.com/journals/scn/2021/2290304.pdf

References: 51 articles.

1. A survey on vision-based human action recognition

2. Visual perception of biological motion and a model for its analysis

3. Understanding the Gap between 2D and 3D Skeleton-Based Action Recognition

4. A Comparative Review of Recent Kinect-Based Action Recognition Algorithms

Cited by 2 articles.

1. Multi-Scale Adaptive Graph Convolution Network for Skeleton-Based Action Recognition;IEEE Access;2024

2. A Lightweight Human Action Classification Method for Green IoT Sport Applications;Journal of Sensors;2022-07-01