Interactive Learning of a Dual Convolution Neural Network for Multi-Modal Action Recognition

Author:

Li Qingxia, Gao Dali, Zhang Qieshi, Wei Wenhong, Ren Ziliang

Abstract

RGB and depth modalities carry rich, complementary information, and convolutional neural networks (ConvNets) based on multi-modal data have made substantial progress in action recognition. However, a single-stream network is limited in its ability to learn interactive features across modalities, which makes it difficult to improve recognition performance. Inspired by the multi-stream learning mechanism and spatial-temporal information representation methods, we construct dynamic images using the rank pooling method and design an interactive learning dual-ConvNet (ILD-ConvNet) with a multiplexer module to improve action recognition performance. Built with rank pooling, the visual dynamic images capture the spatial-temporal information of entire RGB videos. We extend this method to depth sequences to obtain richer multi-modal spatial-temporal information as the inputs of the ConvNets. In addition, we design a dual ILD-ConvNet with multiplexer modules to jointly learn the interactive features of the two streams from the RGB and depth modalities. The proposed recognition framework has been tested on two benchmark multi-modal datasets: NTU RGB + D 120 and PKU-MMD. The proposed ILD-ConvNet with a temporal segmentation mechanism achieves accuracies of 86.9% and 89.4% for Cross-Subject (C-Sub) and Cross-Setup (C-Set) on NTU RGB + D 120, and 92.0% and 93.1% for Cross-Subject (C-Sub) and Cross-View (C-View) on PKU-MMD, which are comparable with the state of the art. The experimental results show that the proposed ILD-ConvNet with a multiplexer module can extract interactive features from different modalities to enhance action recognition performance.
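As an illustration of how a dynamic image can summarize a whole clip, the following is a minimal sketch that assumes a simplified linear approximation of rank pooling, in which frame t of T is weighted by 2t - T - 1. The function name, the array layout, and the rescaling to 8-bit values are assumptions made for this example; the abstract does not specify the authors' exact construction. The same function could be applied to an RGB clip or to a depth sequence stored with a single channel.

```python
import numpy as np


def approximate_dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into one (H, W, C) uint8 image.

    Assumption: a simplified linear approximation of rank pooling, where
    frame t (1-indexed) receives the weight alpha_t = 2t - T - 1. Later
    frames get positive weights and earlier frames negative ones, so the
    result encodes how pixel values evolve over time.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]

    # Weights alpha_t = 2t - T - 1 for t = 1..T; they sum to zero.
    t = np.arange(1, num_frames + 1)
    alpha = 2.0 * t - num_frames - 1.0

    # Weighted sum over the time axis.
    dynamic = np.tensordot(alpha, frames, axes=(0, 0))

    # Rescale to [0, 255] so the result can be fed to an image ConvNet.
    lo, hi = dynamic.min(), dynamic.max()
    if hi == lo:
        # No temporal change anywhere: return an all-zero image.
        return np.zeros_like(dynamic, dtype=np.uint8)
    return ((dynamic - lo) / (hi - lo) * 255.0).round().astype(np.uint8)
```

Because the weights sum to zero, a perfectly static clip maps to a blank image, while pixels whose values rise over time end up bright. Under a temporal segmentation scheme, one such image could be computed per segment, although how the segments are chosen is not described in the abstract.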

Funder

Ministry of Science and Technology of China

National Natural Science Foundation of China

Key Projects of Artificial Intelligence of High School in Guangdong Province

Innovation Project of High School in Guangdong Province

Dongguan Science and Technology Special Commissioner Project

Dongguan Social Development Science and Technology Project

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

