PointMapNet: Point Cloud Feature Map Network for 3D Human Action Recognition
Author:
Li Xing (1, 2), Huang Qian (1, 2), Zhang Yunfei (1, 2), Yang Tianjin (1, 2), Wang Zhijian (1, 2)
Affiliation:
1. The Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China
2. School of Computer and Information, Hohai University, Nanjing 211100, China
Abstract
3D human action recognition is crucial in a broad range of industrial applications, such as robotics, video surveillance, autonomous driving, and intellectual education. In this paper, we present a new point cloud sequence network, PointMapNet, for 3D human action recognition. PointMapNet introduces two point cloud feature maps, symmetrical to depth feature maps, that summarize appearance and motion representations from point cloud sequences. Specifically, we first convert each point cloud frame into a virtual action frame using static point cloud techniques. A virtual action frame is a 1D vector that characterizes the structural details of the point cloud frame. Then, inspired by feature map-based human action recognition on depth sequences, we symmetrically construct two point cloud feature maps to recognize human actions from the point cloud sequence: the Point Cloud Appearance Map (PCAM) and the Point Cloud Motion Map (PCMM). To construct the PCAM, an MLP-like network architecture is designed to capture the spatio-temporal appearance features of the human action in the virtual action sequence. To construct the PCMM, the same MLP-like architecture is used to capture the motion features of the human action in the virtual action difference sequence. Finally, the two point cloud feature map descriptors are concatenated and fed to a fully connected classifier for human action recognition. Extensive experiments are conducted to evaluate the proposed approach on three benchmark datasets: NTU RGB+D 60 (89.4% cross-subject and 96.7% cross-view), UTD-MHAD (91.61%), and MSR Action3D (91.91%). These results surpass those of existing state-of-the-art point cloud sequence classification networks, demonstrating the effectiveness of our method.
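The data flow described in the abstract (virtual action sequence, difference sequence, two descriptors, concatenation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the difference sequence is assumed to be the difference between consecutive virtual action frames, and the hypothetical `summarize` function (a single linear layer with ReLU followed by max pooling over time) only stands in for the MLP-like network, whose details the abstract does not give. The names `difference_sequence`, `summarize`, and `pointmap_descriptor` are likewise hypothetical.

```python
import random


def difference_sequence(frames):
    """Differences between consecutive virtual action frames.

    Assumption: the abstract does not define the difference sequence;
    consecutive-frame subtraction is used here for illustration.
    """
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


def summarize(sequence, weights):
    """Hypothetical stand-in for the MLP-like network.

    Applies one linear layer with ReLU to every frame, then takes the
    maximum of each output unit over time. `weights` holds one list of
    coefficients per output unit.
    """
    pooled = [0.0] * len(weights)  # ReLU outputs are never below zero
    for frame in sequence:
        for j, unit in enumerate(weights):
            activation = max(0.0, sum(w * x for w, x in zip(unit, frame)))
            pooled[j] = max(pooled[j], activation)
    return pooled


def pointmap_descriptor(frames, w_appearance, w_motion):
    """Concatenate an appearance descriptor and a motion descriptor."""
    pcam = summarize(frames, w_appearance)                   # appearance branch
    pcmm = summarize(difference_sequence(frames), w_motion)  # motion branch
    return pcam + pcmm  # list concatenation


# Toy input: 8 virtual action frames, each a 16-dimensional 1D vector.
random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
w_appearance = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
w_motion = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
descriptor = pointmap_descriptor(frames, w_appearance, w_motion)
```

In this sketch the concatenated `descriptor` would be the input to the fully connected classifier; the classifier itself is omitted because the abstract gives no details about it.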
Funder
The National Key Research and Development Program of China; the 14th Five-Year Plan for Educational Science of Jiangsu Province; the Jiangsu Higher Education Reform Research Project; the 2022 Undergraduate Practice Teaching Reform Research Project of Hohai University
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)
Cited by
4 articles.