Affiliation:
1. Uttarakhand Technical University, India
2. College of Engineering, Roorkee, India
Abstract
RGB-D-based activity recognition is a challenging task in computer vision. Motivated by the strong results obtained from automatic feature learning on RGB-D data, this work presents a six-stream CNN fusion approach built on a 2D convolutional neural network (2DCNN) and a spatio-temporal 3D convolutional neural network (ST3DCNN). The six streams run in parallel: the first and second streams extract spatial and temporal features using the ST3DCNN model, while the remaining four streams extract temporal features from two motion templates, the motion history image (MHI) and the motion energy image (MEI), via the 2DCNN. A support vector machine (SVM) then generates a score for each stream. Finally, a decision-level fusion scheme, specifically a weighted product model (WPM), fuses the scores obtained from all the streams. The effectiveness of the proposed approach is evaluated on the popular public benchmark dataset UTD-MHAD, where it gives promising results.
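As a minimal sketch of the decision-level fusion step (assuming the per-stream SVM scores $s_k(c)$ for class $c$ are normalized to $(0,1]$ and the stream weights $w_k$ sum to one, neither of which is stated in the abstract), the weighted product model combines the six stream scores as

$$S(c) = \prod_{k=1}^{6} s_k(c)^{\,w_k}, \qquad \hat{c} = \arg\max_{c} S(c),$$

so the predicted activity is the class with the largest weighted product of stream scores.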