Gait-CNN-ViT: Multi-Model Gait Recognition with Convolutional Neural Networks and Vision Transformer
Authors:
Jashila Nair Mogan 1, Chin Poo Lee 1, Kian Ming Lim 1, Mohammed Ali 2, Ali Alqahtani 2,3
Affiliations:
1. Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia
2. Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia
3. Center for Artificial Intelligence (CAI), King Khalid University, Abha 61421, Saudi Arabia
Abstract
Gait recognition, the task of identifying an individual based on their unique walking style, is challenging because walking style can be influenced by external factors such as clothing, viewing angle, and carrying conditions. To address these challenges, this paper proposes a multi-model gait recognition system that integrates Convolutional Neural Networks (CNNs) and a Vision Transformer (ViT). The first step in the pipeline is to obtain a gait energy image, computed by averaging the silhouettes over a gait cycle. The gait energy image is then fed into three models: DenseNet-201, VGG-16, and a Vision Transformer. These models are pre-trained and fine-tuned to encode the salient gait features that are specific to an individual's walking style. Each model produces prediction scores for the classes based on the encoded features, and these scores are summed and averaged to yield the final class label. The performance of the multi-model gait recognition system was evaluated on three datasets: CASIA-B, OU-ISIR dataset D, and the OU-ISIR Large Population dataset. The experimental results showed substantial improvements over existing methods on all three datasets. The integration of CNNs and ViT allows the system to learn both pre-defined and distinct features, providing a robust solution for gait recognition even under the influence of covariates.
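For a concrete picture of the two technical steps the abstract describes, the sketch below averages the binary silhouettes of one gait cycle into a gait energy image and then fuses the softmax scores of three ImageNet-pretrained torchvision backbones (DenseNet-201, VGG-16, ViT-B/16) by summing and averaging them. This is a minimal illustration under assumptions, not the authors' implementation: the function names (compute_gei, build_backbones, predict) and the subject count are placeholders, and the actual system fine-tunes each backbone on the gait datasets before fusion.

```python
# Minimal sketch of the pipeline described in the abstract. Assumes aligned binary
# silhouettes are available and uses torchvision's ImageNet-pretrained backbones as
# stand-ins for the fine-tuned models; names and sizes below are illustrative only.
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import models

NUM_SUBJECTS = 124  # placeholder, e.g. the number of subjects in CASIA-B


def compute_gei(silhouettes: np.ndarray) -> np.ndarray:
    """Average the aligned binary silhouettes of one gait cycle (T, H, W) into a gait energy image (H, W)."""
    return silhouettes.astype(np.float32).mean(axis=0)


def build_backbones(num_classes: int):
    """Load ImageNet-pretrained backbones and replace their classifier heads (weights download on first use)."""
    densenet = models.densenet201(weights="IMAGENET1K_V1")
    densenet.classifier = torch.nn.Linear(densenet.classifier.in_features, num_classes)

    vgg = models.vgg16(weights="IMAGENET1K_V1")
    vgg.classifier[6] = torch.nn.Linear(vgg.classifier[6].in_features, num_classes)

    vit = models.vit_b_16(weights="IMAGENET1K_V1")
    vit.heads.head = torch.nn.Linear(vit.heads.head.in_features, num_classes)
    return [densenet, vgg, vit]


def predict(gei: np.ndarray, backbones) -> int:
    """Score-level fusion: average the softmax scores of the three models and take the arg-max class."""
    x = torch.from_numpy(gei).repeat(3, 1, 1).unsqueeze(0)          # replicate GEI to 3 channels, add batch dim
    x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    scores = [F.softmax(m(x), dim=1) for m in backbones]            # per-model class probabilities
    fused = torch.stack(scores).mean(dim=0)                         # sum-and-average fusion
    return int(fused.argmax(dim=1))


if __name__ == "__main__":
    cycle = np.random.randint(0, 2, size=(30, 128, 88))             # stand-in gait cycle of binary silhouettes
    gei = compute_gei(cycle)
    backbones = [m.eval() for m in build_backbones(NUM_SUBJECTS)]
    with torch.no_grad():
        print("Predicted subject:", predict(gei, backbones))
```

In the paper, the fused scores come from networks fine-tuned on the GEIs of each dataset, so the averaging combines complementary convolutional and attention-based representations rather than raw ImageNet features as in this sketch.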
Funder
Fundamental Research Grant Scheme of the Ministry of Higher Education; Deanship of Scientific Research, King Khalid University, Saudi Arabia; Multimedia University Internal Research Grant; Yayasan Universiti Multimedia
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
15 articles.