Next-Gen Dynamic Hand Gesture Recognition: MediaPipe, Inception-v3 and LSTM-Based Enhanced Deep Learning Model

Published: 2024-08-15; Volume: 13; Issue: 16; Page: 3233
ISSN: 2079-9292
Journal: Electronics
Language: English

Authors:

Yaseen¹, Kwon Oh-Jin¹, Kim Jaeho², Jamil Sonain³, Lee Jinhee¹, Ullah Faiz¹

Affiliations:

1. Department of Electronics Engineering, Sejong University, Seoul 05006, Republic of Korea

2. Department of Electrical Engineering, Sejong University, Seoul 05006, Republic of Korea

3. Department of Computer Science, Norwegian University of Science and Technology (NTNU), 2815 Gjovik, Norway

Abstract

Gesture recognition is crucial in computer vision-based applications such as drone control, gaming, virtual and augmented reality (VR/AR), and security, especially in human–computer interaction (HCI)-based systems. Gesture recognition systems fall into two types, static and dynamic; this paper focuses on dynamic gesture recognition. In dynamic hand gesture recognition, the input is a sequence of frames, i.e., temporal data, which poses significant processing challenges and reduces efficiency compared with static gestures. Because both spatial and temporal information must be processed, the data have more dimensions than static images, which demands complex deep learning (DL) models with higher computational costs. This article presents a novel triple-layer algorithm that efficiently reduces the 3D feature map to 1D row vectors and improves overall performance. First, each image in a given sequence is processed with the MediaPipe framework to extract the region of interest (ROI). The cropped image is then passed to Inception-v3, which serves as the 2D feature extractor. Finally, a long short-term memory (LSTM) network is used as both the temporal feature extractor and the classifier. The proposed method achieves an average accuracy of more than 89.7%, and the experimental results show that it outperforms existing state-of-the-art methods.
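The abstract outlines a three-stage data flow: per-frame ROI cropping with MediaPipe, a 2D CNN that turns each crop into a 1D vector, and an LSTM over the resulting sequence. The pure-Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. It assumes (1) landmark input in the MediaPipe Hands convention of 21 points with x and y normalized to [0, 1] by image width and height; (2) that the 3D-to-1D reduction is global average pooling, as in Keras `InceptionV3(include_top=False, pooling="avg")`, which maps the 8 × 8 × 2048 feature map of a 299 × 299 input to a 2048-element vector; and (3) a single-layer LSTM whose last hidden state feeds a softmax classifier. The names `roi_from_landmarks`, `global_average_pool`, and `TinyLSTM`, the crop margin, the layer sizes, and the random weights are all hypothetical.

```python
import math
import random


def roi_from_landmarks(landmarks, width, height, margin=0.1):
    """Return a pixel box (x0, y0, x1, y1) enclosing normalized (x, y) landmarks.

    Assumes MediaPipe Hands-style input: x and y normalized to [0, 1] by the
    image width and height. `margin` (hypothetical) pads the box by a fraction
    of its size; the result is clamped to the image bounds.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    return (
        max(0, int((min(xs) - pad_x) * width)),
        max(0, int((min(ys) - pad_y) * height)),
        min(width, math.ceil((max(xs) + pad_x) * width)),
        min(height, math.ceil((max(ys) + pad_y) * height)),
    )


def global_average_pool(feature_map):
    """Average an H x W x C feature map (nested lists) over H and W -> length-C vector."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Single-layer LSTM over per-frame vectors, then a linear layer and softmax.

    Weights are random placeholders (hypothetical); a real model learns them.
    """

    def __init__(self, input_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One matrix per gate (i: input, f: forget, g: candidate, o: output),
        # each acting on the concatenation [x_t; h_{t-1}].
        self.W = {gate: mat(hidden_size, input_size + hidden_size) for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}
        self.W_out = mat(num_classes, hidden_size)

    def forward(self, sequence):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = list(x) + h
            pre = {
                gate: [sum(w * v for w, v in zip(row, z)) + bias
                       for row, bias in zip(self.W[gate], self.b[gate])]
                for gate in "ifgo"
            }
            i = [_sigmoid(v) for v in pre["i"]]
            f = [_sigmoid(v) for v in pre["f"]]
            g = [math.tanh(v) for v in pre["g"]]
            o = [_sigmoid(v) for v in pre["o"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # The last hidden state summarizes the whole sequence.
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.W_out]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical demo: 3 frames, each a tiny 2 x 2 x 4 "feature map" standing in
    # for Inception-v3's 8 x 8 x 2048 output.
    rng = random.Random(1)
    frames = [[[[rng.random() for _ in range(4)] for _ in range(2)] for _ in range(2)]
              for _ in range(3)]
    sequence = [global_average_pool(fm) for fm in frames]  # shape (T=3, C=4)
    model = TinyLSTM(input_size=4, hidden_size=8, num_classes=5)
    print(model.forward(sequence))
```

In a real pipeline the toy feature maps would be replaced by Inception-v3 outputs for each cropped frame, and the LSTM weights would be trained. Pooling each frame to one vector is what hands the LSTM a (T, C) sequence instead of a (T, H, W, C) volume, which is the kind of dimensionality reduction the abstract credits for the efficiency gain.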

Funder

Ministry of Science and ICT (MSIT), Republic of Korea

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/16/3233/pdf
