Affiliation:
1. Department of Electrical Engineering, IIT (Indian Institute of Technology) Roorkee, Roorkee 247667, India
Abstract
Sign language is a complex language that uses hand gestures, body movements, and facial expressions and is used primarily by the deaf community. Sign language recognition (SLR) is a popular research domain because it offers an efficient and reliable way to bridge the communication gap between people who are deaf or hard of hearing and hearing people. Recognizing isolated sign language words from video is a challenging problem in computer vision. This paper proposes a hybrid SLR framework that combines a convolutional neural network (CNN) with an attention-based long short-term memory (LSTM) network. We use MobileNetV2 as the backbone due to its lightweight structure, which reduces architectural complexity while extracting meaningful features from the video frame sequence. The spatial features are fed to an LSTM augmented with an attention mechanism that selects significant gesture cues from the video frames and focuses on salient features in the sequential data. The proposed method is evaluated on the benchmark WLASL dataset with 100 classes using precision, recall, F1-score, and 5-fold cross-validation. Our method achieves an average accuracy of 84.65%. The experimental results show that our model is effective and computationally efficient compared with other state-of-the-art methods.
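To make the described pipeline concrete, below is a minimal PyTorch sketch of a CNN + attention-LSTM recognizer of the kind the abstract outlines: per-frame features from a MobileNetV2 backbone, an LSTM over the frame sequence, and a learned soft-attention layer that weights the per-frame hidden states before classification. This is an illustrative reconstruction, not the authors' implementation; the hidden size, clip length, and attention form are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class AttnLSTMSLR(nn.Module):
    """Sketch of a hybrid CNN + attention-LSTM sign language recognizer.

    MobileNetV2 extracts spatial features per frame; an LSTM models the
    temporal sequence; soft attention pools the hidden states into a
    single clip-level representation. Hyperparameters are illustrative.
    """
    def __init__(self, num_classes=100, hidden_size=256):
        super().__init__()
        backbone = mobilenet_v2(weights="DEFAULT")
        self.cnn = backbone.features          # spatial feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)   # -> 1280-d vector per frame
        self.lstm = nn.LSTM(1280, hidden_size, batch_first=True)
        self.attn = nn.Linear(hidden_size, 1) # scores each time step
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                     # x: (B, T, 3, H, W)
        b, t = x.shape[:2]
        # Run the CNN on all frames at once, then restore the time axis.
        feats = self.pool(self.cnn(x.flatten(0, 1))).flatten(1)  # (B*T, 1280)
        h, _ = self.lstm(feats.view(b, t, -1))                   # (B, T, H)
        w = torch.softmax(self.attn(h), dim=1)                   # (B, T, 1)
        context = (w * h).sum(dim=1)          # attention-weighted summary
        return self.fc(context)               # class logits

model = AttnLSTMSLR()
logits = model(torch.randn(2, 16, 3, 224, 224))  # 2 clips of 16 frames
print(logits.shape)  # torch.Size([2, 100])
```

The attention weights `w` play the role the abstract assigns to the attention mechanism: frames carrying salient gesture cues receive larger weights, so they dominate the pooled representation fed to the classifier.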