Multi-cue temporal modeling for skeleton-based sign language recognition

Published:2023-04-05 Volume:17
ISSN:1662-453X
Container-title:Frontiers in Neuroscience
Short-container-title:Front. Neurosci.

Author:

Özdemir Oğulcan, Baytaş İnci M., Akarun Lale

Abstract

Sign languages are visual languages used as the primary communication medium for the Deaf community. Signs comprise manual and non-manual articulators such as hand shapes, upper body movement, and facial expressions. Sign Language Recognition (SLR) aims to learn spatial and temporal representations from videos of signs. Most SLR studies focus on manual features, often extracted from the shape of the dominant hand or from the entire frame. However, facial expressions combined with hand and body gestures may also play a significant role in discriminating the context represented in sign videos. In this study, we propose an isolated SLR framework based on Spatial-Temporal Graph Convolutional Networks (ST-GCNs) and Multi-Cue Long Short-Term Memory networks (MC-LSTMs) to exploit multi-articulatory (e.g., body, hands, and face) information for recognizing sign glosses. We train an ST-GCN model to learn representations from the upper body and hands. Meanwhile, spatial embeddings of hand shape and facial expression cues are extracted from Convolutional Neural Networks (CNNs) pre-trained on large-scale hand and facial expression datasets. The proposed framework, which couples ST-GCNs with MC-LSTMs for multi-articulatory temporal modeling, can thus provide insights into the contribution of each visual Sign Language (SL) cue to recognition performance. To evaluate the proposed framework, we conducted extensive analyses on two Turkish SL benchmark datasets with different linguistic properties, BosphorusSign22k and AUTSL. While we obtain recognition performance comparable to the skeleton-based state of the art, we observe that incorporating multiple visual SL cues improves recognition performance, especially in certain sign classes where multi-cue information is vital. The code is available at: https://github.com/ogulcanozdemir/multicue-slr.

Publisher

Frontiers Media SA

Subject

General Neuroscience

Cited by 9 articles.

1. Tinysign: sign language recognition in low resolution settings;Signal, Image and Video Processing;2024-06-28

2. Hand Graph Topology Selection for Skeleton-Based Sign Language Recognition;2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG);2024-05-27

3. Isolated sign language recognition through integrating pose data and motion history images;PeerJ Computer Science;2024-05-21

4. Signer-independent sign language recognition with feature disentanglement;Turkish Journal of Electrical Engineering and Computer Sciences;2024-05-20

5. Multi-Stream Isolated Sign Language Recognition Based on Finger Features Derived from Pose Data;Electronics;2024-04-22