Affiliations:
1. School of Informatics, Xiamen University, Xiamen, Fujian, China
2. Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, Xiamen, Fujian, China
3. Department of Information Technology, Faculty of Computing, Bayero University Kano, Kano, Nigeria
Abstract
Sign language serves as a vital communication medium for the deaf community, encompassing a diverse array of signs conveyed through distinct hand shapes together with non‐manual cues such as facial expressions and body movements. Accurate recognition of sign language is crucial for bridging the communication gap between deaf and hearing individuals, yet the scarcity of large‐scale datasets poses a significant challenge to developing robust recognition technologies. Existing works address this challenge through strategies such as enhancing visual modules, incorporating pretrained visual models, and leveraging multiple modalities to improve performance and mitigate overfitting. However, the contextual module, which is responsible for modeling long‐term dependencies, remains underexplored. This work introduces an Adversarial Autoencoder for Continuous Sign Language Recognition (AA‐CSLR) that leverages the capabilities of generative models to address the constraints imposed by limited data availability. The integration of pretrained knowledge, coupled with cross‐modal alignment, enhances sign language representations by effectively aligning visual and textual features. Extensive experiments on publicly available datasets (PHOENIX‐2014, PHOENIX‐2014T, and CSL‐Daily) demonstrate that the proposed method achieves competitive performance in continuous sign language recognition.
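For readers unfamiliar with the core mechanism named in the abstract, the sketch below illustrates how an adversarial autoencoder regularizes a feature space: an encoder/decoder pair is trained for reconstruction while a discriminator pushes the latent codes toward a chosen prior. Everything here (the MLP shapes, the Gaussian prior, the loss weight, and all names such as FEAT_DIM and train_step) is an illustrative assumption, not the actual AA-CSLR architecture, which additionally integrates pretrained knowledge and cross-modal alignment with textual features.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the abstract does not specify the paper's dimensions.
FEAT_DIM, LATENT_DIM = 512, 128

def mlp(d_in, d_out):
    """Small two-layer MLP used for encoder, decoder, and discriminator alike."""
    return nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_out))

enc  = mlp(FEAT_DIM, LATENT_DIM)   # visual features -> latent code
dec  = mlp(LATENT_DIM, FEAT_DIM)   # latent code -> reconstructed features
disc = mlp(LATENT_DIM, 1)          # latent code -> real/fake logit

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
opt_d  = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x):
    """One adversarial-autoencoder update on a batch of frame features x."""
    # 1) Discriminator: separate prior samples ("real") from encoded latents ("fake").
    z_fake = enc(x).detach()
    z_real = torch.randn_like(z_fake)          # assumed standard-Gaussian prior
    opt_d.zero_grad()
    d_loss = bce(disc(z_real), torch.ones(len(x), 1)) \
           + bce(disc(z_fake), torch.zeros(len(x), 1))
    d_loss.backward()
    opt_d.step()

    # 2) Autoencoder: reconstruct features while pushing latents toward the prior.
    opt_ae.zero_grad()
    z = enc(x)
    recon = nn.functional.mse_loss(dec(z), x)
    adv = bce(disc(z), torch.ones(len(x), 1))  # encoder tries to look "real"
    (recon + 0.1 * adv).backward()             # 0.1 is an arbitrary trade-off weight
    opt_ae.step()
    return recon.item(), adv.item()

# Example: one step on a random batch of 32 frame features.
# train_step(torch.randn(32, FEAT_DIM))
```

The adversarial term acts as a data-efficient regularizer: rather than requiring more labeled video, it constrains the latent distribution, which is one way generative modeling can mitigate the limited-data problem the abstract describes.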
Funder
National Natural Science Foundation of China