Vision Transformers and Transfer Learning Approaches for Arabic Sign Language Recognition

Author:

Alharthi Nojood M. 1, Alzahrani Salha M. 1

Affiliation:

1. Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

Abstract

Sign languages are complex, and research in engineering and data science continues to work on recognizing, understanding, and using them in real-time applications. Arabic sign language (ArSL) recognition has been studied with various traditional and intelligent methods. However, few attempts have been made to improve it with pretrained models and large vision transformers designed for image classification. This study aimed to build robust transfer learning models trained on an ArSL dataset of 54,049 images depicting 32 letters of the alphabet, with the goal of classifying each image into its corresponding Arabic letter. The study had two methodological parts. The first was the transfer learning approach, in which we used several pretrained models, namely MobileNet, Xception, Inception, InceptionResNet, DenseNet, and BiT, and two vision transformers, namely ViT and Swin. We evaluated variants ranging from base-sized to large-sized, with weights either initialized from the ImageNet dataset or initialized randomly. The second was the deep learning approach using convolutional neural networks (CNNs), in which several CNN architectures were trained from scratch for comparison with the transfer learning approach. The proposed methods were evaluated using accuracy, AUC, precision, recall, F1, and loss. The transfer learning approach consistently performed well on the ArSL dataset and outperformed the CNN models trained from scratch. ResNet and InceptionResNet obtained a comparably high performance of 98%. By combining a transformer-based architecture with pretraining, ViT and Swin leveraged the strengths of both and reduced the number of parameters required for training, making them more efficient and stable than the other models and than existing studies on ArSL classification. This demonstrates the effectiveness and robustness of transfer learning with vision transformers for sign language recognition, including for other low-resourced languages.
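As an illustration of the classification metrics named in the abstract, the following minimal Python sketch computes accuracy together with precision, recall, and F1 for a multi-class problem such as the 32-class ArSL task. The abstract does not say how the per-class scores were averaged, so macro averaging is an assumption here, and the function name `macro_metrics` and the toy labels are hypothetical. AUC and loss are left out because they need predicted probabilities rather than hard labels.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall and F1.

    Labels are integers in [0, num_classes). Macro averaging is an
    assumption; the abstract does not state the averaging scheme.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        prec = tp[c] / predicted if predicted else 0.0
        rec = tp[c] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy example with three classes (hypothetical labels, not data from the study).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_scores = macro_metrics(toy_true, toy_pred, num_classes=3)
```

For the ArSL setting, `num_classes` would be 32, with `y_true` and `y_pred` holding the true and predicted letter indices for the evaluation images.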

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Cited by 8 articles.

1. Intelligent real-life key-pixel image detection system for early Arabic sign language learners;PeerJ Computer Science;2024-06-14

2. Convolutional Neural Networks for Indian Sign Language Recognition;International Journal of Innovative Science and Research Technology (IJISRT);2024-06-11

3. Efhamni: A Deep Learning-Based Saudi Sign Language Recognition Application;Sensors;2024-05-14

4. Applying Swin Architecture to Diverse Sign Language Datasets;Electronics;2024-04-16

5. A Brief Review of Sign Language Recognition Methods and Cutting-edge Technologies;2024 5th International Conference on Computer Engineering and Application (ICCEA);2024-04-12
