A Comparative Study of Two Deep Learning Architectures for Gesture Recognition on the ArSL2018 Dataset
Authors:
Lahiani Houssem1, Frikha Mondher1
Abstract
This paper presents a comparative analysis of two convolutional neural network (CNN) architectures for gesture recognition on the ArSL2018 dataset, a resource comprising 54,049 images across 32 classes of Arabic Alphabet Sign Language (ArASL). Our goal is to determine the more effective architecture for facilitating communication within the Arabic-speaking deaf community, thereby enhancing their interaction with digital platforms and everyday technology interfaces. The first architecture employs a pre-trained MobileNetV2 model as a feature extractor followed by a fully connected layer, while the second builds upon MobileNetV2 by incorporating additional convolutional and pooling layers. Through rigorous evaluation using multiple metrics, including accuracy, precision, recall, and F1-score, we found that the first architecture achieved a higher overall accuracy of 95% on the test set, compared to 93.85% for the second, with per-class accuracies ranging from 82.91% to 99.10%. These findings suggest that simpler CNN architectures with pre-trained feature extractors are not only effective but also potentially more efficient to integrate into assistive technologies. This study underscores the potential of gesture recognition systems to improve the quality of life of deaf and hard-of-hearing people by providing more natural, intuitive ways to interact with technology. By focusing on user-centric design and ethical AI deployment, our findings contribute to the broader discourse on developing responsible, inclusive technologies that uphold human dignity and foster social inclusion.
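For concreteness, the two architectures can be sketched in Keras roughly as follows. This is a minimal illustration based only on the high-level description in the abstract: the input size, the width of the extra convolutional block, the pooling choices, and the optimizer are assumptions, not details reported by the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 32             # the 32 ArSL2018 sign classes
INPUT_SHAPE = (224, 224, 3)  # assumed input size; not stated in the abstract

def build_feature_extractor():
    """Pre-trained MobileNetV2 backbone, frozen so it acts purely as a feature extractor."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=INPUT_SHAPE, include_top=False, weights="imagenet")
    base.trainable = False
    return base

# Architecture 1: MobileNetV2 features -> fully connected classification head
model_1 = models.Sequential([
    build_feature_extractor(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Architecture 2: MobileNetV2 features -> additional conv/pooling layers -> head
model_2 = models.Sequential([
    build_feature_extractor(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),  # extra conv block; filter count is hypothetical
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model_1.compile(optimizer="adam",
                loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
```

The sketch reflects the abstract's finding: the first model adds only a classification head on top of the frozen backbone, which keeps the parameter count lower than the second model's extra convolutional block while achieving the higher reported accuracy.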
Publisher
Springer Science and Business Media LLC