Sign language recognition from digital videos using feature pyramid network with detection transformer

Published:2023-02-28 Issue:14 Volume:82 Page:21673-21685
ISSN:1380-7501
Container-title:Multimedia Tools and Applications
language:en
Short-container-title:Multimed Tools Appl

Author:

Liu Yu,Nand Parma,Hossain Md Akbar,Nguyen Minh,Yan Wei Qi

Abstract

Sign language recognition is one of the fundamental ways to assist deaf people to communicate with others. An accurate vision-based sign language recognition system using deep learning is a fundamental goal for many researchers. Deep convolutional neural networks have been studied extensively in the last few years, and many architectures have been proposed. Recently, Vision Transformer and other Transformers have shown clear advantages in object recognition compared to traditional computer vision models such as Faster R-CNN, YOLO, SSD, and other deep learning models. In this paper, we propose a Vision Transformer-based sign language recognition method called DETR (Detection Transformer), aiming to improve the current state-of-the-art sign language recognition accuracy. The DETR method proposed in this paper recognizes sign language from digital videos with high accuracy using a new deep learning model, ResNet152 + FPN (i.e., Feature Pyramid Network), which is based on Detection Transformer. Our experiments show that the method has excellent potential for improving sign language recognition accuracy. For instance, our newly proposed network, ResNet152 + FPN, improves detection accuracy by up to 1.70% on the sign language test dataset compared to the standard Detection Transformer models. In addition, the proposed method attained an overall accuracy of 96.45%.

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications,Hardware and Architecture,Media Technology,Software

Link

https://link.springer.com/content/pdf/10.1007/s11042-023-14646-0.pdf

Reference: 34 articles.

1. Bastanfard A, Rezaei NA, Mottaghizadeh M, Fazel M (2010) A novel multimedia educational speech therapy system for hearing impaired children. Springer, pp. 705–715

2. Bauer B, Hienz H, Kraiss KF (2000) Video-based continuous sign language recognition using statistical methods. In: International Conference on Pattern Recognition (ICPR), pp. 463–466

3. Bauer B, Hienz H, Kraiss K (2000) Video-based continuous sign language recognition using statistical methods. In: International Conference on Pattern Recognition (ICPR)

4. Bhatti UA, Huang M, Wu D, Zhang Y, Mehmood A, Han H (2019) Recommendation system using feature extraction and pattern recognition in clinical care systems. Enterprise Inform Syst 13(3):329–351

5. Camgoz NC, Koller O, Hadfield S, Bowden R (2020) Sign language Transformers: Joint end-to-end sign language recognition and translation. arXiv:2003.13830

Cited by 13 articles.

1. A two-stream sign language recognition network based on keyframe extraction method;Expert Systems with Applications;2024-11

2. Refined Intelligent Landslide Identification Based on Multi-Source Information Fusion;Remote Sensing;2024-08-23

3. Enhancing Indian sign language recognition through data augmentation and visual transformer;Neural Computing and Applications;2024-05-13

4. A signer-independent sign language recognition method for the single-frequency dataset;Neurocomputing;2024-05

5. Twin Residual Network for Sign Language Recognition from Video;2024 International Conference on Automation and Computation (AUTOCOM);2024-03-14