Affiliation:
1. Department of Computer Engineering, Datta Meghe College of Engineering, Navi Mumbai, India
Abstract
Hand gestures are one of the means of non-verbal communication used in sign language, most commonly by people with hearing or speech impairments to communicate among themselves or with hearing people. Developing sign language applications is therefore valuable, as it enables people with hearing and speech impairments to communicate easily even with those who do not understand sign language. This project takes a basic step toward bridging that communication gap. The main focus of this work is to create a vision-based system that identifies sign language gestures from video sequences. A vision-based approach was chosen because it provides a simpler and more intuitive way for a human to communicate with a computer. Video sequences contain both temporal and spatial features, so two different models are trained, one for each kind of feature. A deep Convolutional Neural Network (CNN) is trained on the spatial features, using the individual frames extracted from the training videos. A Recurrent Neural Network (RNN) is trained on the temporal features: the trained CNN makes a prediction for each frame of a video, and the resulting sequence of predictions is given to the RNN. Together, the trained CNN and RNN produce the text output for the corresponding gesture.
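For illustration, the following minimal sketch (not the authors' implementation) shows how such a two-stage CNN-RNN pipeline could be wired together in Keras. The layer sizes, the number of gesture classes (NUM_CLASSES), the frames sampled per video (FRAMES_PER_VIDEO), and the frame resolution are all illustrative assumptions.

```python
# Minimal sketch of the two-stage pipeline described in the abstract:
# a CNN classifies individual frames (spatial features), and an RNN is
# trained on the resulting per-frame prediction sequences (temporal
# features). All sizes below are assumptions, not the paper's values.
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 10        # assumed number of gesture classes
FRAMES_PER_VIDEO = 30   # assumed frames sampled per video
FRAME_SHAPE = (64, 64, 3)

# Stage 1: CNN trained on individual frames extracted from the videos.
cnn = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=FRAME_SHAPE),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="categorical_crossentropy")

# Stage 2: RNN trained on sequences of the CNN's per-frame predictions.
rnn = models.Sequential([
    layers.LSTM(64, input_shape=(FRAMES_PER_VIDEO, NUM_CLASSES)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
rnn.compile(optimizer="adam", loss="categorical_crossentropy")

def predict_gesture(video_frames):
    """Classify one gesture video: CNN per frame, then RNN over the sequence."""
    frame_preds = cnn.predict(video_frames)           # (frames, NUM_CLASSES)
    return rnn.predict(frame_preds[np.newaxis, ...])  # (1, NUM_CLASSES)
```

In this sketch the CNN's softmax outputs serve as a compact per-frame feature vector for the RNN, which mirrors the abstract's description of feeding the sequence of CNN prediction outputs to the RNN rather than training both networks jointly.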