Affiliation:
1. Department of Computer Science and Engineering, Anna University, Chennai, India
Abstract
Continuous sign-language videos convey dynamic signs in sentence form. A single sign or phrase in a sign video is depicted by a series of frames, most of which are non-informational and have little effect on sign recognition. Removing them from the frameset means the recognition algorithm needs only a minimal number of frames per sign, which reduces the time and space complexity of such systems. The proposed algorithm addresses the challenge of identifying small-motion frames, such as tapping, stroking, and caressing, as keyframes in continuous sign-language videos with a high reduction ratio and high accuracy. Unlike previous studies, the proposed method maintains the continuity of sign motion instead of isolating signs; it also supports the scalability and stability of the dataset. The algorithm measures angular displacements between adjacent frames to identify candidate keyframes; non-informational frames are then discarded using a sequence-check technique. On Phoenix14, a German continuous sign-language benchmark dataset, a reduction ratio of 74.9% is achieved with an accuracy of 83.1%, and the American Sign Language (ASL) How2Sign dataset is reduced by 76.9% with 84.2% accuracy. A low word error rate (WER) is also achieved on the Phoenix14 dataset.