Author:
Golash Richa,Kumar Jain Yogendra
Abstract
Real-time visual hand tracking is quite different from commonly tracked objects in RGB videos. Because the hand is a biological object and hence suffers from both physical and behavioral variations during its movement. Furthermore, the hand acquires a very small area in the image frame, and due to its erratic pattern of movement, the quality of images in the video is affected considerably, if recorded from a simple RGB camera. In this chapter, we propose a hybrid framework to track the hand movement in RGB video sequences. The framework integrates the unique features of the Faster Region-based Convolutional Neural Network (Faster R-CNN) built on Residual Network and Scale-Invariant Feature Transform (SIFT) algorithm. This combination is enriched with the discriminative learning power of deep neural networks and the fast detection capability of hand-crafted features SIFT. Thus, our method online adapts the variations occurring in real-time hand movement and exhibits high efficiency in cognitive recognition of hand trajectory. The empirical results shown in the chapter demonstrate that the approach can withstand the intrinsic as well as extrinsic challenges associated with visual tracking of hand gestures in RGB videos.