Related Work

Hand gesture recognition is one of the more difficult problems in computer vision and deep learning. Unlike body pose estimation or action recognition, hand gesture recognition suffers from a high degree of self-occlusion and from the many degrees of freedom of each joint in the hand. This pushes solutions toward sophisticated architectures. For example, Narayana et al. [2] use as many as 12 data channels and compute optical flow in their proposed architecture. Such implementations carry a runtime penalty and are therefore poorly suited to real-time use. Since most proposed use cases for this technology require a real-time gesture sensing system, the problem remains largely unsolved.

Earlier attempts at real-time gesture recognition do exist. For example, Köpüklü et al. [3] alleviate the problem by splitting their architecture into a detector and a classifier. The detector is a lightweight model that only determines whether a gesture is occurring in the input video; only when a gesture is detected is the heavyweight classifier activated to identify it. This method shows promising results; however, splitting the process into two separate sub-architectures is less than desirable, since an end-to-end learning solution introduces fewer points of failure and is therefore better suited to wide deployment. In this project, I attempt to build a lightweight model that can carry out hand gesture recognition on an input stream.
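To make the detect-then-classify idea concrete, the following is a minimal sketch of such a two-stage pipeline. It is an illustration only, not Köpüklü et al.'s actual implementation: the detector, classifier, window size, and stand-in models are all hypothetical placeholders for real networks. The point it shows is that the cheap detector runs on every sliding window of frames, while the expensive classifier is called only when a gesture is flagged.

# Illustrative two-stage detect-then-classify loop (hypothetical interfaces).
from collections import deque


def run_two_stage_pipeline(frames, detector, classifier, window_size=8):
    """Yield (frame_index, gesture_label) only when a gesture is detected.

    detector(clip) -> bool   : lightweight "is a gesture happening?" check
    classifier(clip) -> str  : heavyweight "which gesture is it?" model
    """
    window = deque(maxlen=window_size)
    for i, frame in enumerate(frames):
        window.append(frame)
        if len(window) < window_size:
            continue                      # wait until the window is full
        clip = list(window)
        if detector(clip):                # cheap check on every new frame
            yield i, classifier(clip)     # expensive call only on detector hits


# Usage with stand-in models: the detector fires on "active" frames and the
# classifier always answers "swipe_left" (placeholders for real networks).
if __name__ == "__main__":
    stream = ["idle"] * 10 + ["active"] * 10 + ["idle"] * 5
    hits = run_two_stage_pipeline(
        stream,
        detector=lambda clip: clip[-1] == "active",
        classifier=lambda clip: "swipe_left",
    )
    for idx, label in hits:
        print(f"frame {idx}: {label}")

An end-to-end model, by contrast, would collapse the detector and classifier into a single network applied directly to the stream, which is the direction pursued in this project.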