Development Of Hand Gesture Recognition
Introduction
With the vast development of computing techniques and the spread of ubiquitous computing, current user interaction through pointing and positioning devices such as the mouse, keyboard and pen is no longer sufficient. These devices support only a limited command set. Using body parts for interaction, such as the hands, is a better option: hands can serve as an input device that provides natural interaction.
Generally there are two approaches to hand gesture recognition: a hardware based approach, in which the user must wear a device, and a vision based approach, which uses image processing techniques with inputs from a camera. The proposed system uses vision based techniques, which rely on image processing applied to inputs from a webcam attached to the system. Vision based systems take their input from a webcam and are generally broken down into four stages: skin detection, hand contour extraction, hand tracking, and gesture recognition. The skin region is first detected using skin detection; the hand contour is then found and used for hand tracking and gesture recognition.
Hand tracking is used to navigate the computer cursor, and hand gestures are used to perform mouse operations such as right click, left click, scroll up and scroll down. Complex programming ability and intuitiveness are critical attributes of a computer programmer in a competitive environment, and programmers have been successful in establishing communication between computers and humans. The basic idea is to make the machine intelligent and to develop routines so that the machine understands human language efficiently. The basic goal is the development of user friendly Human Computer Interfaces (HCI), so that the computer can understand speech, facial expressions and human gestures. Gestures are non-verbally exchanged information, like posing a victory sign with the fingers in front of a smart phone camera to take a photo.
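As a hedged illustration of mapping recognized gestures to mouse operations, the sketch below uses the third-party pyautogui library; the gesture labels and the coordinates passed in are hypothetical placeholders, not part of the system described here.

    import pyautogui  # third-party library for programmatic mouse control

    # Hypothetical mapping from recognized gesture labels to mouse operations.
    GESTURE_ACTIONS = {
        "left_click":  lambda: pyautogui.click(button="left"),
        "right_click": lambda: pyautogui.click(button="right"),
        "scroll_up":   lambda: pyautogui.scroll(120),   # positive = scroll up
        "scroll_down": lambda: pyautogui.scroll(-120),  # negative = scroll down
    }

    def perform(gesture, x=None, y=None):
        """Move the cursor to the tracked hand position, then run the action."""
        if x is not None and y is not None:
            pyautogui.moveTo(x, y)       # cursor follows the tracked hand
        action = GESTURE_ACTIONS.get(gesture)
        if action:
            action()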
Gestures
A gesture is a form of non-verbal or non-vocal communication in which visible body actions communicate particular messages, either in place of, or in conjunction with, speech. Gestures include movement of the hands, face, or other parts of the body. Gestures allow individuals to communicate a variety of feelings and thoughts, from contempt and hostility to approval and affection, often together with body language in addition to words when they speak. Bobick and Wilson have defined gestures as the motion of the body that is intended to communicate with other agents. A gesture is an expressive movement of body parts that communicates a particular message between a sender and a receiver. Gestures fall into two basic categories.
A - Dynamic gesture: changes over a period of time. Waving the hand to mean "goodbye" is an example of a dynamic gesture.
B - Static gesture: observed at a single instant of time. The stop sign is an example of a static gesture.
To understand a full message, all the static and dynamic gestures must be interpreted over a period of time. This complex process is called gesture recognition.
Gesture based applications
Gesture based applications can be used for many purposes, but here we mostly consider the following:
- Controlling multidirectional translation, and
- Non-linguistic communication, such as sign language.
- 3-Dimensional Geometry Design: AutoCAD (computer aided design) is a Human Computer Interface used for designing and drafting 2-dimensional and 3-dimensional images. Using a mouse and keyboard, it is difficult for a programmer or user to create a 3D design, because a 3D image involves all 6 Degrees of Freedom (DOF), and allocating points in space with a mouse is very hectic and complex.
CAD now provides the facility to translate or rotate the points of an image in any direction. Using this, we can also view the image from every direction as required in order to analyze it.
Telepresence: Telepresence is the use of digital video and networking technology to enable remote individuals to interact as if they were in the same room. The main requirement of telepresence is that users are stimulated in such a way that they feel as though they are part of the remote location. For instance, in corporate settings, meetings between employees located in different towns or cities can be conducted using microphones, video cameras and large video screens. Telepresence also includes collaboration, especially instruction, which often depends on the physical act of one person showing another how to do something; even if a telepresence robot has an arm or two, it may not be at all intuitive for a remote user to have effective direct interactions.
Virtual reality: Virtual reality is a computer generated 3-dimensional environment, created with software, in which users can run their programs and test their systems as if in a real environment. The virtual environments in use today can be displayed on screens and allow the user to work with all of the system's applications through them. We can divide virtual reality into:
a) Forming a copy or simulation of a real environment for testing and training of software, projects or systems; this can also be used for educational purposes.
b) Developing an environment that resembles real-life places but does not actually exist. For example, in games like PUBG and GTA Vice City we see many maps or places that look like real-life locations but do not actually exist.
Sign Language: A language that employs signs made with the hands and other movements, including facial expressions and postures of the body, used primarily by people who are deaf. There are many different sign languages, for example British and American sign languages. Unlike ASL, BSL uses a two-handed alphabet. In developing countries, deaf people may use the sign language of educators and missionaries from elsewhere in the world; for example, some deaf individuals in Madagascar use Norwegian Sign Language. By contrast, deaf children in Nicaragua have created their own sign language. Study of the emerging Nicaraguan Sign Language (NSL) has revealed that children naturally possess learning abilities capable of giving language its fundamental structure.
Algorithmic techniques used for recognition of hand gestures
To collect raw data we use either a vision based or a glove based data collection system, and various algorithms are used to collect the raw data smoothly and correctly. The algorithms used include:
A. Template Matching
The template matching method for hand posture and gesture recognition uses an experimental procedure to determine how many templates of a certain gesture need to be saved in the database for the matching process of the algorithm. If the system cannot detect and recognize a given gesture with the stored templates, additional templates must be trained and stored in the database until the system recognizes the gesture accurately. The proponents then sum up the recognition times, in seconds, for each chosen number of templates per gesture.
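A minimal sketch of this matching step, assuming OpenCV, grayscale images, and a stored template per gesture (the acceptance threshold of 0.8 is an assumed tuning value, not from the original text):

    import cv2

    def match_gesture(frame_gray, templates):
        """Return the name of the best-matching template, or None.
        templates: dict mapping gesture name -> grayscale template image."""
        best_name, best_score = None, 0.0
        for name, tmpl in templates.items():
            # Normalized cross-correlation: 1.0 means a perfect match.
            result = cv2.matchTemplate(frame_gray, tmpl, cv2.TM_CCOEFF_NORMED)
            score = result.max()
            if score > best_score:
                best_name, best_score = name, score
        # Reject weak matches below the assumed threshold.
        return best_name if best_score > 0.8 else None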
B. Feature Extraction Analysis
In pattern recognition and image processing, feature extraction is a special form of dimensionality reduction that turns low-level information from the raw data into high-level information. Transforming the input data into a set of features is called feature extraction. A robust feature is invariant, meaning that if the image is rotated, shrunk, enlarged, or translated, the value of the feature barely changes; such features are used to recognize hand gestures and postures.
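As one concrete example of such invariant features (a sketch assuming OpenCV and a binary hand silhouette as input), Hu moments stay nearly constant under translation, scaling, and rotation:

    import cv2
    import numpy as np

    def hu_features(binary_hand_mask):
        """Compute the 7 Hu moment invariants of a binary hand silhouette.
        Log-scaling compresses their large dynamic range."""
        moments = cv2.moments(binary_hand_mask, binaryImage=True)
        hu = cv2.HuMoments(moments).flatten()
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)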
C. Active Shape Model
The Active Shape Model is trained from manually drawn contours (surfaces in 3D) in training images. The Active Shape Model finds the main variations in the training data using Principal Component Analysis, which enables the model to decide whether a candidate contour is a plausible object contour. The Active Shape Model is applied to each frame, using the contour position from the previous frame as the initial contour; this contour is deformed by finding the best texture match for the control points. This is an iterative process, in which the movement of the control points is limited by what the Active Shape Model recognizes from the training data as a normal object contour.
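The constraint step at the heart of this process can be sketched as follows (numpy only; it assumes the training contours are already aligned and stored as flattened (x, y) landmark vectors, and the +/- 3 standard deviation limit is the conventional choice, not taken from the original text):

    import numpy as np

    def train_shape_model(shapes, k=5):
        """shapes: (n_samples, 2*n_landmarks) array of aligned contours."""
        mean = shapes.mean(axis=0)
        cov = np.cov(shapes - mean, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][:k]     # keep k main variation modes
        return mean, eigvecs[:, order], eigvals[order]

    def constrain(contour, mean, modes, variances):
        """Project a candidate contour onto the model and clip each mode
        coefficient, so only plausible object contours survive."""
        b = modes.T @ (contour - mean)
        limit = 3.0 * np.sqrt(np.maximum(variances, 0.0))
        b = np.clip(b, -limit, limit)
        return mean + modes @ b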
D. Principal Component Analysis
Principal Component Analysis (PCA) is an important technique to understand in the fields of statistics and data science. The procedure of converting a set of observations of possibly interrelated variables into a set of linearly uncorrelated variables is called Principal Component Analysis, and reducing the dimensions of the feature space in this way is called dimensionality reduction.
While we try to keep PCA as accessible as possible, the algorithm covered here is fairly technical. Familiarity with some or all of the following will make PCA as a method easier to understand: matrix operations / linear algebra (matrix multiplication, matrix transposition, matrix inverses, matrix decomposition, eigenvectors / eigenvalues). When dealing with images, Principal Component Analysis is highly sensitive to the position, orientation, and scaling of the hand in the image.
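A minimal numpy sketch of PCA itself, applied to feature vectors such as flattened hand images (the input layout is an assumption for illustration):

    import numpy as np

    def pca(X, n_components):
        """X: (n_samples, n_features). Returns projected data and the basis."""
        X_centered = X - X.mean(axis=0)            # PCA requires centred data
        cov = np.cov(X_centered, rowvar=False)     # covariance of the features
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition
        order = np.argsort(eigvals)[::-1]          # largest variance first
        components = eigvecs[:, order[:n_components]]
        return X_centered @ components, components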
E. Linear Fingertip Model
The linear fingertip model assumes that finger movement consists of linear and rotational motion. Finger tissue modelling requires linear deformation models. In addition to a deformation model capable of capturing the linear behaviour of finger pad mechanics, appropriate model parameters must be found that reproduce the behaviour of each person's fingers. The model can be applied to a wide variety of deformation effects, such as captured video sequences with little control over boundary conditions, or a more controlled setup with computer-vision-based tracking.
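A hedged sketch of the trajectory side of this model: assuming a fingertip moves along a straight line during a gesture, a least-squares fit measures how well tracked fingertip points obey that linear-motion assumption (the thresholding of the returned score is left to the caller).

    import numpy as np

    def linearity_of_trajectory(points):
        """points: (n, 2) array of tracked fingertip (x, y) positions.
        Returns the fraction of variance explained by the best-fit line;
        a value close to 1.0 means the motion was essentially linear."""
        centered = points - points.mean(axis=0)
        # The dominant singular direction is the best-fit line direction.
        _, s, _ = np.linalg.svd(centered, full_matrices=False)
        return (s[0] ** 2) / (s ** 2).sum()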
Hand detection and recognition models
A. Hidden Markov Model
In a Hidden Markov Model (HMM), gestures are captured from every picture that makes up a video, and skin colour blobs are tracked in a hand-face space centred on the user's face. A Hidden Markov Model is a Markov model built using unobserved states, also called hidden states. In a Hidden Markov Model the output, which depends on the state, is visible, while the state itself is not directly visible.
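To make the hidden-state idea concrete, here is a minimal numpy sketch of the forward algorithm, which computes the likelihood of an observed gesture symbol sequence under an HMM; the two-state model at the bottom is a toy illustration, not trained values.

    import numpy as np

    def forward_likelihood(obs, start, trans, emit):
        """obs: sequence of observation indices.
        start: (S,) initial state probabilities.
        trans: (S, S) transition matrix, trans[i, j] = P(j | i).
        emit:  (S, O) emission matrix, emit[s, o] = P(o | s)."""
        alpha = start * emit[:, obs[0]]          # initialise with first symbol
        for o in obs[1:]:
            alpha = (alpha @ trans) * emit[:, o] # propagate, then re-weight
        return alpha.sum()                       # total sequence likelihood

    # Toy 2-state model, e.g. "hand moving" vs "hand still".
    start = np.array([0.6, 0.4])
    trans = np.array([[0.7, 0.3], [0.4, 0.6]])
    emit  = np.array([[0.9, 0.1], [0.2, 0.8]])
    print(forward_likelihood([0, 0, 1], start, trans, emit))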
B. YUV colour space and CAMSHIFT algorithm
This algorithm deals with how hand gestures are recognized. The following steps are involved in hand gesture recognition (a code sketch follows the list):
- The system camera or a digital camera first takes as input the frames making up the video stream of hand motion gestures.
- The frames of the input video stream are grabbed, and then segmentation is performed based on the YUV colour space.
- The YUV colour space is used to distinguish the intensity and chrominance of the frames captured from the video. In YUV, Y specifies the intensity (luminance) of the video frame and U and V indicate its chrominance components.
- The CAMSHIFT algorithm is then used to separate the hand from the rest of the body, since the hand and other body parts have the same skin colour. To segment the hand from the other body parts, we use the heuristic that the hand is the largest connected skin-coloured region.
- The hand position is calculated in each frame of the video stream by computing the centroid of the hand region, which is tracked from its initial to its final position.
- To find the path of the input, i.e. the hand movement, we join all the centroid points, forming a trajectory; with this procedure the hand movement can be tracked.
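A hedged OpenCV sketch of the pipeline above. OpenCV's closest built-in colour conversion to YUV for skin work is YCrCb, and the skin threshold values shown are commonly used assumptions, not values from the original text:

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)                      # webcam video stream
    track_window = (0, 0, 100, 100)                # initial search window
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    centroids = []                                 # trajectory of the hand

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Segment skin in a YUV-like space (YCrCb); ranges are assumed.
        ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
        mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
        # Keep the largest connected region, assumed to be the hand.
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            hand = max(contours, key=cv2.contourArea)
            hand_mask = np.zeros_like(mask)
            cv2.drawContours(hand_mask, [hand], -1, 255, -1)
            # CAMSHIFT refines the window around the hand region.
            _, track_window = cv2.CamShift(hand_mask, track_window, criteria)
            m = cv2.moments(hand)
            if m["m00"] > 0:                       # centroid of the hand
                centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
        if cv2.waitKey(1) == 27:                   # Esc to quit
            break
    cap.release()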
C. Naïve Bayes’ Classifier
The Naïve Bayes classifier is a collection of classification algorithms based on Bayes' theorem. These algorithms share the same underlying logic: every feature being classified is assumed to be independent of every other feature. Bayes' theorem gives the probability of an event occurring given the probability of another event that has already occurred: P(E|F) = P(F|E) P(E) / P(F). This method is used for the recognition of static hand gestures and is very efficient and quick. Since Naïve Bayes classifies inputs according to their features, different gestures are classified according to their geometry and orientation. Gestures are extracted from every frame of the video sequence, in which the background is static.
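A minimal sketch of such a classifier, assuming scikit-learn and pre-extracted geometry/orientation feature vectors; the feature columns and gesture labels here are hypothetical placeholders:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Hypothetical training data: each row is a feature vector from one
    # frame (e.g. contour area, aspect ratio, orientation angle).
    X_train = np.array([[1200, 0.8, 15],
                        [1250, 0.9, 12],
                        [3100, 0.4, 80],
                        [3000, 0.5, 75]], dtype=float)
    y_train = np.array(["fist", "fist", "open_palm", "open_palm"])

    clf = GaussianNB()               # assumes feature independence per class
    clf.fit(X_train, y_train)
    print(clf.predict([[1180, 0.85, 14]]))   # -> ['fist']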
Data collection for hand gestures
The input, i.e. the raw data, is collected in two basic ways. The first is to use input devices worn by the user: one or two instrumented gloves that measure the various joint angles of the hand, together with a six degree of freedom (6 DOF) tracking device that gathers hand position and orientation data. The second way is to use a computer-vision-based approach, in which one or more cameras collect images of the user's hands. The cameras grab a number of images per second and send them to image processing routines, which perform posture and gesture recognition as well as 3D triangulation to find the hands' position in space. A third, hybrid approach combines the previous two methods, with the aim of achieving more accurate recognition by using the two data streams to reduce each other's error.
A. Instrumented Gloves
Instrumented gloves measure finger movement through various kinds of sensor technology. The sensors are embedded in a glove, or placed on it, usually on the back of the hand. Glove-based input devices are commonly categorized by their manufacturer and market availability. One such glove used light-based sensors: flexible tubes with a light source at one end and a photocell at the other. The amount of light hitting the photocells varied as the fingers were bent, thus providing a measure of finger flexion. The glove could measure the metacarpophalangeal joints of the four fingers and the thumb, along with the interphalangeal joints of the index and middle fingers.
B. Vision-Based Technology
The main difficulty in using glove-based input devices to collect raw posture and gesture recognition data is that the user must wear the glove, which is attached to the computer; this restricts freedom of movement, much as traditional interaction methods do. A vision-based solution for collecting hand posture and gesture data consists of four equally important components. The first is the placement and number of cameras used. Placement is critical because the visibility of the hand or hands being tracked must be maximized for robust recognition; visibility matters because of the many occlusion problems present in vision-based tracking. The number of cameras used for tracking is another important issue. The second component is making the hands more visible to the camera, for simpler extraction of hand data. The third component is the extraction of features from the stream or streams of raw image data, and the fourth is applying recognition algorithms to these extracted features.
Challenges in hand gesture recognition system
A hand gesture recognition system confronts many challenges, including:
- Changing illumination: changes in lighting can affect the gesture input, since they change the extracted skin region.
- Rotation problem: a degree-of-freedom problem; if the degree of freedom changes, the gesture input may differ, so the output can vary.
- Distinguishing problem: if objects other than the hand, with a skin-like shape and colour, appear alongside the gesture input, the system may struggle to distinguish the hand gesture from the background.
- Size problem: human hands come in different shapes and sizes; a small child has small hands while an adult has big hands, which can create problems for the system.
- Position problem: if the hand's position differs while giving input, for example the hand is placed in a corner of the screen, or the points used to detect the hand do not lie on the hand, the system may fail to capture the user's input.
Conclusion
Hand Gesture Recognition is an important human-computer interface for interaction between humans and machine systems. A hand gesture recognition system works as follows: first, the user gives input to the system by making hand gestures; the system scans the gestures using a camera or sensor and converts them into a signal, which is passed to the program; the program then accepts the signal, examines what input the gestures represent, and checks whether corresponding data is stored in the dataset; if so, the result is returned.
System performance improves when the system is trained with a larger number of datasets; to make the system more reliable, it should be trained with as many datasets as possible.