Using Neural Networks for Object Recognition
USING NEURAL NETWORKS TO RECOGNIZE HANDWRITTEN DIGITS THROUGH OBJECT RECOGNITION
Antony Boro Wanyutu Bachelor of Computer Science SCT211-5249/2015
Digital Image Processing
1 Introduction
When it comes to identifying images, we humans can clearly recognize and distinguish the features of different objects. This is because our brains have been unconsciously trained on similar sets of images, developing the capability to differentiate between things effortlessly. We are hardly conscious of interpreting the real world; encountering the different entities of the visual world and telling them apart is no challenge for us. Our subconscious mind carries out all of these processes without any hassle.
Image recognition in humans is well described through pattern recognition. Pattern recognition is a cognitive process that matches information from a stimulus with information retrieved from memory. In the human brain, the parietal lobes control pattern recognition. For facial recognition, for example, a region called the fusiform gyrus matches faces using the temporal lobe in conjunction with the occipital lobe. That may seem very complicated, but it happens naturally, with just vision and memory. When it comes to computers, image recognition uses artificial intelligence technology to automatically identify objects, people, places, and actions in images.
For computer vision (ability of computers to acquire, process, and analyze data coming primarily from visual sources), pattern recognition can also be defined as the process of classifying input data into objects or classes based on key features. There are two classification methods in pattern recognition: unsupervised classification and supervised classification.
Unsupervised classification
This is where the outcomes (groupings of pixels with common characteristics) are based on the software's analysis of an image, without the user providing sample classes. The computer uses techniques to determine which pixels are related and groups them into classes. The user can specify which algorithm the software will use and the desired number of output classes, but otherwise does not aid in the classification process. However, the user must have knowledge of the area being classified when the groupings of pixels with common characteristics produced by the computer have to be related to actual features on the ground.
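The idea above can be sketched with a small k-means loop: the user supplies only the desired number of classes (k), and the software groups pixels with similar intensities on its own. The toy one-dimensional "image" and the percentile-based initialization below are illustrative assumptions, not part of any particular package.

```python
import numpy as np

def kmeans_pixels(pixels, k, iters=10):
    """Group pixel intensities into k classes with a simple k-means loop."""
    # Deterministic start: spread initial centers across the intensity range.
    centers = np.percentile(pixels, np.linspace(0, 100, k))
    for _ in range(iters):
        # Assign every pixel to its nearest cluster center.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of the pixels assigned to it.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean()
    return labels, centers

# Toy image: a dark region (~20), a mid-gray region (~120), a bright region (~230).
image = np.array([18, 22, 19, 118, 122, 121, 228, 231, 229], dtype=float)
labels, centers = kmeans_pixels(image, k=3)
print(np.round(centers))   # three intensity classes emerge near 20, 120, 229
```

Note that the user never labels any pixel; the three classes fall out of the data, and it is then up to the user to relate them to actual features on the ground.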
Supervised classification
This is based on the idea that a user can select sample pixels in an image that are representative of specific classes, and then direct the image processing software to use these training sites as references for classifying all other pixels in the image. Training sets are selected based on the knowledge of the user. The user also sets the bounds for how similar other pixels must be in order to be grouped together. These bounds are often set based on the spectral characteristics of the training area, plus or minus a certain increment (often based on 'brightness' or strength of reflection in specific spectral bands). The user also designates the number of classes that the image is classified into.
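A minimal sketch of this supervised approach: the user picks training pixels for each class, and any other pixel joins a class only if it falls within that class's mean plus or minus a tolerance. The class names, sample values, and tolerance below are illustrative assumptions.

```python
import numpy as np

# User-selected training pixels (intensities) for each class.
training = {
    "water":      np.array([30.0, 35.0, 33.0]),
    "vegetation": np.array([90.0, 95.0, 92.0]),
    "urban":      np.array([200.0, 210.0, 205.0]),
}
tolerance = 20.0  # user-set bound: class mean plus or minus this increment

def classify(pixel):
    """Assign pixel to the class whose training mean is within tolerance."""
    best, best_dist = "unclassified", tolerance
    for name, samples in training.items():
        dist = abs(pixel - samples.mean())
        if dist <= best_dist:
            best, best_dist = name, dist
    return best

print(classify(34.0))    # close to the water samples -> "water"
print(classify(150.0))   # outside every bound -> "unclassified"
```

Pixels that fall outside every bound stay unclassified, which mirrors how a user-set similarity threshold works in practice.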
Image recognition is the ability of a system or software to identify objects, people, places, and actions in images. It uses machine vision technologies with artificial intelligence and trained algorithms to recognize images through a camera system.
How does image recognition work?
A computer ‘sees’ an image as a set of vectors whose values represent the data associated with each pixel of the image: the intensities of the pixels are arranged in a matrix format. In the process of image recognition, the vector encoding of the image is turned into constructs that depict physical objects and features. Vision systems can logically analyze these constructs, first by simplifying images and extracting the most important information, then by organizing the data through feature extraction and classification. After the training process is complete, the system's performance is validated on test data. To improve the system's accuracy in recognizing images, the weights of the neural network are adjusted iteratively.
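The matrix-and-vector view of an image can be made concrete with a tiny example. The pixel values below are made up; the point is only that an image is a matrix of intensities which can be flattened into a vector before being fed to a classifier.

```python
import numpy as np

# A tiny 3x3 grayscale "image": each entry is one pixel's intensity (0-255).
image = np.array([[  0,  50, 100],
                  [150, 200, 250],
                  [ 30,  60,  90]], dtype=np.uint8)

vector = image.flatten()          # the vector encoding of the image
print(image.shape, vector.shape)  # (3, 3) (9,)
print(vector.mean())              # the average intensity of the whole image
```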
Object recognition is the process of finding instances of objects in images. In the case of deep learning, object recognition builds on both bottom-up processing (collecting all the available information and trying to piece together what it is) and top-down processing (applying previous perceptual experience to see whether the environment fits the expectation). It uses the principles of the Gestalt laws to recognize images, e.g. the law of similarity, the law of closure, etc. Object recognition involves matching representations of objects stored in memory to representations extracted from the visual image. The key issue in object recognition is the nature of the representation extracted from the image. Theories of object recognition are characterized in terms of logically independent dimensions: the primitive features or parts extracted from the visual image, the stability of the set of features under transformations of the image, the type of relationships used to describe configurations of features, and the stability of configurations across transformations of the image.
How does object recognition work?
Image segmentation is one of the best ways of detecting objects (or, more generally, regions of interest) in images. However, some pre-processing techniques should be applied before segmenting the image. Image enhancement techniques often prove useful for this purpose; neighborhood processing (a family of spatial image enhancement techniques) is particularly beneficial in this case.
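The pre-processing-then-segmentation idea can be sketched as a 3x3 neighborhood mean filter followed by a simple intensity threshold. The toy image and the threshold value are illustrative assumptions.

```python
import numpy as np

def mean_filter(img):
    """Replace each interior pixel by the mean of its 3x3 neighborhood."""
    out = img.astype(float).copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = img[i-1:i+2, j-1:j+2].mean()
    return out

# A bright 2x2 "object" on a dark background.
img = np.array([[ 10,  10,  10,  10],
                [ 10, 200, 210,  10],
                [ 10, 205, 215,  10],
                [ 10,  10,  10,  10]], dtype=float)

smoothed = mean_filter(img)   # neighborhood processing (enhancement step)
mask = smoothed > 90          # segmentation: True where an object lies
print(mask.astype(int))       # the 2x2 object region is picked out
```

The smoothing step is what makes the threshold robust: without it, a single noisy bright pixel in the background would also end up in the mask.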
There are two techniques used in object recognition:
1. Deep Learning. Deep learning is a collection of machine learning algorithms used to model high-level abstractions in data through model architectures composed of multiple nonlinear transformations. Deep learning techniques have become a popular method for performing object recognition. Methods such as convolutional neural networks (CNNs) automatically learn an object's inherent features in order to identify that object. For example, a CNN can learn to identify the differences between cats and dogs by analyzing thousands of training images and learning the features that distinguish them.
2. Machine Learning. Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. To perform object recognition using a standard machine learning approach, you start with a collection of images (or video) and select the relevant features in each image. For example, a feature extraction algorithm might extract edge or corner features that can be used to differentiate between the classes in your data. These features are fed to a machine learning model, which separates them into distinct categories and then uses this information when analyzing and classifying new objects. You can use a variety of machine learning algorithms and feature extraction methods, which offer many combinations for creating an accurate object recognition model.
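The core operation behind the CNNs mentioned in the deep learning item above is convolution. The sketch below applies one hand-set vertical-edge kernel in NumPy; a trained CNN stacks many such filters and learns their weights from data instead of fixing them by hand. The image and kernel here are illustrative.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN layers)."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# An image with a vertical edge: dark left half, bright right half.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)

# A Sobel-like vertical-edge kernel.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d(img, kernel)          # every window straddles the edge
flat = conv2d(np.ones((4, 4)), kernel)     # a featureless region for contrast
print(feature_map.max(), flat.max())       # the edge responds; the flat region does not
```

A CNN's "inherent features" are exactly these filter responses, except that the filter weights are learned by gradient descent rather than written down.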
Object detection is a subset of object recognition, where the object is not only identified but also located in an image. This allows for multiple objects to be identified and located within the same image.
General object detection framework
Typically, there are three steps in an object detection framework.
- First, a model or algorithm is used to generate regions of interest or region proposals. These region proposals are a large set of bounding boxes spanning the full image (that is, an object localization component).
- In the second step, visual features are extracted for each of the bounding boxes; the features are evaluated to determine whether, and which, objects are present in the proposals (i.e. an object classification component).
- In the final post-processing step, overlapping boxes are combined into a single bounding box (that is, non-maximum suppression).
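The final step above can be sketched directly: non-maximum suppression (NMS) keeps the highest-scoring box and drops any remaining box that overlaps it too much. Boxes are (x1, y1, x2, y2); the scores and the overlap threshold below are made up.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Return indices of boxes kept after non-maximum suppression."""
    order = np.argsort(scores)[::-1]   # highest score first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        # Drop every remaining box that overlaps the best one too much.
        order = order[1:][[iou(boxes[best], boxes[i]) < thresh
                           for i in order[1:]]]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # box 1 overlaps box 0 heavily and is dropped
```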
Object Recognition: which object is depicted in the image?
- input: an image containing unknown object(s). Possibly, the position of an object can be marked in the input, or the input might simply be a clear image of a single (non-occluded) object.
- output: position(s) and label(s) (names) of the objects in the image. The positions of objects are either taken from the input or determined from the input image. When labeling objects, there is usually a set of categories/labels which the system 'knows' and between which it can differentiate (e.g. the object is either a dog, car, horse, cow, or bird).
Object Detection: where is this object in the image?
- input: a clear image of an object, or some kind of model of the object (e.g. a duck), and an image (possibly) containing the object of interest
- output: the position, or a bounding box, of the input object if it exists in the image (e.g. the duck is in the upper-left corner of the image)
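The detection task just described (a clear image of an object plus a scene that may contain it) can be sketched with naive template matching: slide the template over the scene and report where the match error is smallest. The tiny arrays are illustrative.

```python
import numpy as np

def find_object(scene, template):
    """Return the (row, col) where the template matches the scene best."""
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for i in range(scene.shape[0] - th + 1):
        for j in range(scene.shape[1] - tw + 1):
            patch = scene[i:i+th, j:j+tw]
            err = np.sum((patch - template) ** 2)  # sum of squared differences
            if err < best:
                best, best_pos = err, (i, j)
    return best_pos

scene = np.zeros((5, 5))
scene[3:5, 1:3] = 7.0            # the "duck" sits at rows 3-4, cols 1-2
template = np.full((2, 2), 7.0)  # a clear image of the object
print(find_object(scene, template))   # → (3, 1)
```

Real detectors replace this exhaustive scan with region proposals and learned features, but the input/output contract is the same: object in, position out.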
In general, if you want to classify an image into a certain category, you use image classification. On the other hand, if you aim to identify the object in the image, you use object recognition. If you aim to go further and identify the location of objects in an image, and, for example, count the number of instances of an object, you can use object detection.
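The machine learning pipeline described earlier (extract hand-chosen features, then classify) can be sketched end to end as a minimal nearest-centroid image classifier. The two toy classes (mostly-dark vs mostly-bright patches), the features, and all values are illustrative assumptions.

```python
import numpy as np

def features(img):
    """Hand-crafted features: mean intensity and mean horizontal gradient."""
    return np.array([img.mean(), np.abs(np.diff(img, axis=1)).mean()])

# Training images for two classes.
dark   = [np.full((4, 4), v, dtype=float) for v in (10, 20, 30)]
bright = [np.full((4, 4), v, dtype=float) for v in (200, 220, 240)]

# One centroid per class in feature space.
centroids = {
    "dark":   np.mean([features(x) for x in dark], axis=0),
    "bright": np.mean([features(x) for x in bright], axis=0),
}

def predict(img):
    """Classify a new image by its nearest class centroid in feature space."""
    f = features(img)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))

print(predict(np.full((4, 4), 15.0)))    # classified as "dark"
print(predict(np.full((4, 4), 230.0)))   # classified as "bright"
```

Swapping the hand-crafted `features` function for features learned by a CNN is, in essence, the step from this classic approach to the deep learning approach described above.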