Image Recognition Mobile App Using Neural Networks
Abstract— In tourism, tourists often visit places of interest with or without a guide. In most cases they already know about the place they are visiting, but sometimes they come across a beautiful place unexpectedly and photograph it with a camera or mobile phone without knowing any details about it. Afterwards, when they want to learn about the place or object, they must search the internet for that information. In this project we develop a mobile application that uses neural networks to identify what the visitor has photographed, so the tourist can obtain the name and details of the place or object. We believe this application is very useful to the industry because it runs without an internet connection, whereas most ordinary mobile applications require one.
Introduction
THIS project aims to make the identification of images easier, especially for tourists. The mobile application recognizes images supplied by the user: the user can select an image from the gallery or capture one with the phone's camera, and the system returns the name of the place or object.
A challenge for nature lovers is the difficulty of flower recognition: even with large and heavy flower guide books it is hard to identify each flower exactly, and for amateurs finding anything using these guides is a monumental task in itself. One such project was built at the request of a plant guide writer, Prof. Avi Shamida, who asked Prof. Ron Kimmel to build an application to help him and his team identify plants. That project implemented a system for classifying and collecting flower images, with components including the CNN model, the web server with its API, and the Android application that uses the API. In "Real-Time Object Detection on Android Using TensorFlow" [3], the authors built a model based on scalable object detection that uses deep neural networks to localize and track people, potted plants, cars and 16 other categories in a real-time camera preview.
The large visual recognition ImageNet package from Google, called "Inception5h", is used. This is a model trained on images of the respective categories and converted into a graph file. Graph nodes are usually huge in number, and they are optimized for use on Android. Training the images in such a system needs large computational power and more than one GPU-equipped computer. A .jar file built with Bazel is added to Android Studio to support the integration of Java and TensorFlow. The .jar file is built with the help of OpenCV, a library of programming functions aimed mainly at real-time computer vision. In "Mobile Application with Optical Character Recognition Using Neural Networks", OCR technology is used to convert a scanned image of printed characters into text or other information the user wants, on an Android mobile device. OCR involves three phases: first, scanning documents as optical images; next, recognition, which converts those images into character streams representing the letters of recognized words; and finally, accessing or storing the converted text, which is the extracted text. The user begins by capturing an image containing text with the mobile camera.
To convert the extracted text, a Marathi text synthesizer is used: the authors converted the extracted text into their language, Marathi. In the first step the text is analysed, then transformed into a pronounceable form, and a speech synthesizer performs the conversion of the English text into Marathi. Most character recognition systems recognize the input image through computer software, which, together with a scanner, requires a large amount of space. To overcome this problem, an optical character recognition (OCR) system based on an Android phone camera is proposed, since modern smartphones offer high performance. Constructing OCR with neural networks involves several phases.
Those are:
- Scanning
- Segmentation
- Pre-processing
- Feature extraction
- Recognition
- Output
This project was developed using the Kohonen algorithm. A Kohonen neural network differs from other networks both in how it is trained and in how it recalls a pattern: its output does not consist of the outputs of several neurons. Instead, one output neuron is selected as the "winner", and this winning neuron is the output when a pattern is presented to the network. These winning neurons often represent groups in the data presented to the network.
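A minimal sketch of this winner-take-all step, assuming a Euclidean-distance competition (the sizes and learning rate below are illustrative, not taken from the project):

```python
import numpy as np

rng = np.random.default_rng(0)
n_outputs, n_inputs = 10, 64                 # e.g. 10 classes, an 8x8 input grid
weights = rng.random((n_outputs, n_inputs))  # one weight vector per output neuron

def winner(pattern):
    """Return the index of the winning output neuron for an input pattern."""
    distances = np.linalg.norm(weights - pattern, axis=1)
    return int(np.argmin(distances))         # the single "winning" neuron

pattern = rng.random(n_inputs)
w = winner(pattern)
print("winning neuron:", w)

# One simplified training step: pull the winner's weights toward the pattern,
# so similar patterns keep selecting the same neuron (group) over time.
weights[w] += 0.1 * (pattern - weights[w])
```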
Google Vision API
Google provides a service, the Vision API, that can detect broad sets of objects in users' images, from flowers, animals and transportation to thousands of other object categories commonly found in images. The Vision API improves over time as new concepts are introduced and accuracy increases. With AutoML Vision, custom models can be created that highlight specific concepts in images.
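For comparison, a hedged sketch of calling the Vision API from Python with the google-cloud-vision client library; the file name is a placeholder, credentials and an internet connection are required, and depending on the library version `vision.Image` may be `vision.types.Image`:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()       # needs Google Cloud credentials

with open("photo.jpg", "rb") as f:           # "photo.jpg" is a placeholder
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)    # e.g. "Flower 0.97"
```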
This enables use cases ranging from categorizing product images to diagnosing diseases. However, an internet connection is required to use this API; without one, nothing can be done in the system. There are also other APIs for face recognition on mobile devices, now shipping with smartphones, but as noted above most of these systems require an internet connection. In "Recognition of Tourist Attractions" [5], the authors observe that with the mass of images people see every day, there are many times when one sees a photo of a place one would like to visit but cannot determine where it is. The ability to recognize landmarks from images can be extremely useful both when choosing a travel destination and when trying to identify landmarks in a foreign place. Their project aims to recognize specific locations using machine learning methods. Due to time and complexity constraints, they limited themselves to recognizing ten famous tourist attractions in Beijing. The input to their algorithm is an image; a convolutional neural network extracts image features, and an SVM outputs the predicted attraction. They used the following ten attractions in their model: Tiananmen Square, the Forbidden City, Yuanmingyuan Park, Beihai Park, Beijing National Stadium (Bird's Nest), Beijing National Aquatics Centre (Water Cube), CCTV Headquarters, the Great Wall, the National Centre for the Performing Arts (Bird Egg) and Fragrant Hills Park.
These attractions were chosen because each has distinct features that make the classification task easier, and each is well known enough to amass a large training dataset. The authors created their own dataset of 10,000 images, split into 7,000 training images, 2,000 validation images and 1,000 test images. The images were obtained through Google image search by modifying a script to download the returned results. To ensure the dataset included images of each attraction from different directions and in different weather conditions, they varied the key terms in their searches. They downloaded a total of 1,000 images per category, manually removed irrelevant results, then augmented the data by cropping and horizontal reflection to reach 1,000 images per category. They extracted two sets of features from each image:
- Places-CNN
- Pixel Ratio
Each image was fed through the Places-CNN model, pre-trained on 2,448,873 images from 205 scene classes. The model parameters were obtained from the published caffemodel and converted to TensorFlow. The 4096-dimensional output of the fc7 layer was extracted and used as the input features for the classifier.
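A hedged sketch of this feature-plus-SVM pipeline, with a placeholder standing in for the Places-CNN forward pass; the dummy data only makes the sketch runnable, while the real pipeline uses the 7,000 training images and their attraction labels:

```python
import numpy as np
from sklearn.svm import SVC

def extract_fc7(image):
    """Placeholder for the Places-CNN forward pass up to the fc7 layer."""
    return np.random.rand(4096)              # real code returns fc7 activations

train_images = [object() for _ in range(20)] # dummy stand-ins for real photos
train_labels = np.arange(20) % 10            # ten attraction classes

features = np.stack([extract_fc7(img) for img in train_images])

clf = SVC(kernel="linear")                   # an SVM over the 4096-d features
clf.fit(features, train_labels)
print(clf.predict(features[:1]))             # predicted attraction index
```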
After examining the raw images and the features extracted by Places-CNN, the authors found differences between day and night images, and between day and night features, of the same attraction. Initial training of the classifier also showed a drop in accuracy when the dataset contained both day and night images, so they trained separate classifiers for day and night and used an additional binary classifier to determine which model to apply at test time, as sketched below.
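A small sketch of that routing, assuming the three classifiers have been trained as in the previous sketch and that the binary classifier returns 1 for day images (an assumption, not the paper's encoding):

```python
def predict_attraction(fc7_features, day_night_clf, day_clf, night_clf):
    """Route the feature vector to the day or the night attraction model."""
    if day_night_clf.predict([fc7_features])[0] == 1:  # 1 = day (assumed)
        return day_clf.predict([fc7_features])[0]
    return night_clf.predict([fc7_features])[0]
```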
Methodology
First we identified the need for this application in the tourism industry, then searched for methods to build it. TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google's AI organization, it comes with strong support for machine learning and deep learning, and its flexible numerical-computation core is used across many other scientific domains. Our programs are written in Python.
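A minimal sketch of TensorFlow's graph-then-run style of numerical computation, using the TensorFlow 1.x API that was current for this project:

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
product = tf.matmul(a, b)        # a node in the graph, not yet computed

with tf.Session() as sess:       # the session executes the graph (CPU or GPU)
    print(sess.run(product))
```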
When creating the neural network we used a convolutional neural network (CNN), a special architecture of artificial neural networks proposed by Yann LeCun in the late 1980s. A CNN mimics some features of the visual cortex, and one of its most popular uses is image classification: for example, Facebook uses CNNs for automatic tagging, Amazon for generating product recommendations, and Google for searching among users' photos.
Let us consider the use of a CNN for image classification in more detail. The main task of image classification is to accept an input image and determine its class. This is a skill people learn from birth: we can easily determine that the image in a picture is an elephant. The computer, however, sees pictures quite differently (Figure 3: what a person sees vs. what the computer sees). Instead of the picture, the computer identifies an array of pixels. For example, if the picture size is 300 x 300, the array size will be 300 x 300 x 3, where 300 is the width, the next 300 is the height, and 3 is the number of RGB channels. Each of these numbers is assigned a value from 0 to 255, describing the intensity of the pixel at that point. To classify the image, the computer looks for characteristics at the base level. In human understanding such characteristics are, for example, the trunk or large ears; for the computer they are boundaries or curvatures. Then, through groups of convolutional layers, the computer constructs more abstract concepts. In more detail: the image is passed through a series of convolutional, nonlinear, pooling and fully connected layers, and then the output is generated.
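A short illustration of this pixel-array view, using NumPy and Pillow (the file name is a placeholder):

```python
import numpy as np
from PIL import Image

img = Image.open("photo.jpg").convert("RGB").resize((300, 300))
pixels = np.asarray(img)         # what the computer actually "sees"

print(pixels.shape)              # (300, 300, 3): height, width, RGB channels
print(pixels[0, 0])              # R, G, B intensities (0-255) of one pixel
```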
Example of the Image Recognition Process
The convolution layer always comes first. The image (a matrix of pixel values) is entered into it. Imagine that reading of the input matrix begins at the top left of the image. The software then selects a smaller matrix there, called a filter (or neuron, or kernel). The filter produces the convolution: it moves along the input image, multiplying its values by the original pixel values. All these multiplications are summed, yielding a single number. Since the filter has read the image only in the upper-left corner, it moves further right by one unit and performs a similar operation, and so on.
After the filter has passed across all positions, a matrix is obtained that is smaller than the input matrix. From a human perspective, this operation is analogous to identifying boundaries and simple colours in the image; but to recognize higher-level properties such as the trunk or large ears, the whole network is needed. The network consists of several convolutional layers mixed with nonlinear and pooling layers. When the image passes through one convolution layer, the output of that layer becomes the input of the next, and so on for every further convolutional layer. A naive version of the convolution step is sketched below.
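The following NumPy sketch implements exactly this step: a small filter slides over the input with stride 1, multiplies element-wise, and sums to one number per position, yielding an output smaller than the input (the filter values are illustrative):

```python
import numpy as np

def convolve2d(image, filt):
    """Slide `filt` over `image` with stride 1; multiply and sum at each stop."""
    h, w = image.shape
    fh, fw = filt.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i + fh, j:j + fw]
            out[i, j] = np.sum(region * filt)   # one number per position
    return out

image = np.random.rand(6, 6)
edge_filter = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])      # a simple vertical-edge filter
print(convolve2d(image, edge_filter).shape)     # (4, 4): smaller than the input
```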
By using convolutional neural networks, the system can identify an entered image without an internet connection, which is the most significant advantage for the user. The network is written in Python and runs at the command prompt. We then created the Android application to get images from the camera or the phone gallery: the user inserts a photo from the gallery or takes one with the camera, clicks the Show Details button, and the system shows the name of the place or thing. These are our expected outcomes.
The Android application is for the convenience of the user.
We have developed the neural network, and it can be run in the console with a short command. After the command is entered, the neural network starts processing and then displays the results: if we enter an image of Sigiriya, the system assigns the highest probability to Sigiriya among the trained classes. A sketch of this step follows.
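A hedged sketch of that console step in TensorFlow 1.x: load a frozen graph, run one image through it, and print the class probabilities, highest first. The file names, node names and labels here are illustrative assumptions, not the exact ones used in this project:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

labels = ["Sigiriya", "Temple of the Tooth", "Galle Fort"]  # illustrative

graph_def = tf.GraphDef()
with tf.gfile.GFile("retrained_graph.pb", "rb") as f:       # assumed file name
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

img = Image.open("sigiriya.jpg").convert("RGB").resize((224, 224))
batch = np.asarray(img, np.float32)[np.newaxis] / 255.0     # shape (1,224,224,3)

with tf.Session() as sess:
    probs = sess.run("final_result:0",                      # assumed node names
                     feed_dict={"input:0": batch})[0]

for i in np.argsort(probs)[::-1]:
    print(f"{labels[i]}: {probs[i]:.3f}")                   # e.g. "Sigiriya: 0.91"
```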
Limitations
Considering the limitations of the project: the CNN recognizes images by identifying factors (features) of the image, so places that look similar may be identified as the same place because their images share the same factors. For example, if we select two waterfalls, both have water and stones; the CNN identifies the same factors in both images, the result becomes ambiguous, and both images are identified as the same place.
Discussion
First we developed the Android application using Android Studio, then developed the neural network in Python. After developing these two parts, we had to link them to produce output in the Android application, but we faced difficulties connecting the application to the neural network: it requires a powerful machine (GPU) and advanced knowledge of both neural networks and Android. The neural network is already working and can be trained, but the Android application could not be connected to it, so for now we run the application at the command prompt.
Conclusion
In conclusion, this project was developed specially for the tourism industry. In the industry, tourists must often hire a guide to visit places; the main reasons are to get directions to the places they visit and details about the places they hope to see. Our project supports tourists in getting details of places from an image recognition application; in other words, with this application a user can visit any area without a guide. Hiring a guide costs money, and sometimes tourists are cheated by guides who give wrong details about places. With this mobile application, tourists can identify the places they have visited and then search for those places on the internet.
In developing the application we used neural network theory to recognize the images tourists take, and we developed the Android application to make the recognition task easier for the user. If users do not know about a place, they can insert an image from the gallery or take a picture with the camera and submit it to the application; the system then gives the name of the place, and the tourist can find details by searching for that name on the internet. The system runs without an internet connection because the neural network operates on the device.