A Literature Survey on Image and Object Recognition Using Convolutional Neural Networks in Autonomous Vehicles
Abstract
Autonomous cars have the potential to reduce traffic problems such as accidents and congestion through machine cognition with the help of CNNs. Complete autonomy has not yet been achieved, although today's CNNs have brought us closer to it than ever before. Convolutional Neural Networks are deep neural networks built from artificial neurons. These neurons are trained using preset rules, and the rules determine whether a neuron produces an output when given several inputs. CNNs learn and make future decisions on the basis of the situations they encounter. One major application of CNNs is object and image classification.
CNNs open up wide applications in the field of autonomous vehicles, where they can analyze various forms of on-road footage covering scenarios such as collisions, empty roads and traffic blocks. A CNN narrows the image down to image grids that may contain an obstacle. Errors that occur are fed back for reclassification and deeper analysis, and after this analysis the CNN sends the appropriate instructions to the car, for example to accelerate or brake. This paper presents a literature survey on the use of CNNs in image recognition and object detection.
Driving has become an integral part of our daily lives. Whether it is running errands or taking long road trips, driving can be extremely risky because of one largely uncontrollable factor: human error. Distractions such as entertainment systems and cellphones are among the biggest causes of accidents and collisions. Nearly 1.25 million people die in road crashes each year, an average of 3,287 deaths a day. This statistic shows that driving is a high-risk activity, and autonomous vehicles can help reduce these deaths by removing human error. A self-driving car needs only the destination; the passengers can carry on with their tasks while the car takes them there. This reduces the threat and risk of travelling for daily activities.
Deep neural networks make autonomous vehicles achievable. They are computerized decision-making networks that mimic the mammalian visual cortex, and they consist of multiple layers of neuron-like components; with multiple layers, the neurons can receive and process input from many parameters. A Convolutional Neural Network is a subtype of deep neural network, and CNNs are increasingly applied to autonomous vehicles, where they are used for obstacle detection and image recognition. In 2016, NVIDIA built an autonomous car using CNN technology; their car demonstrates the validity of using CNNs in autonomous transportation. Although CNNs have the potential to increase road safety, they raise ethical questions, such as whether a computer will keep a passenger safer than a human driver would. Achieving autonomy through CNNs also adds sustainability to the driving environment: it is more beneficial for the environment and helps preserve vehicle parts. Even though CNNs are still new, they are the emerging technology in self-driving vehicles.
Convolutional Neural Networks comprise several layers through which the input data passes. These layers are organized in a fixed structure and include convolutional layers, pooling layers, fully connected layers and a loss layer. Each layer has its own function, and as an image progresses from layer to layer the analysis becomes more abstract. The first layers of the network react to low-level stimuli such as oriented edges and changes in light intensity, while the later layers concentrate on identifying and recognizing objects and make independent, intelligent decisions about their importance.
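To make this layer structure concrete, the following is a minimal sketch of such a network in PyTorch: two convolutional layers with ReLU activations and pooling, followed by a fully connected layer. The 3×32×32 input size, the filter counts and the ten output classes are illustrative assumptions, not values taken from any of the surveyed papers.

```python
# A minimal CNN with the layer types described above: convolution, ReLU,
# pooling and a fully connected layer. Input size, filter counts and the
# 10 output classes are illustrative assumptions only.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(),                                     # activation (negatives -> 0)
            nn.MaxPool2d(2),                               # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)            # flatten feature maps into one vector
        return self.classifier(x)          # class scores; the loss layer is applied outside

model = TinyCNN()
scores = model(torch.randn(1, 3, 32, 32))  # one dummy RGB image
print(scores.shape)                        # torch.Size([1, 10])
```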
In the ReLU (Rectified Linear Unit) layer, a stack of images becomes a stack of images with no negative values. ReLU is used as an activation function in deep neural networks (DNNs), while the Softmax function is used as the classification function. Used mostly in the output layer, Softmax represents a probability distribution over all the possible outcomes generated by the CNN. The fully connected layer merges the data processed by all the preceding layers into one final output. Fully connected layers compute inner products: every neuron in a fully connected layer is connected to all of the outputs provided by the previous layer, so the layer analyzes all of the data at once without needing a convolution operation.
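As a simple illustration of these two roles, the following NumPy sketch defines ReLU, which zeroes out negative values inside the network, and Softmax, which turns the final scores into a probability distribution; the sample inputs are arbitrary.

```python
# ReLU and Softmax as plain NumPy functions: ReLU removes negative values,
# Softmax converts final scores into a probability distribution over all
# possible outcomes. Sample inputs are arbitrary illustrative values.
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)              # negative values become zero

def softmax(scores: np.ndarray) -> np.ndarray:
    shifted = scores - scores.max()        # shift for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()                 # probabilities that sum to 1

print(relu(np.array([-2.0, 0.5, 3.0])))    # [0.  0.5 3. ]
print(softmax(np.array([1.0, 2.0, 0.1])))  # approx [0.24 0.66 0.10]
```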
Convolutional Neural Networks learn using stochastic gradient descent and backpropagation. Backpropagation is the learning algorithm: its goal is to make the predictions of the CNN match the ground truth (the original input image) by minimizing a cost function. The CNN must therefore be able to run in both a feed-forward and a feedback configuration. During the forward run the errors are collected and processed by the loss layer, and the errors are then reduced with the help of stochastic gradient descent. "Stochastic" simply means that the training images are fed through the network in small, random subsets.
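A minimal sketch of this training procedure in PyTorch is shown below: a random mini-batch is drawn, a forward pass produces predictions, the loss layer compares them to the ground truth, backpropagation computes the gradients and stochastic gradient descent updates the weights. The tiny linear model and the random tensors standing in for images and labels are placeholders, not part of any surveyed system.

```python
# Stochastic gradient descent with backpropagation in PyTorch. Only the
# training loop is the point of this sketch; the model and data are
# placeholders (random noise standing in for images and labels).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss_fn = nn.CrossEntropyLoss()                      # the loss layer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(256, 3, 32, 32)                 # stand-in training images
labels = torch.randint(0, 10, (256,))                # stand-in ground-truth labels

for step in range(100):
    idx = torch.randint(0, 256, (32,))               # small random subset ("stochastic")
    predictions = model(images[idx])                 # forward run
    loss = loss_fn(predictions, labels[idx])         # compare predictions to ground truth

    optimizer.zero_grad()
    loss.backward()                                  # backpropagation: compute gradients
    optimizer.step()                                 # gradient descent: update the weights
```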
The demand for CNNs in image recognition is growing rapidly. A rough localization can be performed by presenting each pixel, together with its neighbourhood, to a neural network that indicates whether that pixel and its neighbourhood belong to the image of the object being searched for. In the current scenario, however, convolutional neural networks are used to identify specific objects: the network processes the given image and tries to locate or identify special features in it, such as other cars, obstacles and pedestrians. Before a CNN can classify the objects in an image, it must first be trained on many test images. The general hierarchy for the identification of an image is: pixel → edge → texton → motif → part → object. Pixels and edges are just as generic as one might expect. Textons are micro-structures that form the basic elements of pre-attentive visual identification; these small patterns are merged into motifs, sections of repeating patterns that can in turn be combined into larger image parts, and the parts are combined to form the whole image to be identified. Image classification begins with dividing the input image into sections or pixels. The input then passes through the CNN for analysis, where the kernels in the convolutional, pooling, ReLU and fully connected layers identify special features in the image. The matrix of values becomes more detailed and accurate as it progresses through the layers.
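As an illustration of the earliest "pixel → edge" stage of this hierarchy, the following sketch slides a single 3×3 kernel over every pixel neighbourhood of a grayscale image to produce a feature map; the Sobel-style edge kernel and the random test image are illustrative choices rather than anything prescribed by the surveyed work.

```python
# One 3x3 kernel passed over every pixel neighbourhood of a grayscale image,
# illustrating the "pixel -> edge" stage. The Sobel-style kernel and the
# random test image are illustrative choices.
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]      # pixel and its neighbourhood
            out[i, j] = np.sum(patch * kernel)     # weighted sum = feature response
    return out

sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])             # responds strongly to vertical edges

image = np.random.rand(8, 8)                       # stand-in grayscale image
edge_map = convolve2d(image, sobel_x)
print(edge_map.shape)                              # (6, 6) feature map
```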
Each grid square in figure 5 represents one kernel, which is passed over each pixel of the image; the final output is a representation of the original ground-truth image. In the current scenario, most Convolutional Neural Networks are built to identify specific objects such as faces, wildlife or handwriting. For CNN classification to work well and efficiently in self-driving vehicles, the network must be able to classify various objects and detect possible obstacles in the image. A Convolutional Neural Network in a self-driving car needs to process and analyze a constantly changing, all-round 360-degree environment; the car can include a rotating video camera to collect the required driving data. The machine must be able to handle metric, symbolic and conceptual knowledge. Metric knowledge is required to keep the vehicle in its lane and at a safe distance from other vehicles. Symbolic knowledge allows the vehicle to classify lanes and conform to the basic rules of the road. Conceptual knowledge gives the vehicle the capability to understand and formulate trends between traffic participants and the driving scene.
Object detection is a technique that produces bounding boxes and class labels; the bounding boxes surround the objects detected in the image. In 2015, a machine overtook human-level performance in the ImageNet classification challenge for the very first time. Compared to image classification, object detection is far more complex, and several aspects of this super-human performance of deep learning are still not fully understood. The difficulty arises when we need to locate several objects at once, together with their class and the number of instances; solving this problem efficiently could be a major breakthrough in the development of self-driving cars. The R-CNN originated in 2014, when a group at UC Berkeley set out to generalize the successes achieved with CNNs to the task of object detection. Their technique combines an autonomous region-proposal algorithm with a Convolutional Neural Network: the proposal algorithm identifies the regions likely to contain objects, and the CNN compresses each region into a fixed-length feature vector. The feature vectors are then classified by class-specific support vector machines, and the region proposals are reduced using non-maximum suppression (NMS). NMS is a greedy algorithm that sorts detections by their object confidence scores, takes the highest-scoring detection and removes lower-scoring detections whose IoU with it exceeds some threshold. Finally, localization is refined using linear regression on the CNN features, forming the anticipated bounding boxes of the objects in the input image.
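The greedy NMS procedure described above can be written down in a few lines. The following NumPy sketch sorts detections by confidence, keeps the highest-scoring box, discards boxes whose IoU with it exceeds a threshold, and repeats; the corner-coordinate box format, the 0.5 threshold and the example detections are assumptions made for illustration.

```python
# Non-maximum suppression: keep the highest-confidence detection, drop
# lower-scoring boxes that overlap it too much (IoU above a threshold),
# then repeat with what remains. Box format [x1, y1, x2, y2] is assumed.
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Intersection-over-union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5) -> list:
    order = scores.argsort()[::-1]                    # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]       # drop near-duplicate detections
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.75])
print(nms(boxes, scores))                             # [0, 2]: the duplicate box is removed
```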
Conclusion
The multi-layered, trainable structure of CNNs sets them apart from other neural networks. They include parameters that can be varied to best fit the intended purpose, and the neurons constantly improve the accuracy of their outputs by learning from each piece of input data. This is especially useful in self-driving vehicles for determining both the existence and the distance of obstacles in front of the vehicle. CNNs are the backbone of full autonomy and will continue to become more and more advanced.
References
- McNeal, M. (2015, August 07). Fei-Fei Li: If We Want Machines to Think, We Need to Teach Them to See. Retrieved August 2, 2019, from https://www.wired.com/brandlab/2015/04/fei-fei-li-want-machines-think-need-teach-see/
- Schmidhuber, J. (January 2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. Retrieved August 1, 2019, from https://www.sciencedirect.com/science/article/pii/S0893608014002135.
- Road Safety Facts. (n.d.). Retrieved August 2, 2019, from https://www.asirt.org/safe-travel/road-safety-facts/
- Welling, E. 'Convolutional Neural Networks in Autonomous Control Systems.' University of Pittsburgh Swanson School of Engineering. 10.2.2017. Accessed 8.8.2019 from https://pdfs.semanticscholar.org/545b/2ce4bc5ed7b1c1089020b3e53c1d67186370.pdf
- “Convolutional Neural Network.” Wikimedia Commons. n.d. Accessed 2.21.2017. https://commons.wikimedia.org/wiki/File:Typical_cnn.png
- E. Weisstein. “Convolution.” Wolfram Math World. n.d. Accessed. 2.7.2017. http://mathworld.wolfram.com/Convolution.html
- T. Hartley. “When Parallelism Gets Tricky: Accelerating Floyd-Steinberg on the Mali GPU.” ARM Community. 11.25.2014. Accessed 2.7.2017 https://community.arm.com/graphics/b/blog/posts/whenparallelism-gets-tricky-accelerating-floyd-steinberg-on-themali-gpu
- Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng “Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations” Retrieved August 4, 2019
- J. Wu. “Introduction to Convolutional Neural Networks.” National Key Lab for Novel Software Technology, Nanjing University. 4.27.2016. Accessed 1.25.2017. http://cs.nju.edu.cn/wujx/paper/CNN.pdf (Learning in a CNN)
- R., C., & Y. (august 1994). Original approach for the localisation of objects in images. IEEE,141(4), 245-250. Retrieved August 5, 2019, from https://ieeexplore.ieee.org/document/318027.
- A. Chernodub, G. Paschenko. “Lazy Deep Learning for Images Recognition in ZZ Photo App.” Al&Big Data Lab. 4.23.2015. Accessed 2.21.2017. https://www.slideshare.net/Geeks_Lab/9-48711415
- Dónal Scanlan and Lucía Diego Solana 'Deep Learning for Robust Road Object Detection' Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden, 2017. Accessed 4.7.2017 http://publications.lib.chalmers.se/records/fulltext/249747/249747.pdf