Ship Detection in Planet Satellite Imagery Using Deep Learning

This paper highlights Hadoop Distributed File System (HDFS) functionality alongside the development of a deep learning application that classifies and detects ships in Planet satellite imagery captured over San Francisco Bay. HDFS is used as a storage warehouse: specific directories hold the dataset of ship chips extracted from Planet satellite imagery, which is essential for the API's development, and the extracted detection results, such as ship detections on satellite images and prediction outputs, are stored there as well. The Pythonic interfaces presented in chapters 3 and 4 are used for this interaction.

Application Description

Satellite imagery provides unique insights into various markets, including agriculture, defence and intelligence, energy, and finance. New commercial imagery providers, such as Planet, are using constellations of small satellites to capture images of the entire Earth every day. This flood of new imagery is outgrowing the ability of organizations to manually look at each image that gets captured, and there is a need for machine learning and computer vision algorithms to help automate the analysis process. The aim of this dataset is to help address the difficult task of detecting the location of large ships in satellite images. Automating this process can be applied to many issues, including monitoring port activity levels and supply chain analysis.

Ship Detection Dataset Parameters

The dataset consists of image chips extracted from Planet satellite imagery collected over the San Francisco Bay and San Pedro Bay areas of California. It includes 4000 80x80 RGB images labelled with either a "ship" or "no-ship" classification. Image chips were derived from PlanetScope full-frame visual scene products, which are orthorectified to a 3 meter pixel size. The dataset is provided as a zipped directory called shipsnet.zip that contains all image chips as .png files. Each individual image filename follows the format {label}__{scene id}__{longitude}_{latitude}.png, where:

  • label: Valued 1 or 0, representing the "ship" class and "no-ship" class, respectively.
  • scene id: The unique identifier of the PlanetScope visual scene the image chip was extracted from. The scene id can be used with the Planet application to discover and download the entire scene.
  • longitude_latitude: The longitude and latitude coordinates of the image center point, with values separated by a single underscore.
  • The dataset is also distributed as a JSON formatted text file, shipsnet.json. The loaded object contains data, labels, scene_ids, and locations lists.

The whole dataset was stored and manipulated using the Hadoop Distributed File System (HDFS) Pythonic interface. Using the Pythonic interface application, three new storage directories called ‘shipsnet’, ‘scenes’ and ‘json_files’ were created, where the ship and non-ship image chips, the San Francisco and San Pedro Bay scene images, and the JSON formatted file were stored, respectively. Furthermore, a ‘zip_files’ directory holds the entire zipped dataset.

The pixel value data for each 80x80 RGB image is stored as a list of 19200 integers within the data list. The first 6400 entries contain the red channel values, the next 6400 the green, and the final 6400 the blue. The image is stored in row-major order, so that the first 80 entries of the array are the red channel values of the first row of the image. The list values at index i in labels, scene_ids, and locations each correspond to the i-th image in the data list.
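
A minimal sketch of this storage and loading workflow, using the hdfs Python client (HdfsCLI) and NumPy, is shown below. The NameNode URL, user name, HDFS paths, and local file locations are placeholder assumptions, and the JSON keys follow the data/labels lists described above.

```python
import json
import numpy as np
from hdfs import InsecureClient  # HdfsCLI; requires WebHDFS to be enabled

# Hypothetical NameNode address and user; replace with the cluster settings in use.
client = InsecureClient("http://namenode:50070", user="hadoop")

# Create the storage directories described above.
for directory in ("/shipsnet", "/scenes", "/json_files", "/zip_files"):
    client.makedirs(directory)

# Upload the JSON dataset, then read it back and decode the pixel lists.
client.upload("/json_files/shipsnet.json", "shipsnet.json", overwrite=True)
with client.read("/json_files/shipsnet.json", encoding="utf-8") as reader:
    shipsnet = json.load(reader)

data = np.array(shipsnet["data"], dtype=np.uint8)
labels = np.array(shipsnet["labels"])

# Each row holds 19200 values: 6400 red, 6400 green, 6400 blue, each channel in
# row-major order. Reshape to (N, 3, 80, 80) and move the channel axis last to
# obtain standard (N, 80, 80, 3) RGB images.
images = data.reshape(-1, 3, 80, 80).transpose(0, 2, 3, 1)
print(images.shape, labels.shape)  # expected: (4000, 80, 80, 3) (4000,)
```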

The "ship" class includes 1000 images. Images in this class are near-centered on the body of a single ship. Ships of different sizes, orientations, and atmospheric collection conditions are included.

    The "no-ship" class includes 3000 images. A third of these are a random sampling of different land-cover features - water, vegetation, bare earth, buildings, etc. - that do not include any portion of anship. The next third are "partial ships" that contain only a portion of an ship, but not enough to meetthe full definition of the "ship" class. The last third are images that have previously beenmislabelled by machine learning models, typically caused by bright pixels or strong linear features. Example images from this class are shown below.

Convolutional Neural Network Architecture

The structure of a Convolutional Neural Network is typically composed of three different types of layers: a layer can be either convolutional, pooling, or fully connected. Each type of layer has different rules for forward and backward error signal propagation. There are no precise rules on how the structure of individual layers should be organized; however, with the exception of recent developments, CNNs are typically structured in two parts.

The first part, usually called feature extraction, uses combinations of convolutional and pooling layers. The second part, called classification, uses fully connected layers. During the construction of the current ‘Ships Detection Application’, a typical Convolutional Neural Network (CNN) involving four major steps was built:

1. Convolution step
2. Pooling step
3. Flattening step
4. Full Connection step

The first step was used to perform convolution on the training images, which was done by the first convolutional layer. The convolution is two-dimensional (2D), as the images are also two-dimensional pixel data arrays. After the convolutional layer, a pooling layer was used, which performs the pooling operation using a max-pooling function. Max-pooling was used because, for each region of interest, the maximum pixel value is needed. After the first convolutional and pooling layers, there are three (3) other pairs of convolution and pooling layers with the same parameters. The output from the last pooling layer was flattened from a two-dimensional (2D) array into a one-dimensional (1D) array, which was then fed into the feed-forward neural network accepting a 4096-value array.

The convolutional layer uses 32 filters, where each filter has a shape of 3x3. The input to the convolutional layer was an 80x80 pixel coloured image in RGB format, and the layer used the rectifier (ReLU) activation function for processing. The pooling layer performs the pooling operation: the convolutional operation outputs multiple feature maps per image, and pooling runs on this output. The pooling layer takes in the feature maps from the convolutional operation and uses a 2x2 matrix to minimize pixel loss while obtaining a precise region around feature locations. The output from the pooling layer was finally flattened into a one-dimensional (1D) single vector, which was then fed to the hidden layer, just like in the simple feed-forward network introduced before, as needed for the two-class classification. Furthermore, a dropout layer was added to overcome the problem of overfitting to some extent.
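
A minimal Keras sketch of an architecture matching this description is given below. It assumes four Conv2D + MaxPooling2D blocks with 32 filters of size 3x3 and ReLU activation each, followed by Flatten, a fully connected hidden layer, Dropout, and a two-class softmax output; the hidden layer width, dropout rate, and padding are assumptions for illustration, not values reported in the thesis.

```python
from tensorflow.keras import layers, models

# Sketch of a small CNN for 80x80 RGB ship chips.
model = models.Sequential([
    layers.Input(shape=(80, 80, 3)),
    # Feature extraction part: four convolution + max-pooling blocks.
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    # Classification part: flatten, hidden layer, dropout, two-class output.
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),
])
model.summary()
```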

Dropout randomly turns off a fraction of neurons during the training process, reducing the dependency on the training set by some amount. The fraction of neurons to turn off is decided by a hyperparameter, which can be tuned accordingly. Turning off some neurons in this way does not allow the network to memorize the training data, since not all the neurons will be active at the same time and the inactive neurons will not be able to learn anything.

During the training process of the CNN, Stochastic Gradient Descent (SGD) was used to train the network, as it is well suited to this kind of network. The validation split used was 0.2, the batch size was 128, and the network was trained for 12 epochs.
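
Compiling and fitting the model with these settings might look as follows. The learning rate, loss function, and label encoding (one-hot labels for the two-class softmax output) are assumptions; images and labels are the arrays decoded in the earlier loading sketch.

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

# One-hot encode the 0/1 labels for the two-class softmax output (assumed encoding).
y = to_categorical(labels, num_classes=2)
x = images.astype("float32") / 255.0  # scale pixel values to [0, 1]

model.compile(optimizer=SGD(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Settings reported in the text: 20% validation split, batch size 128, 12 epochs.
history = model.fit(x, y,
                    validation_split=0.2,
                    batch_size=128,
                    epochs=12,
                    shuffle=True)
```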

Prediction Results

After implementing the training process and training the model on the ship satellite images for 12 epochs, and by observing the training accuracy and loss, we concluded that the model did a good job: after the above number of epochs the training accuracy is 98.52% and the training loss is quite low, almost 4.38%, while the validation accuracy is 99.14% and the validation loss is 3.26%. Evaluating the performance of the model on the test set, it is obvious that the validation loss and validation accuracy are both in sync with the training loss and training accuracy. Even though the validation loss and accuracy curves are not linear, they show that the model is not overfitting: the validation loss is decreasing rather than increasing, and there is not much gap between training and validation accuracy. Therefore, the model's generalization capability is confirmed to have improved, since the loss on both the test set and the validation set was only slightly higher than the training loss.
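
The training and validation curves discussed above can be inspected from the Keras History object returned by the fit call; a minimal plotting sketch, assuming the history variable from the training snippet, is:

```python
import matplotlib.pyplot as plt

# Plot training vs. validation accuracy and loss over the 12 epochs.
fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

ax_acc.plot(history.history["accuracy"], label="train accuracy")
ax_acc.plot(history.history["val_accuracy"], label="validation accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.legend()

ax_loss.plot(history.history["loss"], label="train loss")
ax_loss.plot(history.history["val_loss"], label="validation loss")
ax_loss.set_xlabel("epoch")
ax_loss.legend()

plt.tight_layout()
plt.show()
```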

Visualization of Prediction Results

After the completion of the training process, evaluating the neural network is the last step. In this case, we tried to get a glimpse of how well the model performs by picking 10 random images and obtaining the predicted result as a label. Looking at a random sample of twenty (20) images from the validation data, we confirmed that the labels predicted by the previously trained model are quite close to the real labels, confirming in this way the accuracy of the current model.
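
A sketch of this kind of spot check is shown below, assuming the trained model and the x/labels arrays from the earlier snippets; the sample size and figure layout are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Pick a random sample of images and compare predicted vs. true labels.
rng = np.random.default_rng(seed=0)
sample_idx = rng.choice(len(x), size=20, replace=False)

predictions = model.predict(x[sample_idx])
predicted_labels = predictions.argmax(axis=1)

fig, axes = plt.subplots(4, 5, figsize=(12, 10))
for ax, idx, pred in zip(axes.ravel(), sample_idx, predicted_labels):
    ax.imshow(x[idx])
    ax.set_title(f"pred: {'ship' if pred == 1 else 'no-ship'} / true: {labels[idx]}")
    ax.axis("off")
plt.tight_layout()
plt.show()
```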

Besides predicting ship or non-ship labels on image chips, the current algorithm also scans real captured bay satellite images to detect ships within them. The HDFS storage directory called ‘scenes’ includes 8 Planet satellite images of bay areas captured over San Francisco and San Pedro. Figure 48 illustrates the prediction results during satellite image scanning, using the previous model's predictions. White rectangular patches are used to highlight possible hotspots of ship detections. Working with the Hadoop Distributed File System through the Pythonic interface application, a new storage directory was created for these results.
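
The scanning step can be sketched as a sliding window over a full scene, classifying each 80x80 patch with the trained model and drawing a white rectangle where the predicted ship probability is high. The scene filename, stride, and probability threshold below are assumptions, and the scene is assumed to have been downloaded locally from the HDFS ‘scenes’ directory.

```python
import numpy as np
from PIL import Image, ImageDraw

# Hypothetical local copy of one scene from the HDFS 'scenes' directory.
scene = Image.open("scene_sfbay.png").convert("RGB")
pixels = np.asarray(scene, dtype=np.float32) / 255.0
draw = ImageDraw.Draw(scene)

window, stride, threshold = 80, 10, 0.9  # assumed scanning parameters
height, width, _ = pixels.shape

for top in range(0, height - window + 1, stride):
    for left in range(0, width - window + 1, stride):
        patch = pixels[top:top + window, left:left + window]
        prob_ship = model.predict(patch[np.newaxis], verbose=0)[0, 1]
        if prob_ship > threshold:
            # Highlight the candidate ship location with a white rectangle.
            draw.rectangle([left, top, left + window, top + window],
                           outline="white", width=2)

scene.save("scene_sfbay_detections.png")
```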

Conclusions

Big data has become highly prevalent in organizations' day-to-day activities. The amount of big data, and the rate at which it is growing, are enormous, and big data technology is sure to soon knock on the door of every enterprise, organization, and domain.

The thesis has given a brief introduction to the core technology of the Hadoop ecosystem, but there are still many applications and projects developed on it. Hadoop is the most widely accepted and used open source framework for computing big data analytics in an easily scalable environment. It is a fault-tolerant, reliable, highly scalable, cost-effective solution that supports distributed parallel cluster computing on thousands of nodes and can handle petabytes of data. Two main components, HDFS and MapReduce, contribute to the success of Hadoop. It handles storing and analyzing unstructured data very well. Hadoop is a tried and tested solution in production environments and is well adopted by industry-leading organizations like Google, Yahoo, and Facebook.

In many scientific domains such as astronomy, social science, and medicine, researchers are faced with a data avalanche. Cloud computing paradigms are being used in these domains for data-intensive science. Column-oriented databases built on Hadoop, such as HBase, are known to have several advantages over traditional row-oriented databases.

Relational database management systems (RDBMS), even with multiple partitioning and parallelizing abilities, fail to scale easily and cost-effectively to growing data needs. At the same time, they expect data to be structured and are not so capable of storing and analyzing the raw unstructured data that is common to encounter with the advent of wearable technologies, smartphones, and social networking websites.

The Python programming language features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional, and procedural, and also has a large and comprehensive standard library. Using Python, interaction with the Hadoop Distributed File System and HBase environments can be performed more easily and in a user-friendly way, by creating the appropriate programming interfaces with a powerful, easy-to-learn language that is used in many scientific projects as well as in the machine learning, hacking, and web development fields. Our results indicate that the Hadoop and HBase ecosystem, including its dependencies (services like ZooKeeper), is quite mature in terms of stability as well as promising in terms of performance characteristics with regard to latency and throughput. The issues pertaining to the stability of Hadoop and HBase are being addressed by the project developers in more recent releases.

Future work involves the design and benchmarking of machine learning algorithms, such as the Ships Detection Deep Learning Application created in the current master thesis, on this infrastructure, as well as pattern matching over large-scale data.
