Utilization of Satellite Data for Socio-Economic Development
Abstract
In any developing country, estimating socio-economic development is a challenging task, and India, one of the largest developing countries in the world, faces the same challenge. The national surveys that currently provide these estimates are of limited help because of their respective limitations. The most important of them, the Census of India, provides the closest estimate of the socio-economic development of states, but it is compiled only once a decade. Planning based on this data can therefore become inaccurate as the years between censuses pass. Recent advances in satellite imagery and its easy availability have led researchers to use it as a better and economically viable alternative. Correlating satellite data with census data using big data analytics can reveal interesting patterns and shorten the timeframe in which data is analysed, so that the nation does not have to wait a decade for a policy to be optimally implemented. This report reviews studies that used satellite data to assess socio-economic development, and proposes how Indian Census data can be combined with satellite data for faster analysis. It discusses satellites such as Landsat, VIIRS and MODIS that can be used to extrapolate socio-economic facts.
KEYWORDS: Remote sensing; Satellite imagery; Landsat; VIIRS; Google Earth Engine; Convolutional Neural Networks
Introduction
Satellite data offers a great opportunity to quantify real-world variables without much human intervention. Remote sensing data such as satellite imagery is becoming increasingly available, detailed and inexpensive. New technologies fuelling Big Data are creating unprecedented opportunities for designing, monitoring and evaluating policy decisions and for directing humanitarian efforts. Measuring human development has long been a focus of international development research and policy. Timely and accurate data can help government actors target policies optimally and allocate resources efficiently. Unfortunately, reliable data is typically very expensive to collect, so the lack of timely and reliable socio-economic data has been a major obstacle to effective policy design. In the past several years, developments in machine learning and geospatial analysis have enabled novel data-intensive approaches to measuring poverty.
In a country like India, where reliable, high-frequency data is scarce, evidence-based policy design grounded in accurate estimates of socio-economic development indicators is difficult. Census data collection [The Ministry of Home Affairs, Government of India, 2011] for a population of 1.2 billion is cumbersome and expensive, and is carried out only about once a decade. The Census is also error-prone and noisy because data collection processes vary widely across the country, and there is often no validation [Brown, 1971; Vemuri, 1994; Bose, 2008]. Smaller sample surveys [The Ministry of Statistics and Programme Implementation, Government of India, 2017; The Ministry of Health and Family Welfare, Government of India, 2016] tend to be more accurate, but they too are infrequent and, in general, do not comprehensively address all aspects of the economy. Satellite data makes more frequent assessment of economic development possible. The aim of this work is to predict census variables from satellite data. We investigate whether census data can be predicted with the help of various daytime and night-time satellites, and how satellite data can help predict values that are useful for improving the economy and the country.
Motivation
The use of satellite imagery as a proxy for socio-economic growth has received a lot of attention in recent years. Satellite data is collected frequently, several times a year for many satellites, and is available over many years. Much of the analysis of satellite data has been facilitated by rapid advances in machine learning systems, which allow very large datasets to be processed. Satellite data analysis is one such new technique that can be applied for the benefit of society. It can reduce the errors involved in the manual collection of Census data, and between censuses we can rely on satellite data, which is available throughout the year. The motivation driving this analysis is the advancement of these technologies and their use for intermediate predictions, leading to better implementation of development policies at the village or district level in India. It has always been a challenging task to statistically test the hypothesis that satellite data can serve as such a proxy, using the data a country collects in its Census. Modern computer science methodologies allow us to leverage the vast resources available in machine learning, big data analytics and statistics. If the hypothesis is properly substantiated, it can go a long way in aiding the organised formulation and implementation of economic policies. Big data analysis of socio-economic aspects can be systematically structured to reveal trends and patterns that help develop a better understanding of the relationships among the various socio-economic parameters.
Objective
The objective of this report is to present a case study of the work that has been done on using satellite data for socio-economic development, and to propose what can be done with the Census data of India for improvement at the village or district level. This involves using both daytime and night-time satellite data, together with machine learning algorithms, to predict the socio-economic variables of the census data.
Application
Applications in this domain involve using satellite data to predict census values. If the satellite data for a city deviates from what is expected of a normal city, the deviation becomes an indicator of an aberration there, and the government can assign officials to visit the place and serve its people. The proposed project plans predictive satellite data analysis and, in the end, a web portal through which government officials are notified of the satellite data values so that they can improve the place or implement a policy. This differs from, and improves on, traditional methods, in which census data was collected once a decade and intermediate requirements were not well predicted until the next census.
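The deviation check described above could be implemented as a simple residual outlier test. This is a hypothetical sketch, not the proposed project's specified method: it assumes each region has an observed satellite-derived value and an expected value (for instance one predicted by a model from other indicators), and flags regions whose residual lies more than `threshold` standard deviations from the mean residual. The function name, the threshold of 2.5 and the sample values are assumptions for illustration.

```python
import statistics

def flag_anomalies(observed, expected, threshold=2.5):
    """Return indices of regions whose observed-minus-expected residual is an
    outlier, i.e. lies more than `threshold` standard deviations from the mean."""
    residuals = [o - e for o, e in zip(observed, expected)]
    centre = statistics.fmean(residuals)
    spread = statistics.pstdev(residuals)  # population standard deviation
    if spread == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - centre) / spread > threshold]

# Hypothetical nightlight values for ten regions versus model expectations.
observed = [10, 11, 9, 10, 11, 9, 10, 11, 9, 30]
expected = [10] * 10
print(flag_anomalies(observed, expected))  # -> [9]
```

With the population standard deviation, no z-score among n values can exceed the square root of n - 1 (Samuelson's inequality), so a fixed threshold is only meaningful when many regions are compared; a robust spread estimate such as the median absolute deviation is a common alternative.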
Theoretical Background And Literature Survey
The previous chapter introduced satellite data, the motivation behind this report and its objectives. A literature survey is the part of a scholarly paper that covers related work and advances in the field, here satellite imagery. The literature review serves as a reference from which we can extend our problem statement, overcome the shortcomings of the research papers found, and achieve our desired goals.
Theoretical Background
This section elaborates on some concepts required to understand the proposed methodology and data collection.
Google Earth Engine
Google Earth Engine combines a multi-petabyte catalogue of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers and developers to detect changes, map trends and quantify differences on the Earth's surface [1]. Satellites supply this catalogue with images of the Earth (the separate Google Earth application renders satellite imagery as a 3D globe). Google Earth Engine lets us extract satellite data: we can select the satellite we want and the time range for which we require its data.
K Means Clustering
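K-means clustering is a simple unsupervised learning technique used to partition data into K clusters, where K is an integer chosen in advance, when there is no target (Y) value for the data. Each point is assigned to the cluster whose mean (centroid) is nearest, and the centroids are recomputed until the assignments stop changing.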
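A minimal pure-Python sketch of K-means (Lloyd's algorithm) is given below, of the kind that could group districts into, say, three development classes. The deterministic farthest-point seeding, the function names and the toy indicator vectors are illustrative assumptions rather than the report's actual setup; in practice a library implementation such as scikit-learn's `KMeans` would typically be used.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, labels

# Made-up district vectors (e.g. two normalised census indicators per district).
districts = [(0.10, 0.20), (0.15, 0.22), (0.12, 0.18),
             (0.50, 0.50), (0.52, 0.48), (0.49, 0.53),
             (0.90, 0.85), (0.88, 0.90), (0.92, 0.87)]
centroids, labels = kmeans(districts, k=3)
print(labels)
```

The cluster numbers themselves carry no meaning; mapping them to ordered classes would require, for example, sorting the centroids by an indicator.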
Anova Test
Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among group means in a sample.
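One-way ANOVA compares the variance between group means with the variance within groups through the F statistic, F = (SSB / (k - 1)) / (SSW / (N - k)) for k groups and N observations in total. The sketch below computes it directly; as a hypothetical use, the groups could be satellite-derived values for districts in different development clusters. The function name is ours, and `scipy.stats.f_oneway` provides the same statistic together with a p-value.

```python
def one_way_anova(*groups):
    """One-way ANOVA: return (F, df_between, df_within) for k independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: scatter of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # -> (27.0, 2, 6)
```

A large F means the group means differ by much more than the spread inside each group would suggest.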
Transfer Learning Approach
Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task. It is a popular approach in deep learning, where pre-trained models are used as the starting point for computer vision and natural language processing tasks, given the vast compute and time resources required to develop neural network models on these problems and the large jumps in skill they provide on related problems. The pre-trained model approach has three steps:
1. Select Source Model. A pre-trained source model is chosen from available models. Many research institutions release models trained on large and challenging datasets, which may be included in the pool of candidate models to choose from.
2. Reuse Model. The pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modelling technique used.
3. Tune Model. Optionally, the model may be adapted or refined on the input-output pair data available for the task of interest.
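The three steps can be illustrated with a small NumPy sketch. It is an assumption-laden stand-in, not the CNN pipeline of the studies surveyed: the "source model" is a PCA projection learned on a large unlabelled set of synthetic 8x8 tiles (in place of, for example, an ImageNet-trained CNN), it is reused frozen as a feature extractor on a small labelled target set, and only a new ridge-regression head is trained. All data, function names and parameters are made up for illustration.

```python
import numpy as np

def pretrain_source_model(source_images, n_components):
    """Step 1 stand-in: learn a representation on a large source dataset.
    Here the 'pre-trained model' is simply the top principal components (via SVD)."""
    X = source_images.reshape(len(source_images), -1)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T  # shapes (d,) and (d, n_components)

def extract_features(images, source_model):
    """Step 2: reuse the frozen source model as a feature extractor."""
    mean, components = source_model
    X = images.reshape(len(images), -1)
    return (X - mean) @ components

def fit_head(features, targets, alpha=1e-3):
    """Step 3: train only a new ridge-regression head on the target task."""
    X = np.hstack([features, np.ones((len(features), 1))])  # bias column
    penalty = alpha * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias unpenalised
    return np.linalg.solve(X.T @ X + penalty, X.T @ targets)

def predict(features, coef):
    return np.hstack([features, np.ones((len(features), 1))]) @ coef

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic demo: 8x8 single-band "tiles" generated from 4 hidden factors.
rng = np.random.default_rng(0)
basis = rng.normal(size=(4, 64))

def make_tiles(n):
    latent = rng.normal(size=(n, 4))
    tiles = latent @ basis + 0.05 * rng.normal(size=(n, 64))
    return tiles.reshape(n, 8, 8), latent

source_tiles, _ = make_tiles(2000)             # large source set, labels unused
target_tiles, target_latent = make_tiles(120)  # small labelled target set
target_y = target_latent @ np.array([1.0, -0.5, 0.3, 2.0])  # made-up indicator

model = pretrain_source_model(source_tiles, n_components=4)
feats = extract_features(target_tiles, model)
coef = fit_head(feats[:80], target_y[:80])  # train the head on 80 tiles
test_r2 = r_squared(target_y[80:], predict(feats[80:], coef))  # evaluate on 40
print(f"held-out R^2: {test_r2:.3f}")
```

Only the head's few coefficients are estimated from the small labelled set, which is the practical appeal of transfer learning when survey labels are scarce.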
Convolutional Neural Networks
To teach an algorithm to recognise objects in images, we use a specific type of artificial neural network: the Convolutional Neural Network (CNN). Its name stems from one of the most important operations in the network, convolution. CNNs have a different architecture from regular neural networks. A regular neural network transforms an input by passing it through a series of hidden layers. Each layer is made up of a set of neurons, and each neuron is fully connected to all neurons in the previous layer. Finally, a last fully connected layer, the output layer, represents the predictions. CNNs differ in three ways. First, the layers are organised in three dimensions: width, height and depth. Second, a neuron in one layer does not connect to all the neurons in the previous layer, but only to a small region of it. Third, the final output is reduced to a single vector of probability scores, organised along the depth dimension.
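The local connectivity and the width x height x depth layout described above can be seen in a direct implementation of one convolutional layer. This NumPy sketch uses "valid" padding and stride 1 and, like most deep learning libraries, does not flip the kernel (strictly a cross-correlation); the function names are ours, and real CNNs would use an optimised library layer.

```python
import numpy as np

def conv2d(volume, kernels, bias=None):
    """'Valid' 2-D convolution as used in CNN layers (no kernel flip, stride 1).

    volume:  (height, width, depth_in) input volume
    kernels: (kh, kw, depth_in, depth_out) learnable filters
    returns: (height - kh + 1, width - kw + 1, depth_out) output volume
    """
    h, w, depth_in = volume.shape
    kh, kw, k_depth, depth_out = kernels.shape
    if k_depth != depth_in:
        raise ValueError("kernel depth must match input depth")
    out = np.zeros((h - kh + 1, w - kw + 1, depth_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output neuron sees only a small local patch (its receptive field),
            # but through the full input depth.
            patch = volume[i:i + kh, j:j + kw, :]
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    if bias is not None:
        out += bias  # one bias per output channel, broadcast over width and height
    return out

def relu(x):
    return np.maximum(x, 0.0)

def softmax(scores):
    """Turn the final score vector into probabilities that sum to 1."""
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()

# A ramp image and a diagonal-difference filter: each output is x[i, j] - x[i+1, j+1].
image = np.arange(16.0).reshape(4, 4, 1)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]]).reshape(2, 2, 1, 1)
feature_map = conv2d(image, kernel)
print(feature_map[:, :, 0])  # every entry is -5.0 for this ramp image
```

Because the same kernel weights are reused at every position, a layer needs far fewer parameters than a fully connected layer over the same input.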
Literature Survey
In any nation, human development outcomes are a function of economic growth, social policy and poverty reduction measures at the macro level, as stated in [India Human Development Report, 2011]. Monitoring socio-economic growth is a challenge, especially in a country like India with a population of more than 1.2 billion. In 2011, India had 28 states and 640 districts. "Combining satellite imagery and machine learning to predict poverty" [4] is a paper that uses DHS (Demographic and Health Survey) data to predict poverty in sub-Saharan Africa using CNN and transfer learning approaches. It predicted poverty in Rwanda with an R² of 0.67.
In [4], satellite imagery is passed through convolutional neural networks to measure a set of human development indicators. The work replicates the finding that the wealth index, i.e. poverty, can be predicted using CNNs and satellite imagery, and further generalises the prediction of wealth to other countries. The authors also suggest that the approach does not generalise to the prediction of other parameters such as access to drinking water and health indicators, and conclude that outside sub-Saharan Africa these algorithms did not work well. In [5], Suban Banergee develops a machine learning based tool for accurate prediction of socio-economic indicators from daytime satellite imagery. The diverse set of indicators is often not intuitively related to features observable in satellite images, and the indicators are not always well correlated with each other. The predictive tool is more accurate than using night lights as a proxy, and can be used to predict missing data, smooth out noise in surveys, monitor the development progress of a region and flag potential anomalies.
Finally, they used the predicted variables to perform a robustness analysis of a regression study of the high rate of stunting in India. They used machine learning to build a deep CNN-based regression model that predicts a hand-crafted asset vector from input satellite images. Although the model is static and trained on cross-sectional data, they demonstrated that it can be used effectively to predict the asset vector from satellite images acquired at different times, making it an extremely useful alternative between surveys. Further, the asset model can be used for transfer learning to predict a variety of other socio-economic and health parameters, and they demonstrate the use of the predicted variables in a regression case study of the determinants of stunting. Night-time lights also tend to extend into neighbouring regions, a phenomenon called the blooming effect [7]. Our work explores the predictive power of night lights and shows a significant improvement when they are combined with MODIS data, another type of satellite data. We first use unsupervised machine learning techniques to generate true labels as indicators of socio-economic development in the districts of India, and then develop a regression model that predicts these true labels by combining night lights with MODIS data.
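Combining the two sources amounts to stacking their features in one design matrix. The sketch below fits ordinary least squares with NumPy on synthetic district data, treating the development label as a continuous score; the feature choices, coefficients and noise level are invented for illustration and say nothing about the real predictive strength of either source.

```python
import numpy as np

def ols_r2(features, target):
    """Fit ordinary least squares with an intercept and return the in-sample R^2."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return 1.0 - (residuals @ residuals) / np.sum((target - target.mean()) ** 2)

rng = np.random.default_rng(42)
n_districts = 300
nightlights = rng.gamma(2.0, 1.5, size=n_districts)  # hypothetical mean radiance
modis = rng.normal(size=(n_districts, 3))            # hypothetical MODIS features
label = (0.8 * np.log1p(nightlights)                 # synthetic development score
         + modis @ np.array([0.5, -0.3, 0.2])
         + 0.3 * rng.normal(size=n_districts))

r2_nl = ols_r2(nightlights[:, None], label)
r2_both = ols_r2(np.column_stack([nightlights, modis]), label)
print(f"nightlights only: {r2_nl:.2f}, nightlights + MODIS: {r2_both:.2f}")
```

In-sample R² can never decrease when features are added to an OLS model, so a genuine comparison of night lights alone against night lights plus MODIS should use held-out data or cross-validation.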
Data
The data falls into two parts: the 2011 Census data and the available satellite data. Data extraction and categorization are discussed in the following subsections.
Census Data 2011
Census data for 2011, available on the official website of the Government of India, was used for the study. A crawler was built to extract the data from the website. Data was collected for every state, with values for approximately 135 socio-economic variables for each village. Studying the impact of these variables, we categorized them as social development and economic development parameters. The categorization is itself debatable, as some variables can be seen as impacting both social and economic development. Nevertheless, the aim was to study the relation between these variables, and the categorization was not a hindrance in our study.
Economic Parameter
In a country like India, employment in many districts is predominantly agriculture-based, while many other districts have gone the industrial way. We also take into account that, on average, people employed in non-agricultural occupations earn more. The type of employment was therefore taken as a socio-economic indicator. The Primary Census Abstract table contains 90 categories on population enumeration, while the Houselisting & Housing table contains data on 140 amenities and assets in the household. Studying the impact of these variables, the socio-economic indicators for the analysis were created by merging and pre-processing both files. The living conditions of people were grouped as advanced (ADV), intermediate (INT) and rudimentary (RUD).
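The merging step could be sketched as an inner join on a shared district identifier, with the employment-type indicator computed as the share of main workers outside agriculture. The field names (`district_code`, `main_workers`, `main_cultivators`, `main_agri_labourers`, `pct_electricity`) and the sample rows are hypothetical placeholders, not the actual Census 2011 column headers, and the indicator definition is one plausible reading of the text rather than the report's exact formula.

```python
def non_agri_share(pca_row):
    """Share of main workers outside agriculture (hypothetical field names)."""
    agri = pca_row["main_cultivators"] + pca_row["main_agri_labourers"]
    total = pca_row["main_workers"]
    return (total - agri) / total if total else 0.0

def merge_census_tables(pca_rows, hlh_rows, key="district_code"):
    """Inner-join Primary Census Abstract rows with Houselisting & Housing rows."""
    hlh_by_key = {row[key]: row for row in hlh_rows}
    merged = []
    for row in pca_rows:
        housing = hlh_by_key.get(row[key])
        if housing is None:
            continue  # drop districts missing from either table
        record = {**row, **housing}
        record["non_agri_share"] = non_agri_share(row)
        merged.append(record)
    return merged

# Made-up rows for two districts.
pca = [
    {"district_code": 1, "main_workers": 1000, "main_cultivators": 500, "main_agri_labourers": 300},
    {"district_code": 2, "main_workers": 800, "main_cultivators": 100, "main_agri_labourers": 100},
]
hlh = [{"district_code": 1, "pct_electricity": 62.0}, {"district_code": 2, "pct_electricity": 91.0}]
rows = merge_census_tables(pca, hlh)
print([(r["district_code"], r["non_agri_share"]) for r in rows])  # -> [(1, 0.2), (2, 0.75)]
```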