Prediction Of Flood Risk Using Bayesian Approach: Literature Review

Every year, the country experiences heavy rains brought by the rainy season, or Northeast Monsoon. Various incidents can occur during the rainy season, such as flash floods and landslides, which can lead to destruction of property and loss of life.

As is known, in 2014 Malaysia experienced tremendous rain that caused floods affecting the lives of almost 200,000 people, especially those living in east coast states such as Kelantan, Terengganu and Pahang. Grievances were also received, mainly through social media, from the affected victims regarding the effectiveness of the flood management system, especially the delivery of emergency assistance to the people affected by the natural disaster. Flood prediction is therefore important so that the relevant parties have a strategy to handle the situation more effectively. The existing flood disaster management system needs to be adapted to suit current conditions and to remain applicable in the face of changing climate and weather. The relevant parties within the country should work together consistently to handle flood issues in a well-organized manner. If all these flood management steps are implemented properly, there is little doubt that floods can be better managed and that assistance can reach the victims faster.

The Malaysian Meteorological Department (MMD) is the department responsible for providing meteorological, climatic and geophysical services to meet the country's needs for wellbeing, safety and sustainable development. MMD monitors the weather on land, at sea and in the air continuously throughout the country. Forecasts, advisories, weather and climate alerts, and climate surveys are issued by MMD to reduce disaster risk. It carries out continuous weather monitoring operations 24 hours a day, 7 days a week.

Classification Techniques

Classification is one of the most important tasks in data mining. In classification, a classifier is built from a set of training examples with class labels, and is then used to assign each item in a data set to one of a predefined set of classes or groups (Kesavaraj and Sukumaran, 2013). Classification is a form of data analysis that can be used to extract models describing important data classes or to predict future data trends. The aim of classification is to accurately predict the target class for each case in the data.
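As a minimal sketch of this train-then-classify workflow, the example below builds a classifier from labelled training examples and predicts the class of unseen cases. The rainfall and river-level features are hypothetical illustrations, not data from the reviewed studies, and scikit-learn is assumed to be available:

```python
# Minimal classification sketch: build a classifier from labelled training
# examples, then assign each new case to one of the predefined classes.
# The features and labels are illustrative, not data from the reviewed papers.
from sklearn.tree import DecisionTreeClassifier

# Training examples: [rainfall_mm, river_level_m] with class labels.
X_train = [[120, 4.2], [15, 1.1], [200, 5.0], [30, 1.8], [180, 4.8], [10, 0.9]]
y_train = ["flood", "no_flood", "flood", "no_flood", "flood", "no_flood"]

clf = DecisionTreeClassifier().fit(X_train, y_train)

# Predict the target class for new, unlabelled cases.
print(clf.predict([[160, 4.5], [20, 1.2]]))  # expected: ['flood' 'no_flood']
```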

Bayesian Networks

A Bayesian network represents the causal probabilistic relationships among a set of random variables and their conditional dependencies, and it provides a compact representation of a joint probability distribution (Murphy, 1998). It consists of two major parts: a Directed Acyclic Graph (DAG) and a set of conditional probability distributions. The nodes of the DAG represent the variables, and the directed edges represent direct dependencies between them. For each variable A with parents B1, B2, ..., Bn, there is an attached conditional probability table P(A | B1, B2, ..., Bn). As shown in Figure 1, the graph on the left is a valid Bayesian network. The probabilities to specify are P(A), P(B|A), P(C|A, E), P(D|B), P(E|B) and P(F|C, D, E). The one on the right is not a valid Bayesian network, as the cycle A-B-E-C exists.
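To make this factorisation concrete, the sketch below encodes the valid network from Figure 1 and evaluates the joint distribution as the product of the six listed conditional tables. All numeric probabilities are illustrative assumptions, not real flood data:

```python
# Sketch of the Figure 1 network: the joint probability factorises as
# P(A,B,C,D,E,F) = P(A) P(B|A) P(C|A,E) P(D|B) P(E|B) P(F|C,D,E).
# All numeric values below are illustrative assumptions.
import itertools

def p(prob_true, value):
    """Return P(var=value) given P(var=True) for a binary variable."""
    return prob_true if value else 1.0 - prob_true

# Conditional probability tables, keyed by parent values (True/False).
P_A = 0.3
P_B = {True: 0.8, False: 0.2}                       # P(B=True | A)
P_C = {(a, e): 0.2 + 0.3 * a + 0.4 * e              # P(C=True | A, E)
       for a, e in itertools.product([True, False], repeat=2)}
P_D = {True: 0.7, False: 0.3}                       # P(D=True | B)
P_E = {True: 0.6, False: 0.2}                       # P(E=True | B)
P_F = {cde: 0.1 + 0.25 * sum(cde)                   # P(F=True | C, D, E)
       for cde in itertools.product([True, False], repeat=3)}

def joint(a, b, c, d, e, f):
    """P(A,B,C,D,E,F) as the product of the network's CPTs."""
    return (p(P_A, a) * p(P_B[a], b) * p(P_C[(a, e)], c) *
            p(P_D[b], d) * p(P_E[b], e) * p(P_F[(c, d, e)], f))

# Sanity check: the joint probabilities over all assignments sum to 1.
total = sum(joint(*v) for v in itertools.product([True, False], repeat=6))
print(round(total, 10))  # 1.0
```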

Naive Bayes

A Naive Bayes network is a simple probabilistic model that classifies data into specific classes based on different data features (Friedman et al., 1997; Ong, 2011). Naive Bayes does not represent any variable dependencies given the class variable. This assumption is called class conditional independence; it is made to simplify the computations involved, and in this sense the model is considered "naive".
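The sketch below shows the class-conditional independence assumption in action: the posterior for each class is proportional to the class prior times the product of per-feature likelihoods. The priors, likelihoods and feature names are made-up illustrations, not estimates from flood data:

```python
# Naive Bayes sketch: P(class | x1, x2) ∝ P(class) * P(x1|class) * P(x2|class),
# i.e. the features are assumed independent given the class ("naive").
# All probabilities below are illustrative assumptions.

priors = {"flood": 0.2, "no_flood": 0.8}
# P(feature = value | class) for two discretised features.
likelihoods = {
    "flood":    {("rainfall", "heavy"): 0.9, ("rainfall", "light"): 0.1,
                 ("river", "high"): 0.8,  ("river", "normal"): 0.2},
    "no_flood": {("rainfall", "heavy"): 0.2, ("rainfall", "light"): 0.8,
                 ("river", "high"): 0.1,  ("river", "normal"): 0.9},
}

def classify(observation):
    """Return posterior P(class | observation), normalised over classes."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature_value in observation.items():
            score *= likelihoods[cls][feature_value]   # independent factors
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

print(classify({"rainfall": "heavy", "river": "high"}))
# {'flood': 0.9, 'no_flood': 0.1} -- flood is far more probable here
```

Because each feature contributes one independent factor, adding a feature only multiplies in one more table, which is what keeps the computation simple.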

Tree Augmented Naive Bayes

Another prominent approach to enhancing Naive Bayes is the Tree Augmented Naive Bayes (TAN) of Friedman et al. (1997). TAN is an extended, tree-like Naive Bayes in which the class node directly points to all attribute nodes, and an attribute node can have only one parent from another attribute node in addition to the class node. In TAN, each node therefore has at most two parents, one of which is the class node. TAN outperforms Naive Bayes in terms of accuracy while still maintaining a considerably simple structure. A related algorithm, called SuperParent, consists of two major steps. The first step searches for the best super parent, the one that improves predictive accuracy the most; a super parent is a node with arcs pointing to all other nodes that do not yet have a parent (not counting the class label). The second step determines the single best child for the super parent chosen in the first step, again based on predictive accuracy. After each iteration of these two steps, one arc is added to the TAN, and the process repeats until no improvement is achieved or until n − 1 arcs have been added to the tree.
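The following is a structural sketch of that two-step SuperParent loop. The evaluate function here is a stand-in assumption (a toy scoring rule chosen only so the example runs and terminates); in the actual algorithm it would be the cross-validated predictive accuracy of the TAN classifier under the candidate arcs, and cycle checking is omitted for brevity:

```python
# Structural sketch of the SuperParent search: repeatedly (1) pick the best
# super parent, (2) keep only its single best child arc, until no improvement
# or n - 1 arcs are added. evaluate(arcs) stands in for predictive accuracy.

def superparent_search(nodes, evaluate):
    arcs = []
    best = evaluate(arcs)
    while len(arcs) < len(nodes) - 1:
        # Orphans: attribute nodes that do not yet have an attribute parent.
        orphans = [n for n in nodes if not any(c == n for _, c in arcs)]
        # Step 1: the super parent whose arcs to all orphans help the most.
        sp_scores = {sp: evaluate(arcs + [(sp, o) for o in orphans if o != sp])
                     for sp in nodes}
        sp = max(sp_scores, key=sp_scores.get)
        # Step 2: keep only the single best child for that super parent.
        child_scores = {o: evaluate(arcs + [(sp, o)])
                        for o in orphans if o != sp}
        if not child_scores:
            break
        child = max(child_scores, key=child_scores.get)
        if child_scores[child] <= best:
            break                          # no improvement: stop searching
        best = child_scores[child]
        arcs.append((sp, child))
    return arcs

# Toy stand-in evaluator (assumption): the score rises when the chosen arcs
# match a hypothetical best tree; a small penalty discourages extra arcs.
target = {("rainfall", "river_level"), ("river_level", "runoff")}
toy_evaluate = lambda arcs: len(set(arcs) & target) - 0.01 * len(arcs)
print(superparent_search(["rainfall", "river_level", "runoff"], toy_evaluate))
# expected: [('rainfall', 'river_level'), ('river_level', 'runoff')]
```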

Study of Existing Research

The Bayesian approach is chosen for this research to predict the risk of flood occurrences in Kuala Krai, Kelantan using the Bayesian Networks, Naive Bayes, and Tree Augmented Naive Bayes algorithms. There are many techniques that can be used to predict floods. Hence, in this study of existing research, three pieces of research that share the same goal, flood prediction, are compared.

Predicting flood risk using Spiking Neural Network

In 2007, Nor Amy Masturah and Muhaini (2007) used Spiking Neural Networks to predict floods by applying three different algorithms with different tools: Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and Dynamic Evolving Spiking Neural Network (deSNN). The objective of this research was to show that the deSNN method, using the NeuCube tool, gives the most accurate results compared with the SVM and MLP methods. The historical data of this research are Spatio/Spectro-Temporal Data (SSTD), which cannot be used directly in WEKA; the analysis of the data was based on space and time. As a comparative experiment, conventional machine learning methods such as MLP and SVM were used as baselines for performance and accuracy. The comparison was set up by taking the same time-length percentage in the experiments with NeuCube for the deSNN algorithm and WEKA for the MLP and SVM algorithms. The researchers conducted two experiments: the first used the whole 100% time length, while the second used an 80% time length. It was observed that, for these baseline algorithms, the time length of the training and validating samples needed to be equal in the NeuCube and WEKA tools. For the 100% time length, deSNN achieved 83.33%, SVM 68.75% and MLP 54.17%, while for the 80% time length, deSNN achieved 66.67% and both SVM and MLP 38.89%. This shows that the deSNN algorithm outperforms the other algorithms.

Big data analytics for flood information management in Kelantan, Malaysia

In 2015, Aziyati Yusoff et al. (2015) attempted to utilise big data analytics for flood disaster management, specifically for the state of Kelantan, Malaysia. The research considers ordinal-type data obtained from the state authorities and proposes data manipulation through statistical inference and big data analytics, focusing on the ordinal data type from the perspective of statistical methods. The data analysed are rainfall and water-level readings, especially during the flood occurrence. The research sought to determine whether the change in water level from its normal readings to the danger level during the flood was statistically significant, and a Z-test was carried out to test the proposed hypothesis. Using the Z-test at the 95% confidence level, the null hypothesis was rejected: the flood incidence and data readings from the rural areas do have significant effects on flood occurrences in the urban areas. From this research, the researchers also identified that the relationship between rainfall and water level is weak.
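As a brief sketch of such a test (with made-up water-level readings and assumed population parameters, not the study's actual data), a one-sample Z-test at the 95% confidence level can be computed as follows:

```python
# One-sample Z-test sketch at the 95% confidence level. The readings and the
# assumed population parameters are illustrative, not the study's data.
import math

mu0, sigma = 2.0, 0.5          # assumed normal water level (m) and std dev
readings = [2.9, 3.1, 2.7, 3.4, 3.0, 2.8, 3.2, 3.3]   # flood-period sample

n = len(readings)
xbar = sum(readings) / n
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:             # 95% confidence level
    print("Reject the null hypothesis: readings differ from normal levels.")
```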

Rainfall analysis of the Kelantan big yellow flood

In 2004, Nor Eliza Alias et al. (2004) investigated whether the rainfall exceeded historical records, how rare the events were, and what caused them. The researchers estimated the return periods using the Generalized Extreme Value (GEV) distribution model and stations with more than 25 years of records. Spatial distribution plots of the accumulated rainfall depths were constructed using the Inverse Distance Weighting (IDW) interpolation method.
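A short sketch of both methods is given below, using synthetic annual-maximum rainfall and station values rather than the study's records. SciPy's genextreme distribution is used for the GEV fit, and the IDW weights are the usual inverse squared distances:

```python
# Sketch of the two methods above, on synthetic data (not the study's records).
import math
from scipy.stats import genextreme

# --- Return period from a fitted GEV distribution --------------------------
# Synthetic annual maximum daily rainfall (mm), more than 25 years of values.
annual_max = [95, 110, 87, 130, 102, 98, 145, 120, 90, 105,
              115, 99, 160, 108, 92, 125, 101, 97, 140, 112,
              88, 118, 103, 96, 135, 109]
shape, loc, scale = genextreme.fit(annual_max)
# A T-year event is exceeded with probability 1/T in any given year.
for T in (10, 50, 100):
    x_T = genextreme.ppf(1 - 1 / T, shape, loc=loc, scale=scale)
    print(f"{T}-year rainfall: {x_T:.0f} mm")

# --- Inverse Distance Weighting (IDW) interpolation -------------------------
def idw(x, y, stations, power=2):
    """Interpolate rainfall at (x, y) from (xi, yi, value) station triples."""
    weights = [1.0 / math.dist((x, y), (xi, yi)) ** power
               for xi, yi, _ in stations]
    return sum(w * v for w, (_, _, v) in zip(weights, stations)) / sum(weights)

stations = [(0.0, 0.0, 120.0), (10.0, 0.0, 80.0), (5.0, 8.0, 150.0)]
print(f"IDW estimate at (4, 3): {idw(4.0, 3.0, stations):.1f} mm")
```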

An analysis and assessment of the extreme rainfall of Kelantan shows several key points on the factors influencing the extreme flood of December 2014. First, there were two phases of rainfall. The first phase, from 15 to 19 December, had heavy rainfall falling on both the eastern and western sides downstream of the Kelantan river basin, contributing to the rise in water levels of Sungai Galas, Sungai Lebir and Sungai Kelantan. The second phase of rainfall was from 20 to 24 December, during which higher rainfall intensities were recorded, especially upstream of the river basin at Gunung Gagau. The flood situation became more severe because the soils, rivers and drainage had reached full capacity. The worst-hit areas were Kuala Krai, Dabong and Manik Urai, from 22 to 30 December 2014. The second key point is that many record-breaking rainfall events occurred during the flood period, and the last key point concerns the causes of these extreme rainfall events.

In conclusion, global warming and climate change, as evidenced by their many occurrences, will influence these extreme rainfall events.

Table 2.1: Comparison among techniques in existing research

Article: Predicting flood risk using Spiking Neural Network
Author/Year: Nor Amy Masturah and Muhaini (2007)
Techniques: Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and Dynamic Evolving Spiking Neural Network (deSNN)
Description: The authors aim to prove that using SNN for personalised modelling is more suitable for analysing SSTD than the other methods.

Article: Big data analytics for flood information management in Kelantan, Malaysia
Author/Year: Aziyati Yusoff et al. (2015)
Techniques: Semantic network and flood ontology
Description: By using the semantic network and flood ontology, the authors expect that an early warning alert will be produced by this research.

Article: Rainfall analysis of the Kelantan big yellow flood
Author/Year: Nor Eliza Alias et al. (2004)
Techniques: Generalized Extreme Value (GEV) distribution model and Inverse Distance Weighting (IDW)
Description: The authors want to know whether the rainfall exceeded historical records, how rare the events were, and what caused them. Return periods are estimated using the GEV distribution model and stations with more than 25 years of records; spatial distribution plots of the accumulated rainfall depths are constructed using the IDW interpolation method.
