A Review on Flight Delay Prediction Systems Using Conventional and Modern Techniques
Abstract
This paper aims to help minimize airline delay by using data analytics and modelling techniques. The methodology reviews the major factors that influence airline delay and identifies the rational links between them by examining various models for delay prediction. The paper also discusses the challenges faced by the different approaches and critically analyses them to identify their underlying causes.
Index Terms
Bureau of Transportation Statistics (BTS), Delay Propagation Tree (DPT), Enhanced Delay Propagation Tree (EDPT), Federal Aviation Administration (FAA), Gradient Booster Classifier (GBC), International Air Transport Association (IATA), National Airspace System (NAS), National Oceanic and Atmospheric Administration (NOAA), Non-Independent and Identically Distributed Variables (non-IID), Stochastic Gradient Descent (SGD), Successive Shortest Path Method (SSPM), Synthetic Minority Oversampling Technique (SMOTE), Traffic Aware Strategic Aircrew Request (TASAR).
Introduction
The airline industry has seen tremendous growth in the 21st century. Ever since air travel became easy, accessible and affordable, it has had a lasting impact on the industry. Air travel is a popular option for people who want to cut their travel time. The number of passengers traveling by air has been growing rapidly, with approximately 3,300 planes carrying 660,000 people in the air at any given time. On average, 8 million people travel by air from different parts of the world every single day and 8.4 billion passengers annually. Worldwide, commercial airlines carried just over four billion passengers on scheduled flights in 2017. With the increase in the number of flights and passengers, it becomes crucial for airlines to provide quality service every single time.
According to the International Air Transport Association (IATA), the industry's revenue more than doubled in recent years, from US$369 billion in 2004 to US$824 billion in 2018.
Flight delay is one of the major issues that has to be dealt with every day. The Federal Aviation Administration (FAA) defines a flight as delayed if it is 15 minutes later than its scheduled time of departure or arrival. On a global scale, at least 3,000 to 4,000 flights are delayed due to various factors. Some of the major causes of delay are weather, the National Airspace System (NAS), late arrivals, security issues, maintenance issues, and air traffic congestion. The following Fig. 1 shows the arrival and departure delay used in the airline industry. Weather-caused delays occur mostly when the conditions for an aircraft to fly over a specific region are marginal. Such conditions may cause serious harm to the aircraft and passengers; hence airlines prefer not to operate aircraft in them, which ultimately is a loss-making factor. A study conducted by (Stouffer, et al. 2017) found weather-based delay to be quite significant. A preliminary investigation suggested that nine million minutes are lost annually due to unpredicted weather-related issues. This is considered a key factor in flight handling and time scheduling. Weather-related delay is always counted as a major factor that has to be dealt with on a regular basis.
The National Airspace System (NAS) is accountable for delays because it is responsible for scheduling and routing aircraft. Unless it handles the data in a balanced form, delay cannot be minimized. Security delays are caused by evacuation of a terminal or concourse, re-boarding of aircraft because of a security breach, inoperative screening equipment, and/or long lines in excess of 29 minutes at screening areas. If a maintenance issue is discovered with an aircraft, the flight will not depart until the issue has been fully addressed. Sometimes these issues are worked on even as passengers board the plane, meaning that the delay might take place entirely on the tarmac. At other times, in the case of larger issues, the airline might decide to switch planes entirely for the safety of everyone involved. The major problem with flight delay is that airlines lose a lot of money. According to the US Congress Joint Economic Committee, the cost associated with domestic flight delays in the United States during 2007 was estimated at $25.7 billion ($12.2 billion in increased airline operating costs, $7.4 billion in passenger time lost, and $6.1 billion in costs to related industries). This can also be one major reason why even well-established airlines may run into financial crisis.
Methods
Successive Shortest Path Method
Flight delay may be unpredictable, but using historical flight data, delay may be reduced through data modelling and predictive analytic techniques. To justify this statement, this paper reviews some of the conventional and modern techniques and critically evaluates them. The first conventional technique used to evaluate delay due to cancellations is the Successive Shortest Path Method (SSPM) (Gershkoff, I. 1989). In this approach, cancellation is treated as a major factor linked to flight delay. The model relies on a good dataset of flight cancellations to find a relationship between delay and shortage. It can be used as a starting reference for flight delay prediction, as it investigates one aspect of flight delay using cancellation as the key, and it gives a starting point for advancing further into delay prediction analysis. In this model, time is used as an attribute and is placed on the Y-axis, and flight data is placed on the X-axis. The flight data consists of various flights scheduled at different airports. Different nodes represent the different flights, which are termed Movement Groups. The following Fig. 2 shows the relationship between the X and Y axes. Capital blocks are the nodes, each of which defines a single aircraft or a group of aircraft.
Now suppose there is a shortage in one of the nodes, for example B. Then there will be a cancellation at B or at one of the other nodes. Although this model was designed for a specific number of aircraft, it falls short in resolving problems such as excess aircraft and delay propagation. To develop SSPM further, another model was proposed (Jarrah, Ahmad I. Z., et al. 1993) to overcome its limitations. Here the network model was able to analyze multiple delays and cancellations at a given point. The delay model was tested for 3 stations, and a reasonable share of the delays were time delays that were manageable in real-time application. However, such a preliminary model had many limitations: weather delays and maintenance delays were not accounted for in the case study because very limited tools and equipment were available.
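The successive shortest path idea behind SSPM can be illustrated with a minimal sketch. The code below is a generic minimum-cost flow routine and not the formulation of (Gershkoff, I. 1989): the node numbering, capacities and costs are hypothetical, and Bellman-Ford is assumed as the shortest-path subroutine. When the requested number of aircraft cannot be routed, the returned flow falls short of the demand, which loosely corresponds to the shortage case described above.

```python
def min_cost_flow(n, edges, source, sink, demand):
    """Successive shortest paths: repeatedly push flow along the cheapest
    residual path from source to sink until `demand` units are routed
    or no path remains. `edges` holds (tail, head, capacity, cost) tuples."""
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward edge gets an even index, its backward twin the next odd one.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    inf = float("inf")
    flow, total_cost = 0, 0
    while flow < demand:
        # Bellman-Ford, because backward residual edges carry negative costs.
        dist = [inf] * n
        prev_edge = [-1] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            break  # no augmenting path left: the demand cannot be met

        # Find the bottleneck capacity along the path, then augment.
        push = demand - flow
        v = sink
        while v != source:
            e = prev_edge[v]
            push = min(push, cap[e])
            v = to[e ^ 1]  # tail of edge e
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        total_cost += push * dist[sink]
    return flow, total_cost


# Hypothetical toy network: 0 = aircraft pool, 1 and 2 = stations, 3 = sink.
toy_edges = [(0, 1, 2, 1), (0, 2, 1, 4), (1, 2, 1, 1), (1, 3, 1, 2), (2, 3, 2, 1)]
routed, cost_paid = min_cost_flow(4, toy_edges, source=0, sink=3, demand=3)
print(routed, cost_paid)
```

Requesting more units than the network can carry (for example `demand=5` on the same toy network) returns a flow smaller than the demand; in a scheduling setting that gap would have to be absorbed by cancellations or delays.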
Statistical Approach Using Long-Term Trend and Short-Term Pattern
(Tu, Yufeng, et al. 2008) used statistics to observe two types of delay pattern: seasonal trends and daily flight scheduling patterns. The paper divided flight delay into three parts: delay due to the seasonal trend, delay due to daily operations, and random errors. The EM algorithm was used in this case; it is an iterative process that alternates between the E step and the M step. The main advantage of the EM algorithm is that it converges. The following Fig. 3 shows delay propagation. The error-related delays mentioned above generally originate from unwanted factors. These factors act as a mixture distribution, and the log-likelihood of the mixture distribution is maximized, which makes the EM algorithm more effective in this case. The E and M steps are repeated until convergence, which is reached by adjusting the log-likelihood or other related parameters.
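The alternation between the E step and the M step can be sketched for a simple case. The code below fits a two-component one-dimensional Gaussian mixture with plain EM; it is not the modified algorithm of (Tu, Yufeng, et al. 2008), and the synthetic "delay" sample, the starting values and the function names are assumptions made only for illustration. The log-likelihood recorded at each iteration never decreases, which is the convergence property mentioned above.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, mu, sigma, weight, iterations=50):
    """Plain EM for a two-component 1-D Gaussian mixture.
    Returns the fitted parameters and the log-likelihood at each iteration."""
    mu, sigma, weight = list(mu), list(sigma), list(weight)
    history = []
    for _ in range(iterations):
        # E step: responsibility of each component for each observation.
        resp = []
        loglik = 0.0
        for x in data:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            loglik += math.log(total)
            resp.append((p[0] / total, p[1] / total))
        history.append(loglik)
        # M step: re-estimate the parameters from the weighted observations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
            weight[k] = nk / len(data)
    return mu, sigma, weight, history


# Synthetic sample (an assumption): many short delays and fewer long ones.
random.seed(0)
delays = [random.gauss(5, 3) for _ in range(300)]
delays += [random.gauss(45, 10) for _ in range(100)]

mu, sigma, weight, history = em_two_gaussians(
    delays, mu=[0.0, 30.0], sigma=[10.0, 10.0], weight=[0.5, 0.5]
)
print(mu, sigma, weight)
```

Because EM only climbs to the nearest local optimum, different starting values for `mu` can lead to different fitted mixtures.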
However, minor changes were made to the EM algorithm, and the result was termed the "GA-EM" algorithm. These modifications were based on observations and analysis of the genetic algorithm (Holland 1975). For the case study, data from the years 2000 and 2001 were used, as the model was applicable to general data forms. Data from Denver International Airport and United Airlines were selected for this study. The most common factors that had an impact on delay were combined, and the GA-EM algorithm was used to try to minimize the naturally occurring errors. To satisfy the model requirements, the data were cleaned, and specific dates such as March 20th were selected for study. A time series of average delay was plotted for 366 days. Using this information, the data were managed and cleaned, and the necessary requirements were met. Delay prediction based on seasonal changes was also observed for 365 days, with patterns arising as the season changed. The results obtained from the model were reasonable. First, the delay due to seasonal variation was estimated, and then the day-to-day delay and the random-error delay were computed. These results were compared with the actual seasonal, day-to-day, and random-error results.
The estimated and actual results were visually similar to an extent. The major advantage of this model is that it provides a statistical approach to characterizing delays within the same year. The model was able to predict delays only at the tail and middle parts of the distribution, and even these predictions contained some significant errors. (Tu, Yufeng, et al. 2008) suggest that the model could function with fewer errors if some parameters were smoothed. Another advantage of this model is that it opens up the management of larger data and more factors by proposing the rolling horizon method to detect the delay on day S+1. The following Fig. 4 shows the horizontal propagation developed for future adjustments. However, this model has many disadvantages that prevent it from being deployed for actual flight delay prediction. The EM algorithm finds the local optimum closest to the starting value. When there are several optima, it becomes a challenge to apply this algorithm and model. Also, for mixture models EM converges to only one optimum. The algorithm is less efficient when the number of parameters is large.
Bayesian Network Tree Model
The tree model implemented in (Ahmadbeygi, et al. 2008) was applied for flight scheduling purposes. This model proved advantageous for delay analysis because it takes in factors such as flight network visualization and was able to detect potential flight delay during the scheduling stage. The Delay Propagation Tree (DPT) model contained an entire routing starting from the root node. Generally, this model was used to track delays after the flight was completed. (Cheng-Lung Wu, et al. 2018) redesigned this tree and called it the Enhanced-DPT model. The basic idea was to reverse the approach of (Ahmadbeygi, et al. 2008). The newer model focused on predicting flight delay before or while the scheduling of the flight was in progress. The following Fig. 5 shows the newer tree model, which works in the reverse direction of the earlier (Ahmadbeygi, et al. 2008) model.
For the analysis, different scenarios were set up to examine the most common delays and the factors influencing them. Buffer time was a key term introduced in the enhanced version, DPT-BN, where it refers to ground-related delays. Departure profiles of flights were analyzed, as they affect arrival and rerouting to another destination. Early departure and arrival were compared to analyze the flight delay factor. A comparison between delay categories and their frequency showed how routing was leading to delay. A detailed study was carried out to find anomalies in delay during the morning and evening peak hours, which may cause traffic congestion. The following Fig. 6 shows the DPT-BN model with a causal relationship.
The major advantage of this DPT-BN model is that flight arrival was made dependent on the departure time of the flight; hence it provided a framework for flight delay by showing the factors that affect delay and that propagate along the flight scheduling network. It gives an in-depth understanding of flight scheduling. Using the DPT-BN model, airlines can adjust flight scheduling while assessing ground-related operations and their reliability. The major setback of this model is that it mostly depends on mean values in delay propagation. It does not have the capability to make delay predictions with stochastic variables, and it does not work very well with non-independent and identically distributed (non-IID) variables.
(Chakrabarty, et al. 2018) later developed a model that uses a Gradient Booster Classifier (GBC). The main objective of GBC is to strengthen weak decision tree models. Regression trees are added gradually in a sequential manner, each one correcting the errors of those before it. The argument given by (Chakrabarty, et al. 2018) is that decision tree models generally yield a non-linear result because they are not able to use non-IID variables, which makes it inconvenient to feed the various factors leading to flight delay into the decision tree model at once. Using a GBC improved the prediction performance of the regression tree models.
The data were collected from BTS and cleaned until the necessary requirements were met. The dataset included 97,360 flight instances; after processing, 1,602 instances were found to have missing values and were deleted. The remaining 95,758 instances were used and analyzed. The GBC achieved an accuracy of 79.7 percent with the associated factors. This predictive model used a GBC to boost the weaker regression tree model, and the results improved drastically.
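The sequential boosting of weak tree models described above can be illustrated with a small sketch. It is a simplified regression variant with squared loss and one-split trees (stumps) on a single feature, not the classifier configuration of (Chakrabarty, et al. 2018); the toy data (departure hour against delay in minutes), the learning rate and the function names are assumptions. Each round fits a stump to the current residuals and adds a damped copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree: choose the threshold that minimizes
    squared error and predict the mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:  # keep at least one point on the right
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Gradient boosting for squared loss: every new stump is fitted to the
    residuals (the negative gradient) left by the ensemble so far."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    errors = []  # training mean squared error after each round
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, errors


# Hypothetical toy data: scheduled departure hour and observed delay (minutes).
hours = [6, 8, 10, 12, 14, 16, 18, 20]
delay_minutes = [2, 3, 5, 8, 15, 25, 30, 22]
predict, errors = gradient_boost(hours, delay_minutes)
print(errors[0], errors[-1], predict(18))
```

The training error shrinks round by round because each weak stump only has to explain what the previous ones missed; a production classifier would use a log-loss objective and deeper trees over many features.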
Weather Delay Model
A paper by (Stouffer, et al. 2017) used weather as a factor to predict flight delay by collecting data on extreme weather conditions from BTS and the National Oceanic and Atmospheric Administration (NOAA). Three reference datasets were used to predict the occurrence of weather and the related delay for a given airport and region. Another point highlighted in this weather-related delay model is that the resulting delays can be put into categories such as extreme-weather-based delays, NAS-based delays, and late arrivals due to previous delays. (Evans, Antony D., et al. 2016) described Dynamic Weather Routes (DWR) as an ideal model for decreasing weather-related delays. DWR is a ground-based navigation system that works throughout the flight while it is en route to its destination. It continuously tracks and analyses the trajectory of the flight and the airspace around it. This allows DWR to modify the flight route plan by considering factors such as weather and airspace congestion.
Although this model was able to decrease weather-based delay by a marginal amount, it lacked many other significant factors. Ultimately the model was not adopted, since its criteria were not met and accepted by Air Traffic Controllers (ATC). To enhance it, (Stouffer, et al. 2017) proposed delay prediction using DWR and Traffic Aware Strategic Aircrew Request (TASAR). TASAR was conceived as a tool that allows pilots to assess the weather conditions along their route and request a safer trajectory from ATC. This proved more advantageous in terms of time, route resources and fuel savings. The model was tested, and this method of analysis helped raise the confidence level of DWR and TASAR up to 95 percent. (Choi, Sun, et al. 2016) introduced a weather prediction model that is trained using machine learning algorithms. The Synthetic Minority Oversampling Technique (SMOTE) was introduced in this paper. SMOTE combined with random under-sampling is used to prepare the training data for more accurate results. Another set of algorithms was also used to develop the model.
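The core step of SMOTE can be sketched briefly: a synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority neighbours. The code below is a minimal illustration of that interpolation, not the pipeline of (Choi, Sun, et al. 2016); the feature values (wind speed, visibility) and the function name are hypothetical, and random under-sampling of the majority class is left out.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a randomly
    chosen sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)

    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest neighbours among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical delayed-flight samples: (wind speed in knots, visibility in miles).
delayed = [(30.0, 2.0), (35.0, 1.5), (28.0, 3.0), (40.0, 1.0)]
new_samples = smote(delayed, n_synthetic=6)
print(new_samples)
```

Every generated point is a convex combination of two real minority samples, so it stays inside the range already covered by the minority class.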
The first algorithm used is the Decision Tree (DT), whose sole purpose is to classify data at each node using a randomly given attribute. Random Forest (RF) is another algorithm used; it is a collection of DTs. It mostly acts as a collector of DTs and checks the class vote of each individual DT. AdaBoost is used in the case of a weak classifier and enables it to increase the efficiency of its prediction rule. The samples that produce the most inaccurate predictions are chosen and adjusted with AdaBoost. The following Fig. 8 shows the model which is used to train the data. The model works in two stages. The first stage is training, and the second stage is prediction. The data are cleaned and pre-processed, and then the prediction step takes place. Data are collected from BTS and NOAA and include historical data from 2005 to 2015. Although the results were never discussed, the model proved to be a modification of various other tests and their significance.
The scope of machine learning was enhanced in this paper. (Kim, Young Jin, et al. 2016) used deep learning techniques for flight delay prediction. Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) were the main tools used in this study. Basic RNNs are a network of neuron-like nodes organized into successive layers. Each node in a given layer is connected by a directed (one-way) connection to every other node in the next successive layer. Each node (neuron) has a time-varying real-valued activation, and each connection (synapse) has a modifiable real-valued weight. Nodes are either input nodes (receiving data from outside the network), output nodes (yielding results), or hidden nodes (which modify the data on the way from input to output). For supervised learning in discrete time settings, sequences of real-valued input vectors arrive at the input nodes, one vector at a time. At any given time step, each non-input unit computes its current activation (result) as a nonlinear function of the weighted sum of the activations of all units that connect to it. Supervisor-given target activations can also be supplied for some output units at certain time steps. For instance, if the input sequence is a speech signal corresponding to a spoken digit, the final target output at the end of the sequence could be a label classifying the digit.
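The activation rule described above, in which each non-input unit applies a nonlinear function to the weighted sum of the activations feeding into it, can be sketched as a forward pass of a simple recurrent network. This is a generic single-hidden-layer (Elman-style) RNN with a tanh nonlinearity, not the architecture of (Kim, Young Jin, et al. 2016); the weights, layer sizes and function name are assumptions chosen only for illustration.

```python
import math


def rnn_forward(inputs, w_in, w_rec, w_out, bias_h, bias_o):
    """Run a single-hidden-layer recurrent network over a sequence.
    inputs: list of input vectors, one per time step.
    Returns the list of output vectors, one per time step."""
    hidden = [0.0] * len(bias_h)  # hidden activations start at zero
    outputs = []
    for x in inputs:
        new_hidden = []
        for j in range(len(bias_h)):
            # Weighted sum of the current input and the previous hidden state.
            total = bias_h[j]
            total += sum(w_in[j][i] * x[i] for i in range(len(x)))
            total += sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))  # nonlinear activation
        hidden = new_hidden
        outputs.append([
            bias_o[o] + sum(w_out[o][j] * hidden[j] for j in range(len(hidden)))
            for o in range(len(bias_o))
        ])
    return outputs


# Hypothetical weights: 1 input, 2 hidden units, 1 output.
w_in = [[1.0], [0.5]]
w_rec = [[0.0, 0.5], [0.5, 0.0]]
w_out = [[1.0, 1.0]]
sequence = [[1.0], [1.0]]  # the same input at two consecutive time steps
outputs = rnn_forward(sequence, w_in, w_rec, w_out, [0.0, 0.0], [0.0])
print(outputs)
```

Although the input is identical at both time steps, the second output differs from the first because the hidden state carries information forward, which is what lets such a network model delay sequences.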
In reinforcement learning settings, no teacher provides target signals. Instead, a fitness function or reward function is often used to evaluate the RNN's performance, which influences its input stream through output units connected to actuators that affect the environment. This can also be used to play a game in which progress is measured by the number of points won. Each sequence produces an error as the sum of the deviations of all target signals from the corresponding activations computed by the network. For a training set of numerous sequences, the total error is the sum of the errors of all individual sequences. The following Fig. 9 shows the model used. The model first predicts delay using an RNN, and it is later used to predict day-to-day delay from historical data and factors such as weather and previous delay. The main advantage of the first stage is that it sets up the day-to-day model.
When the delay of one day has been obtained, it is passed to the second-stage model. The following Fig. 10 shows the day-to-day delay model. The models were trained using the Stochastic Gradient Descent (SGD) algorithm, which uses a single data sample at every iteration of the training optimization. The major drawback of this is that it reduces the efficiency of the model, as it lengthens the model's prediction time. For huge datasets, however, this problem fades away, as SGD proves able to recognize overfitting and optimize the overall performance level. This study shows that deep learning techniques provide an overall better solution for increasing the accuracy of delay prediction. The application of LSTM and RNN to a predictive model proved to be a fitting step in terms of delay prediction. Further, the paper provides a broader perspective on applying deep learning to other airline-related problems.
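The single-sample update that characterizes SGD can be shown on a much smaller problem than a neural network. The sketch below fits a straight line by updating the parameters after every individual observation; the data, learning rate, epoch count and function name are assumptions for illustration and are unrelated to the training setup of the reviewed study.

```python
import random


def sgd_linear(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w * x + b with stochastic gradient descent on squared error,
    updating the parameters after every single sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd_linear(xs, ys)
print(w, b)
```

Each update is cheap because it touches one sample only, but many passes over the data are needed; this is the trade-off between per-step cost and total training time noted above.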
Results and Discussion
This paper presented various approaches after analyzing techniques such as SSPM, the statistical approach, decision trees, Bayesian networks, machine learning, and deep learning. The statistical approach and SSPM formed the base of this paper, as they represent the initial analysis of flight delay prediction. Although SSPM and the statistical approach were researched using the tools available at the time, they were not able to keep up with the evolving problems of flight delay. With increasing flights and data, these models lost their ability to accommodate the factors responsible for delays. Bayesian tree models were later taken into account for delay prediction. They were able to accommodate factors that were continuously evolving, but they lacked the capability to accommodate stochastic variables and had weaker links within them. Later, GBC was included to support these models, and it was able to predict airline delays with a good degree of accuracy. Weather delay models, which also involved many factors, were deployed to study delay accuracy.
Although not all the models were accepted, evolving technology and machine learning techniques such as RNN and LSTM provided the base for accommodating various types of data and the factors associated with such data. Learning techniques using algorithms such as AdaBoost, SGD and K-means made it possible to train the predictive models and improve their accuracy. This paper focuses on the traditional methods and the modern approaches used in flight delay analysis.
Conclusions
The main objective of this paper was to study and analyze the various techniques and models used for delay prediction in the airline industry, covering both traditional and modern-day techniques. It showcases the contrast between the predictive models used in the early days and the models that are continuously evolving. All the models described in this paper were critically analyzed, and their advantages and disadvantages were described.