A Solution For Securing A Network Based On Intrusion Detection Systems
Introduction
The World Wide Web has seen massive growth in different kinds of web services, including social networking and blogs. Sites like Facebook, Twitter and LinkedIn are among the most viewed websites on the Web.
An Intrusion Detection System (IDS) is a software application that monitors network or system activities and detects any malicious operations that occur. Intrusion means attempting to break into or misuse a system. The growth and usage of the internet raise concerns about how to protect digital information. Nowadays, hackers use different types of attacks to obtain valuable information. There are different ways of classifying IDS: anomaly based, signature based (misuse), host based, network based and stack based. Anomaly intrusion detection deals with detecting unknown attacks in network traffic.
Such attacks are therefore difficult to identify without human intervention. The primary strength of anomaly detection is its ability to recognize novel attacks. Many intrusion detection methods and algorithms help to detect these attacks, yet IT administrators struggle to keep up with IDS alerts. In recent years, machine learning and data mining techniques have been widely used to improve network intrusion detection. The KDD Cup 1999 dataset is one of the earliest published datasets for intrusion detection and has been widely used by researchers.
The KDD-Cup 1999 training dataset has 24 attack types and the KDD-Cup 1999 test dataset has 37 attack types, so the attacks in the training dataset differ from those in the test dataset.
Irrelevant and redundant features are omitted from the dataset, which results not only in faster training and testing but also in lower resource consumption, while maintaining high detection rates. An IDS captures a network traffic dataset to learn about normal and abnormal behavior. These datasets have millions of packets with hundreds of features. Some features may be irrelevant: they do not contribute to the decision and only increase the processing time. Therefore, a subset of features that are significant for detecting intrusions can be proposed by using machine learning techniques.
Finally, we address the problem of securing large networks with complex architectures, based on intrusion detection systems that use 41 features. These features can be used in the design of Intrusion Detection Systems (IDS) and in working towards automating anomaly detection with less overhead.
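The idea of omitting features that do not participate in the decision can be illustrated with a minimal sketch. It assumes one very simple criterion, namely that a column taking a single value across all records carries no information; this is only an illustration and not the selection method evaluated in this work. The function names and the toy records are hypothetical.

```python
def informative_columns(rows):
    """Return the indexes of columns that take more than one distinct value."""
    if not rows:
        return []
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]


def project(rows, keep):
    """Keep only the selected columns of every record."""
    return [[row[j] for j in keep] for row in rows]


# Toy records (hypothetical values): a duration, a protocol, and a flag that never changes.
records = [
    [0, "tcp", 0],
    [2, "udp", 0],
    [5, "tcp", 0],
]
keep = informative_columns(records)
reduced = project(records, keep)
```

Here the third column is dropped because it is constant, so every later training or testing pass touches two values per record instead of three.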
KDD dataset preprocessing and analysis
The KDD dataset gives a good understanding of several intrusion behaviors; at the same time, it is widely used for testing and evaluating intrusion detection algorithms. The KDD dataset was first published in 1999. It was derived from network traffic data prepared by MIT Lincoln Laboratory and is distributed through the University of California, Irvine. It includes 4898431 instances with 41 attributes. In this work the KDD dataset was imported into SQL Server 2008 to compute various statistical measurements, e.g. the distribution of instance records, attack types and occurrence ratios.
Statistical measurements provide a deep understanding of this dataset, which helps in designing unbiased experiments. Table I illustrates the distribution of attack types within the KDD dataset. It can be concluded that there are 21 types of attacks, categorized into four groups with different numbers of instances and occurrences in the KDD dataset. DoS attacks represent 79% of the KDD dataset, normal packets represent 19%, and the other attack types account for the remaining 2%. Based on these values the KDD dataset is unbalanced, but at the same time it includes the largest number (41) of packet attributes.
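The occurrence ratios described above were computed in SQL Server; the same kind of measurement can be sketched in a few lines of Python. The sketch assumes the four groups are the commonly used DoS, Probe, R2L and U2R categories, and the label-to-group mapping shown is only a partial, illustrative subset. The toy label list is hypothetical and does not reproduce the figures of Table I.

```python
from collections import Counter

# Partial label-to-group mapping (illustrative subset, not the full list of attacks).
ATTACK_GROUP = {
    "normal": "normal",
    "smurf": "DoS",
    "neptune": "DoS",
    "back": "DoS",
    "ipsweep": "Probe",
    "portsweep": "Probe",
    "guess_passwd": "R2L",
    "buffer_overflow": "U2R",
}


def group_distribution(labels):
    """Return the percentage of records that fall into each group."""
    counts = Counter(ATTACK_GROUP.get(label, "other") for label in labels)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


# Toy sample of record labels (hypothetical).
labels = ["smurf"] * 6 + ["neptune"] * 2 + ["normal"] * 2
shares = group_distribution(labels)
```

A strongly skewed result of this kind is what makes a dataset unbalanced: a classifier can reach a high accuracy by favoring the dominant group, so per-group detection rates should be inspected as well.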
Implementation
The implementation of an intrusion detector is based on two important aspects.
Behavioral approach: This approach is based on tracking the behavior of a user, service or application to infer a probable intrusion. If any of these entities changes its behavior or its usual mode of operation, the detector deduces that there is suspicious behavior and transmits an early warning. This approach uses either a probabilistic method to estimate whether traffic is suspect, or a statistical method whose principle is to compare quantitatively the behavior of parameters related to the user, such as the bandwidth occupancy rate or the number of network accesses per day.
Scenario based approach: This approach relies on known techniques used by hackers to perform intrusions, which are already recorded as signatures. The behavior of the user in question is compared with these signatures, without recourse to the user's history, to determine whether this behavior is legitimate or not. A signature is in fact a series of rules for analyzing the packets that flow through the network (pattern matching) or for checking protocol compliance (protocol approach). Using both approaches in parallel provides a powerful solution for intrusion detection.
The implementation of the IDS required the following two main steps:
Step 1: Install an internet security antivirus suite on a central server. This antivirus has a built-in intrusion detection system, and its database of alert rules is updated automatically through the official website. All computers connected to the network operate as clients and retrieve updates from the server, including intrusion detection signatures. In this way, it provides the functionality of a reactive host-based intrusion detector (H-IDS), since it not only raises alerts but also intervenes to block possible attacks on a component.
Step 2: Install the SNORT intrusion detector as alert nodes in different zones of the network in order to collect all intrusion attempts in a log file. If an attempt is blocked automatically by the firewall, Snort does not log it; otherwise, the intrusion detector reports the attempt by placing an entry in the log file. By adding the signatures of these intrusions to an active network guardian that operates in parallel with SNORT, all later attempts with the same signature will be blocked or rejected.
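The statistical variant of the behavioral approach can be sketched as a comparison of today's value of a parameter against the user's own history. The sketch below assumes a simple standard-score test with a threshold of three standard deviations; both the test and the threshold are illustrative choices, and the access counts are hypothetical.

```python
from statistics import mean, stdev


def is_suspicious(history, today, threshold=3.0):
    """Flag today's value if it lies more than `threshold` standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history: any change at all is a deviation.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Number of network accesses per day for one user (hypothetical history).
daily_accesses = [40, 42, 38, 41, 39, 40, 43]
```

With this history, a day with 41 accesses is considered normal, while a day with 120 accesses is flagged and could trigger the early warning.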
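Pattern matching against signatures can be sketched as follows. The two signatures are hypothetical examples written as regular expressions over the packet payload; they are not taken from any real rule set, and a production IDS uses a much richer rule language.

```python
import re

# Hypothetical signatures: (name, compiled pattern applied to the raw payload).
SIGNATURES = [
    ("directory-traversal", re.compile(rb"\.\./\.\./")),
    ("sql-injection-probe", re.compile(rb"union\s+select", re.IGNORECASE)),
]


def match_signatures(payload):
    """Return the names of all signatures found in the payload (bytes)."""
    return [name for name, pattern in SIGNATURES if pattern.search(payload)]
```

For example, `match_signatures(b"GET /../../etc/passwd")` reports the directory traversal signature, whereas an ordinary request for `/index.html` matches nothing. Note that the decision uses only the current payload and no history of the user, which is the key difference from the behavioral approach.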
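The interplay between the firewall, the log file and the active guardian in Step 2 can be summarized by a small sketch. It models each attempt as an abstract record with two hypothetical fields, `signature` and `blocked_by_firewall`; it does not reproduce Snort's actual log format or configuration.

```python
def process_attempts(attempts, blocklist):
    """Return the signatures that get logged; `blocklist` is updated in place.

    Each attempt is a dict with the hypothetical keys 'signature' and
    'blocked_by_firewall'.
    """
    log = []
    for attempt in attempts:
        signature = attempt["signature"]
        if attempt["blocked_by_firewall"] or signature in blocklist:
            continue  # already stopped, so nothing is logged
        log.append(signature)     # the detector records the attempt
        blocklist.add(signature)  # the guardian rejects later attempts with this signature
    return log


attempts = [
    {"signature": "A", "blocked_by_firewall": True},
    {"signature": "B", "blocked_by_firewall": False},
    {"signature": "B", "blocked_by_firewall": False},
]
blocklist = set()
logged = process_attempts(attempts, blocklist)
```

Only the first unblocked attempt with signature B reaches the log; the second one is rejected because the signature has been added to the guardian in the meantime.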
Classification methods
The wide literature on classification models, simply called “classifiers”, offers a range of possible solutions and methods for tackling classification problems. Multilayer Perceptron, Naive Bayes, IBk, Random Forest and Random Tree stand among the best-known classifiers, each having its own advantages and downsides.
Multilayer Perceptron
A multilayer perceptron (MLP) is a feedforward artificial neural network model that tries to map a set of input data onto a set of corresponding outputs. An MLP may be described as a directed graph in which the nodes are known as neurons and are organized in three types of layers:
- input nodes
- output nodes
- hidden layers.
Each layer is fully connected to the following one, and each neuron has a nonlinear activation function. Every neuron updates its value taking into account the values of the connected neurons and the weights of those connections.
Numerous supervised learning algorithms have been proposed in the literature for changing the connection weights after each example of historical data is presented, based on the amount of error in the output compared with the expected result. Backpropagation is perhaps the most widespread one; it is a generalization of the least mean squares algorithm used in the linear perceptron. MLPs have been broadly used in the literature for classification problems of diverse origin. However, other techniques such as RF or SVM are reported to be very competitive.
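A single weight update of this kind can be sketched for a very small network. The sketch assumes one hidden layer, sigmoid activations and a squared-error loss; the layer sizes, initial weights and learning rate are arbitrary illustrative values, not the configuration used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Each weight list holds one weight per input followed by a bias term."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1]) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, output


def loss(x, target, w_hidden, w_out):
    return 0.5 * (forward(x, w_hidden, w_out)[1] - target) ** 2


def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """Return new weights after one gradient-descent step on one example."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (derivative of the loss through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signals of the hidden neurons, propagated back through the old output weights.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out[:-1], hidden)]
    new_w_out.append(w_out[-1] - lr * delta_out)
    new_w_hidden = [
        [w - lr * d * xi for w, xi in zip(ws[:-1], x)] + [ws[-1] - lr * d]
        for ws, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out


# Toy example: two inputs, two hidden neurons, one output (all values hypothetical).
x, target = [1.0, 0.0], 1.0
w_hidden = [[0.1, -0.2, 0.0], [0.3, 0.4, 0.0]]
w_out = [0.2, -0.1, 0.0]
before = loss(x, target, w_hidden, w_out)
w_hidden2, w_out2 = backprop_step(x, target, w_hidden, w_out)
after = loss(x, target, w_hidden2, w_out2)
```

After the step, the loss on this example is lower than before, which is the behavior that repeated over many examples drives the training of the network.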
Naive Bayes
The naive Bayes classifier is based on Bayes' theorem and is particularly suitable when the dimensionality of the input is high. It is highly scalable, requiring a number of parameters that is linear in the number of variables in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by the expensive iterative approximation used for many other types of classifiers.
For some types of probability models, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood. In other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods. A further benefit of naive Bayes is that it requires only a small amount of training data to estimate the parameters necessary for classification.
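The closed-form, linear-time training mentioned above amounts to counting. The following sketch assumes categorical features and adds Laplace (add-one) smoothing so that unseen values do not produce zero probabilities; the smoothing is an added assumption that departs slightly from a pure maximum-likelihood estimate. The function names and the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Estimate all parameters in a single pass by counting."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            vocab[j].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for j, value in enumerate(row):
            count = value_counts[(label, j)][value]
            # Add-one smoothing (assumption) keeps the logarithm finite.
            score += math.log((count + 1) / (n + len(vocab[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy records: (protocol, connection flag), hypothetical values.
rows = [("tcp", "SF"), ("tcp", "SF"), ("icmp", "SF"), ("icmp", "REJ"), ("icmp", "SF")]
labels = ["normal", "normal", "attack", "attack", "attack"]
model = train_naive_bayes(rows, labels)
```

The number of stored counts grows linearly with the number of features, which is why the approach scales well to high-dimensional inputs.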
Random Forest
Random Forest (RF) is a well-known ensemble learning method for supervised classification or regression. This machine learning technique operates by building an ensemble of random decision trees at training time and outputting the class that is the mode of the classes, or the mean prediction, of the individual trees. An RF is therefore a classifier consisting of a collection of tree-structured classifiers that make use of random selection at two points. In a first step, the algorithm selects several bootstrap samples from the historical data. For each bootstrap sample k, the number of distinct cases selected is roughly two thirds of the total training data. Cases are selected randomly with replacement from the historical data, and observations in the original data set that do not occur in a bootstrap sample are called out-of-bag (OOB) observations. In a second step, a classification tree is trained on each bootstrap sample, but only a small number of randomly selected variables are used for partitioning the tree. The OOB error rate is computed for each tree, using the rest of the historical data. The overall OOB error rate is then aggregated; note that RF does not require a split-sample technique to assess the accuracy of the model. The final output of the model is the mode of the predictions from the individual trees. Random Forest comes at the expense of some loss of interpretability, but it generally boosts the performance of the final model considerably, making it one of the classifiers most likely to perform best in real-world classification problems.
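The bootstrap step and the final vote can be sketched without training any trees. When n cases are drawn with replacement from n, the expected share of distinct cases in the sample is about 1 - 1/e, or roughly 0.63, which matches the figure of roughly two thirds; the remaining cases are the out-of-bag observations. The function names, the sample size and the seed are illustrative.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indexes with replacement; return the in-bag list and the out-of-bag set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = set(range(n)) - set(in_bag)
    return in_bag, out_of_bag


def majority_vote(predictions):
    """Return the mode of the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_indices(1000, rng)
unique_share = len(set(in_bag)) / 1000  # share of distinct cases in the bootstrap sample
```

Because every tree leaves out a different set of cases, each case can be scored by the trees that never saw it, which is what makes a separate test split unnecessary for estimating the error.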
Random Tree
Random Trees is a collection of individual decision trees in which each tree is generated from different samples and subsets of the training data. They are called decision trees because, for every instance that is classified, a number of decisions are made in rank order of importance. When these decisions are drawn as a graph for one instance, the result looks like a branch. When the whole dataset is classified, the branches form a tree. The method is called random trees because the dataset is actually classified a number of times, each time based on a random subselection of training instances, which results in many decision trees. To make a final decision, each tree has a vote. This technique works to mitigate overfitting. Random Trees is a supervised machine learning classifier based on constructing a multitude of decision trees, choosing random subsets of variables for every tree, and using the most frequent tree output as the overall classification. Random Trees thus corrects for the propensity of decision trees to overfit their training data. In this approach, a number of trees are grown (by analogy, a forest), and variation among the trees is introduced by projecting the training data into a randomly chosen subspace before fitting each tree. The decision at each node is optimized by a randomized procedure.
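Projecting the training data into a randomly chosen subspace can be sketched as choosing a random subset of feature indexes for each tree. The subset size of 6 out of 41 features and the number of trees are arbitrary illustrative values, and the function names are hypothetical.

```python
import random


def random_subspace(n_features, k, rng):
    """Pick k distinct feature indexes for one tree."""
    return sorted(rng.sample(range(n_features), k))


def project_row(row, subspace):
    """Keep only the features of a record that belong to the tree's subspace."""
    return [row[j] for j in subspace]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
subspaces = [random_subspace(41, 6, rng) for _ in range(5)]
```

Each tree is then fitted on its own projection of the records, so the trees disagree in useful ways and their vote is less sensitive to the peculiarities of any single feature.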
Decision Tree
In the context of machine learning, a decision tree is a tree-like graph structure in which every node represents a test on an attribute. Each branch represents an outcome of the test, and each leaf node represents the class label obtained after all the decisions made along that branch. The paths from root to leaf represent classification rules. The goal in this scheme is then to represent the data while minimizing the complexity of the model. Several algorithms for constructing such optimized trees have been proposed. For instance, C4.5 derives from the well-known divide-and-conquer approach and has been widely used in several application fields, being one of the most popular machine learning algorithms. This scheme builds decision trees from a set of training data using the concept of information entropy. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples according to the normalized information gain. It is important to note that this simple yet efficient technique is capable of handling missing values in the data sets as well as both numerical and categorical attributes.
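The normalized information gain used to choose the splitting attribute can be sketched for categorical attributes as the information gain divided by the split information (the entropy of the attribute values themselves). The sketch below is a simplified illustration of this criterion only; it does not handle numerical attributes or missing values, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on an attribute, normalized by its split information."""
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


labels = ["normal", "normal", "attack", "attack"]
protocol = ["tcp", "tcp", "icmp", "icmp"]  # separates the classes perfectly
service = ["http", "ftp", "http", "ftp"]   # tells nothing about the class
```

On this toy data the protocol attribute obtains the highest possible ratio and the service attribute obtains zero, so the protocol attribute would be chosen for the split at this node.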
Conclusion
The statistical analysis of the Corrected KDD-CUP 1999 dataset indicated that feature selection can reduce the high dimensionality of the dataset (the curse of dimensionality) and the computational time without a significant effect on the intrusion detection rate. The proposed subset of features (3, 5, 6 and 39) can be used in data mining tasks and achieved better intrusion detection than the other subsets of features suggested by (CfsSubsetEval + GreedyStepwise) and (InfoGainVal + Ranker). The subset of 10 features produced by the InfoGainVal + Ranker algorithm performed better than the other subsets; however, this is a trade-off, since more dimensions are added in order to improve the detection rate only slightly. The statistical analysis of the NSL-KDD dataset confirmed the above results. To protect a network against attacks, including intrusions, we must study its architecture, analyse its vulnerabilities and stay up to date with new threats, in order to minimize the risks that may occur. We proposed and implemented a solution for securing a network based on intrusion detection systems, and we performed several experiments to validate our solution.