Application Of Deep Learning Techniques In Detecting Malicious Codes

In recent years, cyber-criminals have mounted many attacks by injecting scripts into web pages, turning otherwise benign content malicious. Web content is rendered as HTML, and malicious payloads are typically JavaScript code embedded in the page. Any program that detects such code must be efficient and robust, and the underlying machine learning models must be fast, because detection and blocking should not slow down the user experience. Finding these code snippets is like finding needles in a haystack, which makes the problem hard to get right.

Deep learning techniques can help build a model that tokenizes web content quickly and identifies the content in question. The document is examined at multiple hierarchical scales: it is divided into many parts, a regular-expression tokenizer is run over each part, and the resulting features are aggregated later. Small snippets of malicious content are generally embedded in a relatively large web page, so identifying them requires a model that can discount commonly used tokens and home in on what is likely to be malicious.
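The multi-scale division described above can be sketched as follows. The bin count and scale levels here are illustrative assumptions, not the original configuration; the token pattern anticipates the regular expression given later in the text:

```python
import re
import zlib

# Tokenizer pattern described later in the text: runs of non-ASCII
# characters, or runs of word characters.
TOKEN_RE = re.compile(r"([^\x00-\x7F]+|\w+)")

def hierarchical_features(html, levels=(1, 2, 4), bins=16):
    """Hash token counts into fixed-size vectors at several spatial scales.

    Level 1 covers the whole document, level 2 its halves, level 4 its
    quarters; the per-region vectors are aggregated by a model later.
    """
    tokens = TOKEN_RE.findall(html)
    vectors = []
    for parts in levels:
        size = max(1, -(-len(tokens) // parts))  # ceiling division
        for i in range(parts):
            chunk = tokens[i * size:(i + 1) * size]
            vec = [0] * bins
            for tok in chunk:
                # Stable hash so feature positions are reproducible.
                vec[zlib.crc32(tok.encode("utf-8")) % bins] += 1
            vectors.append(vec)
    return vectors
```

With levels `(1, 2, 4)` the function returns seven vectors per document, one for the whole page, two for its halves, and four for its quarters, so a small malicious region is not diluted by the rest of the page.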

This process also involves parsing HTML documents that mix JavaScript, HTML, and CSS, which complicates the implementation. Instead of merely using a flat bag-of-tokens representation computed over the entire text, we use a representation that captures sections at numerous spatial scales, corresponding to different levels of localization and aggregation, allowing our model to find needles in the haystack of page content. The approach has a feature extractor that parses tokens out of HTML documents and a neural network model that makes classification decisions by examining equally weighted features arranged hierarchically for aggregation at a later point. We tokenize the target HTML content with the regular expression ([^\x00-\x7F]+|\w+), which splits the content along non-alphanumeric word boundaries. An inspector component applies shared weights to convert each region of the document into a vector, and a master network is responsible for using those vectors to make the final classification.
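A minimal sketch of this two-stage inspector/master arrangement, in plain Python with made-up dimensions and untrained random weights (a real implementation would learn these from labeled pages):

```python
import math
import random

random.seed(0)
BINS, HIDDEN = 16, 8  # hypothetical sizes for illustration

# "Inspector" weights, shared across every region of the page, so each
# region is summarized the same way regardless of its position.
W_inspect = [[random.gauss(0, 0.1) for _ in range((HIDDEN))] for _ in range(BINS)]

# "Master" weights, mapping the aggregated summary to one score.
w_master = [random.gauss(0, 0.1) for _ in range(HIDDEN)]

def classify(chunk_vectors):
    # Inspector: apply the same weights to each chunk vector (ReLU).
    summaries = []
    for chunk in chunk_vectors:
        h = [max(0.0, sum(c * W_inspect[i][j] for i, c in enumerate(chunk)))
             for j in range(HIDDEN)]
        summaries.append(h)
    # Aggregate the per-chunk summaries into one document vector.
    doc = [sum(s[j] for s in summaries) / len(summaries) for j in range(HIDDEN)]
    # Master: sigmoid over a linear score -> probability of "malicious".
    z = sum(d * w for d, w in zip(doc, w_master))
    return 1.0 / (1.0 + math.exp(-z))
```

Because the inspector's weights are shared across regions, a suspicious token pattern produces the same summary wherever it appears in the page, which is the translational-invariance idea discussed below.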

The value of exploiting structurally important features of a domain, such as locality and translational invariance in images, has been a major theme in the recent success of applied deep learning. These properties channel the model's learning capacity in the right direction because the structure is already known. While such deep learning techniques are more specialized, the pace of events in the security attack world will create a need for them in the near future; using domain knowledge provides a valuable inductive bias of hierarchical spatial scaling, which we believe makes our model more effective at handling recognition problems within HTML pages of possibly widely varying size. Such a model achieves a 97.5% detection rate with a very small chance of a false positive, and can even recognize malicious content not formerly caught by the open community, with a purely static token approach that avoids the need for complex parsing or emulation systems.

New browser extensions and add-ons are emerging that embed such a model and can filter out malicious web content, or warn the user about it, using these deep learning models. They can further rely on a server-side component that collects inputs and retrains the models for better accuracy and speed.

18 March 2020