AI VILLAGE - Detecting Web Attacks with Recurrent Neural Networks

Video in TIB AV-Portal: AI VILLAGE - Detecting Web Attacks with Recurrent Neural Networks

Formal Metadata

AI VILLAGE - Detecting Web Attacks with Recurrent Neural Networks
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Classic Web Application Firewalls (WAFs) mostly use a rule-based approach for attack detection. This approach is known to have its pros and cons. Despite offering decent protection from automated attacks and predictable detection results, the rule-based approach has and always will have certain disadvantages. We all know that it is useless against 0-day attacks and that even the most sophisticated rules are easily evaded by skilled professionals. That is why a more effective approach should involve some kind of heuristics. Let's give artificial intelligence a chance to find something non-obvious for human perception in raw data and try to explain its results. To this day, AI has been used more often for cat classification than for detecting application-level attacks on HTTP applications. Our team decided to test the hypothesis that deep learning is able to detect web-based attacks effectively. We started with very simple neural network architectures and tried to use them for classification. After some experiments it became clear that we needed more complex networks, so we abandoned our attempts to use classification and shifted to anomaly detection. Eventually, we ended up using a seq2seq model with attention mechanisms, which is able to detect zero-day web attacks with a minimal number of false positives.
Hello everyone, my name is Arseny Reutov, this is Fedor Sakharov, and today we will be speaking about detecting web attacks with recurrent neural networks. Before we start, a little bit about ourselves: I am an application security researcher at Positive Technologies. Positive Technologies is an application security company. Fedor is a software developer at SONM; SONM is a decentralized computing platform.

Our presentation will be divided into three parts. We will start with the problem, the challenges of web attack detection. Then we will move to the actual solution, anomaly detection for HTTP requests with attacks using deep learning. And then we will finish with our results, a demo and our plans.

So in the first part I would like to speak about the problem. We are solving the problem of web attack detection. So what are web attacks? The systems that aim at detecting web attacks are called web application firewalls (WAFs). These are systems that protect websites and web applications against attacks on the higher level of OSI, or L7 attacks. The first commercial WAFs appeared about 20 years ago, and the best-known open-source WAF is ModSecurity. Typically a WAF operates as a reverse proxy, so we have some intermediate server that processes the web traffic and then proxies it to the back end. WAFs mostly used pattern matching, and nowadays they still use it to detect web attacks.

From one perspective there are basically two types of web attacks. The first one is time-series based, which means that the attacker makes multiple requests to perform the attack. It may be web scraping, brute-forcing, for example, a login page, fingerprinting to detect the version of your web server, or scanning, for example, for vulnerabilities. The second type of attack is based on a single request, a single pair of HTTP request and HTTP response. These attacks can be detected on a per-request basis. These are, for example, SQL injection, cross-site scripting, XML
external entity injections, and all other attacks that are basically some kind of injection into some HTTP parameter. The focus of our research is the second type of attack: at present we do not analyze sequences of requests, we analyze single requests to detect patterns of web attack injections.

Now I would like to compare classic pattern matching to the machine learning approach. Pattern matching is effective at detecting known attack vectors, and it is easy to maintain: it is just a text file, you can open it in any text editor and add your additions or delete anything, or you can maintain it in version control such as Git. It can be pretty fast, and the results are always predictable. Rules and patterns can work out of the box, so you mostly do not need to tune them for a particular website; they can work for the majority of websites, but of course false positives can happen.

There are also some disadvantages of pattern matching. Patterns are subject to attacks themselves: if you write a pattern that is not well thought out, one that contains a vulnerability itself, then it can be attacked, so you can perform denial of service on a pattern that is buggy. Patterns are typically easily bypassed by hackers, so if they face a WAF that uses patterns to block, for example, SQL injections, it can be easily bypassed. And of course pattern matching is not so effective at catching zero-days, that is, unknown vectors; pattern matching cannot extrapolate to unknown vectors. To write a pattern, to write some rule, you of course need to understand what you are protecting from, so you need extensive web security knowledge. And yes, pattern matching causes lots of false positives.

So let's see what benefits and disadvantages machine learning presents. The biggest advantage, of course, is that machine learning can extrapolate: it can detect previously unseen samples. And usually it is not so
easy to bypass. It is also pretty fast if we are speaking about inference only, not training. And it does not require web security knowledge, especially if we speak about deep learning, where you do not need to extract features prior to training.

The disadvantages of machine learning are that it requires some time, of course, to train. The results are difficult to interpret: it is just a decision, zero or one. It just tells you that there is something in this particular request that is uncommon, but it cannot tell you what is uncommon, what kind of vulnerability or attack is in this particular request. And you cannot predict the behavior of a trained model. Compared to rules and patterns, models are not so easy to maintain: you cannot just open a text file and modify it, you always need to retrain.

So that was the problem statement. Now I would like to state the goals of our research. There are three of them. The first one is that we would like to create a deep learning model that does not require feature extraction prior to training. The second one is that this model should solve the anomaly detection problem in HTTP requests. And finally, this model should yield interpretable results.

What is an anomaly if we are speaking about HTTP requests? In fact, it can be anything: it can be just a request that has, for example, three HTTP headers instead of the normal 20, or it can be spam, or it can even be a 0-day attack. The model should understand the intention of this HTTP request, whether it is malicious or not, like in the classic movie sentiment problem. And of course the malicious versus benign classification greatly depends on the samples and the history of previous observations. There are three
examples that I want to share with you, just to show how important it is to be able to take into account the context and the history of previous observations. If you are a web security enthusiast, of course you will notice here that this request has a string that closely resembles an SQL injection attack, but without context we cannot say whether it is a real SQL injection or not. In fact, this request comes from the bug tracking system known as Jira, which uses a special query syntax called JQL. A pattern matching approach would detect this request as an attack, because obviously there are some keywords that are a sign of an SQL injection, but in fact this request is benign and it should not be classified as an attack. Let's see this example. In this request
you can see a POST parameter that has some HTML markup. Just looking at this example we cannot say that this request is actually an attack, but again, web security hackers would definitely try to inject there, for example, some script tags to see if there is an XSS vulnerability. Again, a pattern matching approach would definitely block this request because there are some HTML tags, but in fact this request is also benign, and this markup is allowed for this particular website. Only by knowing the previous states, the previous observations, can the model tell that this HTTP request is actually benign.

And the third example: this request, as you can see in the Host header, comes from the content management framework called Joomla, and looking at the parameters you can see that this is probably a typical user registration. But in fact there is an additional parameter, as you can see, user groups, which is equal to 7. This is in fact an exploit for Joomla which escalates privileges and allows anyone to register as an administrator on affected Joomla 3 versions before 3.6.4. And again, this is a contrary example, where a pattern matching approach would just let this request pass, because it does not contain any injections; it just has an additional parameter with the value 7, so it typically would not be blocked. But a model that is properly trained, and our model, would detect this request as an attack. So the next part is
the actual solution: how our model works and how it was built. All right, please raise your hands, those of you who know what a neural network is. All right. Now those of you who know what a convolutional neural network is. Right. And a recurrent neural network? OK. LSTM? Good. So I guess I will have to spend a little bit more time on this.

We decided to build these machine learning algorithms just as a proof of concept inside our web application firewall. First of all we tried to build a classifier. What we tried to do was collect some benign data, and obviously we did not have any malicious data for our example web application, so we had to generate some malicious examples. It would look something like this: there would be requests, some of which are labeled malicious and some of which are labeled benign. So let's try to build a classifier.

Well, what is the HTTP protocol? It is a text-based protocol, and each line of it is an independent sentence. It consists of headers and a URI, which are not that long, and some body, which may be somehow encoded. It is text and it is sequential in nature. For example, the values of the parameters depend on the names of the parameters, so it would be weird to see an IP address in the Connection header. So we decided to use recurrent neural networks for analyzing the text data. They deal pretty well with text, and they are the class of neural networks that can work with sequential data, data which varies in size and is a sequence, like normal text or music or movies.

First we tried to use simple recurrent neural networks. We built a classifier on top of them and evaluated our results, which were somewhat good. However, there are known problems. The results are not interpretable: in classification you just get a label, and you do not know why the model considers something to be malicious or benign. So your user, who is probably
not a security expert, will have no way of understanding the decisions of your model. You also need to construct the malicious data, which is tricky: you can just take your benign samples, insert some known attack sequences into them, and call that your malicious data set, which is kind of weird, because most likely real attacks on your web application will not look like the data set you generated. It also needs manual labeling. For example, you want to detect SQL injections, XSS attacks, benign data, and whatever else you want to detect, so these become your classes, but you have to label it all. And this has a certain problem: if you encounter a new type of attack, it is not clear which class it will belong to.

OK, so what we tried to do was improve our classifier, and the first thing we decided to add was an attention layer. The attention mechanism is something that solves a lot of problems.
First of all, it aids the process of learning, and it also makes the results of your decision process interpretable: you can use it to highlight the parts of the data that your model considered the most important in its decisions, and not highlight what it considered unimportant. It improves our situation, but it still does not solve the problems of classification.

But what if we try to detect anomalies instead of trying to classify our data into SQL injections, XSS attacks and so on? Well, the initial task of attack detection is more similar to that, and if you think about it, it is exactly what the human brain would do. If you try to detect some anomalous attack on your web application, you would probably first detect something weird about the request, and only then would you understand that it is an SQL attack, for example. The advantage is that if we built a reasonable anomaly detection platform, we would no longer have to manually label the data or generate our malicious samples.

Well, there is a class of recurrent neural networks that is used mostly for machine translation, or for music generation, or whatever: the so-called encoder-decoder models. They are basically two different recurrent networks with LSTM cells which are connected with each other in the following fashion. The encoder processes the inputs and outputs some state, which is a fixed-size vector. Then it is fed into the decoder, our sequence is also fed into it, and the decoder outputs some target. In machine translation, obviously, the sequences are sentences, say, in English, and the target sentence is the sentence in French, so the encoder and decoder will possibly translate your English sequence to French in a nice fashion. It works in Google Translate and a lot of other applications nowadays. But what if our target sequence were the same as the input sequence? We could build a
model where the targets are the same as the inputs; well, not really the same, but the probabilities of the inputs, which during the learning phase would be one-hot vectors. For example, if we have a request whose first letters are G-E-T, we also feed it to the decoder, and the decoder outputs the probabilities of the letters: the first output is the probability of the letter G, then of the letter E, and so on. So what we are trying to do is teach our encoder-decoder model to reconstruct the data with some probabilities. The outputs of the model are the probabilities of each letter in the sequence, and we also collect the whole loss of this reconstruction process. If the model, so to say, fails to reconstruct a request, then we consider this request anomalous. And it turns out that the probabilities of the anomalous characters, in a request which would be considered anomalous, are quite low. So here is a bit about how actually
the input data is processed. Earlier today there was this great talk about reverse engineering and recurrent neural networks, where the speaker explained a bit about embeddings. Well, we also use embeddings, but we do not use word embeddings, we use letter embeddings, and we just use a pre-created vocabulary to transform our sequences into numeric data. Also, the data inside the batches is padded to the maximum length in the batch. It improves the learning process, I do not know why. So in the end
we built a model which detects anomalies, which can be visualized in the following manner. Here we have an SQL injection, and you see that the detection is a bit noisy, but the letters which are considered anomalous are highlighted in red. Apart from some small noise in the parts which are not actually anomalous, like the first letters of the parameters and a single anomalous letter in the PHP session ID cookie, it seems that we have detected the SQL injection quite well.
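The preprocessing and the per-character highlighting described above can be sketched roughly as follows. This is a minimal illustration and not the speakers' code: the vocabulary, the `PAD` index, the `encode_batch` and `flag_characters` helpers, and the 0.05 probability cutoff are all assumptions, and the probabilities are hand-written stand-ins for what a trained decoder would output.

```python
# Illustrative sketch only; names, vocabulary and cutoff are assumptions.
import string

PAD = 0  # assumed index reserved for padding
# Assumed pre-created character vocabulary: printable ASCII, indices start at 1.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.printable)}


def encode_batch(requests):
    """Map each request to character indices, padded to the longest one in the batch."""
    max_len = max(len(r) for r in requests)
    batch = []
    for r in requests:
        # Characters outside the vocabulary fall back to PAD in this sketch.
        ids = [VOCAB.get(ch, PAD) for ch in r]
        batch.append(ids + [PAD] * (max_len - len(ids)))
    return batch


def flag_characters(request, char_probs, cutoff=0.05):
    """Return (char, is_anomalous) pairs: a character is flagged when the
    probability the model assigned to it falls below the cutoff."""
    return [(ch, p < cutoff) for ch, p in zip(request, char_probs)]


if __name__ == "__main__":
    batch = encode_batch(["GET /", "GET /?id=1'"])
    print([len(row) for row in batch])  # both rows share one length

    # Hand-written probabilities standing in for decoder output.
    probs = [0.99, 0.98, 0.97, 0.9, 0.95, 0.6, 0.8, 0.7, 0.5, 0.4, 0.01]
    flagged = [ch for ch, bad in flag_characters("GET /?id=1'", probs) if bad]
    print(flagged)
```

In a real pipeline the flagged characters would be the ones rendered in red, and the cutoff would come from the trained model's loss statistics rather than a fixed constant.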
OK, now I think it is time for our demo. Here is a process which we
actually presented to some [unclear]. Well, first we
generate some normal traffic to our test back end. Well, it is actually [unclear], nothing special. The parameters of the traffic are being set now. We use Burp Intruder to generate some traffic to get the model trained. So we have sent some requests and assume that the model is ready, and now we will try to send some attacks to the target. The application is a sample, a demo banking application that we built for testing our different machine learning models. First we try to submit an SQL injection, and it was
refused. Then we log in as a regular user. Now we try to perform some payment, we make a new transaction, and in the comment
field you can see part of an XSS attack, just a sample. This is the interface of our enterprise WAF, and you can see how these anomalous requests are logged into an actual log. [Music] Now there is an interesting part: we are trying to do the, I don't know what it is called, the SQL split injection, yes, a fragmented SQL injection. As you see, it is split into two parameters. Traditional pattern matching would typically just analyze each parameter one by one, so it would miss this attack because it does not see the whole payload, but the machine learning approach, on the contrary, would detect it, as it analyzes the whole HTTP request instead of particular parameters inside it.
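The scope problem with fragmented payloads can be shown with a toy example. The sketch below is not a real WAF rule and not the model from the talk: the `looks_like_sqli` rule, the two check functions and the query string are hypothetical, chosen only to show how a check that inspects each parameter in isolation can miss a payload that is visible when the request is viewed as a whole.

```python
# Toy illustration; the rule and the payload are assumptions, not a real WAF rule.
import re
from urllib.parse import parse_qsl

QUOTE = re.compile(r"['\"]")
KEYWORD = re.compile(r"\b(or|and|union|select)\b", re.IGNORECASE)


def looks_like_sqli(text):
    """Toy rule: a quote character together with an SQL keyword."""
    return bool(QUOTE.search(text)) and bool(KEYWORD.search(text))


def per_parameter_check(query):
    """Inspect every parameter value in isolation."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return any(looks_like_sqli(value) for _, value in pairs)


def whole_request_check(query):
    """Inspect all parameter values together."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return looks_like_sqli(" ".join(value for _, value in pairs))


if __name__ == "__main__":
    # Hypothetical split payload: the quote sits in one parameter,
    # the keyword in the other, joined by an SQL comment.
    query = "name=x%27%2F*&comment=*%2F+or+1%3D1--+"
    print(per_parameter_check(query))  # each isolated value passes the toy rule
    print(whole_request_check(query))  # the combined values trip it
```

The neural model does not apply a rule like this at all; the point of the sketch is only that the unit of analysis matters, which is why a model that reads the entire request has a chance to notice a payload spread over several parameters.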
And again you can see the detection being logged to the web interface of our WAF. And it is quite interesting, because we addressed this problem with this kind of SQL injection separately, and we even built a separate tool which tried to detect sophisticated SQL injections by parsing SQL grammar and so on. It turned out that the generalized neural network can also detect these kinds of sophisticated attacks just as well as the tool that we built specifically for detecting these types of injections. All right, so it
looks like we have created a deep learning model that does not require prior feature extraction. It can just look at the live data which is flowing to the web applications of, for example, your clients, or to your own web applications, and use it to learn what the normal requests to your apps look like. It detects anomalies in future traffic after the learning phase, and, which I think is the most important result of our work, it yields interpretable results, so your clients or you yourself will know why the model considered something anomalous or not.

We have open-sourced this work as a Jupyter notebook; you can see it at the GitHub link provided below. Please do run it. I think there is some data from the banking application. Please run it, please try to train the model, please try to verify our results; that will be very helpful, and I hope some of you will come up with new ideas on how to improve it. There will be some weird things in the code which you will probably notice. Feel free to ping us whenever you want about them, for example, about how the thresholds for anomalies are chosen. It is quite obvious in the code, but it looks kind of like magic now.

As for future work, there is a lot to be improved. First of all, we need to optimize the learning time, because now it takes about five hours for the data set that we have open-sourced on high-end GPUs. It would also be a great idea to build a classifier on top of the anomaly detection: if we have detected some anomalous sequence, it is a good point to try to classify it, whether it is an SQL injection or XSS, because now we have stripped all the normal data from our request. And we need to improve the threshold calculation. Right now the threshold is calculated in a somewhat sophisticated way: we have a hypothesis that we have a normal distribution, and for our learning data set we calculate the parameters of this distribution, which we use in the
threshold calculation, and this is used for anomaly detection. Well, things like that have to be proven in a more scientific manner, because to someone it may look more like an exponential distribution, so that has to be improved a lot. All right, thank you.
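The threshold idea mentioned at the end of the talk can be sketched as below. The talk only says that the parameters of an assumed normal distribution are estimated on the learning data set and then used for the threshold; the exact formula is not given, so the rule "mean plus k standard deviations", the value k = 3, the function names and the loss values are all assumptions made for illustration.

```python
# Sketch under stated assumptions: normality of losses and k = 3 are illustrative.
from statistics import mean, stdev


def fit_threshold(train_losses, k=3.0):
    """Estimate the mean and standard deviation of reconstruction losses on
    benign training data and place the threshold k deviations above the mean."""
    mu = mean(train_losses)
    sigma = stdev(train_losses)
    return mu + k * sigma


def is_anomalous(loss, threshold):
    """A request is treated as anomalous when its loss exceeds the threshold."""
    return loss > threshold


if __name__ == "__main__":
    benign_losses = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]  # made-up values
    threshold = fit_threshold(benign_losses)
    print(is_anomalous(1.05, threshold), is_anomalous(5.0, threshold))
```

If the losses turn out to follow a skewed distribution, as the speakers suspect they might, a quantile of the empirical training losses would be a natural alternative to the mean and standard deviation used here.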