The Network Behavior of Targeted Attacks
Formal Metadata
Title |
Subtitle |
Alternative title |
Series title |
Part | 24
Number of parts | 29
Author |
License | CC Attribution 3.0 Germany: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/18851 (DOI)
Publisher |
Year of publication |
Language |
Content Metadata
Subject area |
Genre |
Abstract |
Hacktivity 2015, 24 / 29
Transcript: English (automatically generated)
00:01
Hi everyone, welcome. So, my name is Sebastian Garcia, and I'm going to talk about the network behavior of targeted attacks. In particular, we are going to model this traffic to identify it on the network, and this is part of a project that is called
00:23
the Stratosphere IPS project, which is the project I'm working with. Wait, wait, I found the laser, this one here. So we are going to talk a little bit about how we are researching detecting this malware in the network. We are using some machine learning tools and trying to see what's working and what's not working.
00:46
So, here in the audience, who is working on detecting malware or any type of botnet in the network? Who is working with that? They are upstairs? There is no one? No, no malware detection, no botnet. Okay, and
01:01
specifically, some APT attacks: is someone focusing on APT? I say, hey, I just don't know. There, we have someone there. Okay, so actually we are working with a lot of malware and botnets, but we like to focus on APT for two reasons. I will speak now about the first reason and later about the second one.
01:22
The first reason, for me, is that APT... I'm not going to talk about whether they are advanced (no, no, they are not advanced) or whether they are persistent. A lot of people know a lot about that. What I like is that the goal of the APT is very specific, right? So when they're attacking someone, they know who they want to attack and they know what they want
01:45
and how to get it. So this is not the usual malware that is, I don't know, doing click fraud or notherware or spam or whatever, right? It's not about money, or it's usually not a lot of money, but they are trying to get very specific information, and this is making the attacks very, very difficult to detect.
02:05
Right. In fact, they are not such advanced tools, right? What they are doing is like normal attacks: some phishing emails, some malware, a very, very simple RAT (remote access tool), and that's it. It's working, right? If you know the Citizen Lab people from Canada, they research this a lot, and they found that
02:26
most of the time they're using very, very normal malware, and only once did they witness a zero-day attack in an APT case. So usually we are dealing with very simple stuff. The problem is that it's very difficult to analyze. So if you want to analyze all these... people here, huh? This is so close.
02:48
So if you want to analyze APT, you can get the malware, you can analyze it, you can open the binary. That's not what I'm doing, right? I'm not a binary analysis guy. I like the network traffic. So, how do you like the network traffic?
03:01
Okay, I want network traffic. How do you get the network traffic? Okay, I can, I don't know, execute the malware, right? So I go there, I execute the malware. But what is the problem, or what is the difficulty, of executing this APT malware in my network? Why is it not the same? What do you think? Well, you say, yeah, I have the real malware, I execute it, and it's even connecting, right?
03:26
I will say that the malware is up, it's running, the command and control is running, it's there, everything is perfect. Why is the analysis not the same? No? Well...
03:40
Well, no. Human factor, in fact, it is something like that: I'm not the target. This is a targeted attack. I am not the target, so they are not going to attack me as they are attacking some other guy, right? So being the target is very important to have this traffic, because I don't care about the packets in there. I don't care if the packet is TCP or UDP. What I care about is the intention of the attack.
04:04
I want the behavior here. I want the malware author saying: okay, now get the document files. No, no, no, now forget the document files, get a screenshot. Oh, hey, it's doing something, get the keylogs. That's what I want. I want the intention here, I want the behavior, and that's why it's so difficult to get this information. So
04:23
when we try to execute it, the first thing we found is that the lifetime of the campaign is very short, right? So if you are executing the malware like 20 days after it was captured in a real environment, that's it. You are not going to find the infrastructure there.
04:41
The command and controls are not working. Nobody is listening. So the execution is not so good for us, right? So we can modify RAT malware. That's what we did. We got some normal malware, we modified it, we executed it ourselves, and we attacked ourselves, right? That is completely horrible,
05:01
specifically because there is no behavior. Yeah, I can attack myself, or you can attack me, but we are not the real players here. So we did this to get the best traffic we can, but if you're analyzing this type of malware, this is an issue. Okay. So this is the first reason why we are going to work with targeted attacks: we like them because they are very specific,
05:22
very difficult to detect, and they are quite simple. But if you want to detect this in the network, imagine that you are going to detect this in the network, okay? So... hey, there's Michelle. Yeah, you have a talk right now? Okay, sorry, I thought he was giving a talk right now, okay.
05:42
You can go to your talk if it's time for it. So if you're trying to detect this in the network, right, you have solutions there. You have a lot of software. What are you doing, really, in the network? I'm not talking about antivirus stuff, right? I'm talking about your network. So you are putting some firewall in there, some IDS, IPS, filtering. You start playing with
06:04
indicators of compromise, right? You are registered with a lot of feeds, so you get all this information, a lot of domains, URLs, IP addresses. Your feed is coming in all the time, and you are blocking, blocking, blocking, filtering a lot. And you also have a lot of fingerprints, so you have Snort, you have Bro. Okay, actually Bro is not with fingerprints,
06:25
it's with a beautiful language. But you are using fingerprints, you're seeing payloads, right? You are capturing these and you're stopping them. And if this is not working, what do we have? Well, the latest of
06:40
the tools we have is behavior, right? Anomaly detection. So a lot of people are working on this. Anomaly detection is nice. It's like a buzzword. If you say anomaly detection, it's awesome, right? Nobody knows what's going on, but hey, yeah, we have some behavior in here. So what's the issue with anomaly detection? Is it working? Okay, who here is using some anomaly detection software product in the network?
07:05
No one here. No one there. Oh, there we have one. It's true, they exist. Now, is this working? The problem with anomaly detection is that for an anomaly you need to know what is normal. So you need the normal first, and then you spot the anomaly. And how do you know
07:24
what's normal? Because we are human beings, we are changing all the time: our traffic, our patterns, our ideas. So that's an issue. If you go to the network, what is normal is changing all the time, so you should adapt again, and then you detect some anomalies. And then, when you have the anomaly,
07:40
it turns out that an anomaly is not an attack. And this is something that people working with anomaly detection usually tend to forget: an anomaly is an anomaly, it's not an attack. So who is going to say if this anomaly is an attack? Okay, so you need people there watching it, reviewing, and saying: okay, yeah, this is an anomaly.
08:00
No. Yes, this is an attack. This is not an attack. So it gets very, very complicated, and in the end you need people working on that. So there are some issues here, right? The first issue we have is that the lifetime of the indicators of compromise is unknown. So you're blocking some domain, you're blocking some IP. How long are you going to block it?
08:22
One day, one hour, one month? Okay, how long is that IP in the list of blocked IPs? Nobody knows. Well, some people are analyzing this, but usually this is not information you get. If you see the analysis, some information is there for three months, and three months is a lot: that domain is
08:42
down and not working in less than three days, right? So why block it for three months? So nobody knows how to do this correctly. And, of course, who is verifying this? Who is verifying that the domains you got for blocking are really, really malicious? Well, some people, I hope. I don't know.
09:01
But if you go to virustotal.com and you search for www.google.com and you say, hey, VirusTotal, give me some indicators of that, you will find like 5,000 people saying this is malicious and you will find like 17,000 people saying this is normal. So actually this is confusing, right? If you have an automatic tool working with this data,
09:25
you will have a lot of domains that are false positives, and you are blocking them. So the errors and the verification are very important, and nobody's looking at this right now. Oh, sorry. So you also have a huge, huge amount of information. One malware can generate dozens of domains,
09:42
I don't know, dozens of IPs, plus payloads, plus fingerprints. So you are blocking a lot, and a lot more every day. And actually you don't know what you are blocking, and you don't know what you are not blocking. That's part of the game. And also, this information is static, so it's not changing, it's not evolving, it's not adapting. That's an issue. And
10:05
finally, for the attackers it is very, very easy to adapt to these measures, right? The cost of adapting is not so much. Changing IP? I have a lot. A domain? I can register thousands, right? So I don't care. Actually, the issue with attackers is this: they don't care.
10:24
I remember once reading on Reddit an AMA of a real botnet malware author, and he said: yeah, I have, I don't know, something like 100,000 bots, and I can use them, and I'm sending spam. And some user was asking the malware author: hey, how are you
10:46
sending the malware and checking that the malware... sorry, not the malware, the spam. Sending the spam and checking that the spam is being read? And which is your best way of sending the spam in such a way? And the guy says: I don't care.
11:00
Hey, but if you send the incorrect image, the people will know and they won't be... oh, sorry, won't be able to open your email. And he says: I don't care. You pay me, I send your 1 million spam. You don't pay, I don't send. You pay me. I don't care if the email is opened or not opened or whatever.
11:20
They're making a lot of money and they have a lot of resources. So this is an issue, and most of them don't care. They just get another domain, they adapt, another IP, that's it. You're blocking it? Regenerate the malware. It's difficult, yeah, it's costly, maybe, but it's not impossible, right? So we have this issue here, and with anomaly detection, like I told you,
11:40
most of the time it is very, very difficult to know if it's working, right? So what are we going to do here? What we are working on in the university is a behavioral method, but instead of focusing on anomaly detection, we are focusing on the behavior of the malware traffic.
12:00
So we go to the network and say: okay, this is malware. I know it's malware because I'm analyzing it, and I want to learn what the behavioral pattern of the malware in the network is. And that's what we are going to do now. So the Stratosphere IPS project is the core of the project in the university. I didn't say, but it's in a university in the Czech Republic. So
12:24
you can find it online, everything is published, and these are the four pillars or main ideas of the project. The first one is free software. Why do we want free software here? It's not because we love free software. We love free software, but it's not because of that. It's because we know the community is able, and we want the community to verify what we are doing.
12:45
We need the people checking it, downloading it, testing it, and we need everyone saying: hey, this is not working, this is bullshit. No, this is working. You have errors in here. We can make it better. We can collaborate, we can send you stuff. Or stop doing that, or something like that.
13:00
So free software is one of our main pillars. The second one is NGOs and civil society organizations. At some point, the Citizen Lab people said in their survey that the NGOs, the non-governmental organizations, are in a critical situation because they don't have the resources to buy very complex tools for protection,
13:24
right? They cannot buy from very large companies. But anyway, they are being attacked by very, very powerful governments. So, for example, they worked with the Dalai Lama in Tibet. It was attacked by China. So China is a very, very powerful country, and they attacked the Dalai Lama with complete success, right?
13:45
The attack was completely successful, and they didn't have the resources. They didn't have the money or the people, and they could not defend themselves. So we are focusing on this type of organization, which are very, very amazing targets for the attackers, but they don't know how to do this. They don't know how to defend.
14:03
So this is the second pillar of the Stratosphere IPS project. The third one is the machine learning and the behavioral models. We want to have our research working in the network. We want, listen to this, our research to be useful, huh? So we actually want to go to the network and plug it in, and we want it to work, and this is
14:24
something research people usually don't like so much, right? You are doing something, it's awesome, you publish papers, a lot of them, and when you are trying it in a real environment, yeah, maybe it's not working. And the last pillar is the verification. We want this to be very, very verified.
14:42
We want to try as much as we can to see what's going on, how we are doing: is it having errors, is it not having errors, which errors, why do we have these errors. So these are the four pillars of Stratosphere. Now, how are we doing this? How are we working with machine learning on the traffic?
15:01
So we start with this idea of less is more. When we start working in machine learning, you can be tempted to work with a lot of features, and we are going to say: no, no, no, use less information. This is the first pillar that we are going to talk about. The second one is the disassociation: we are going to disassociate two models,
15:22
and I'm going to show you now. And the third one is the verification. Okay, so this means we are analyzing the behavior of the connections, not the behavior of the host and not the behavior of the network. This means that if you go to the network, I don't care about the behavior of the three thousand hosts.
15:43
I care about one simple connection, and that's why we are able to create this behavioral model. Because if you try to create the behavioral model of a computer itself, it's very complex. The user is very complex. So we are not doing that. The second point is the disassociation, and that means that the
16:03
representation of the behavior in the network, how we look at the behavior, is separated from how we detect the behavior. Okay, usually this is all together, but we are separating it. And finally, verify the models with real data. We need real data here.
16:24
So, less is more. This means that when you connect, sorry, when you connect to any other computer on the internet, your behavior is the same. So you connect to Gmail and you are checking emails, you are chatting: the way you chat, the way you check Facebook, the way you use a website,
16:45
the way you use your bank account, is usually the same all the time. And this is going to identify your behavioral patterns, right? The second is that we group the flows, all the flows in the network going to a specific
17:03
service, all together. So imagine that you're connected to the Gmail web server. We get all the packets and flows that you are sending to that port 80 of Gmail, and we say this is your connection, and we are going to analyze that. And finally, since the connection is composed of several flows, we can see the behavioral patterns in here.
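The grouping step described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Flow` record, its field names, and the inclusion of the protocol in the key are hypothetical (the talk only says one computer connecting to another computer on a specific port), and the addresses are documentation-range examples.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical flow record; field names are illustrative, not the project's.
    src_ip: str
    dst_ip: str
    dst_port: int
    proto: str
    start: float     # seconds since the start of the capture
    duration: float  # seconds
    size: int        # bytes


def group_into_connections(flows):
    """Group flows by (source, destination, destination port, protocol)."""
    connections = defaultdict(list)
    for flow in flows:
        key = (flow.src_ip, flow.dst_ip, flow.dst_port, flow.proto)
        connections[key].append(flow)
    # Keep each connection in time order so it can be read as a sequence.
    for group in connections.values():
        group.sort(key=lambda f: f.start)
    return dict(connections)


flows = [
    Flow("192.0.2.10", "198.51.100.7", 80, "tcp", 12.0, 0.4, 1500),
    Flow("192.0.2.10", "198.51.100.7", 80, "tcp", 2.0, 0.3, 900),
    Flow("192.0.2.10", "203.0.113.5", 53, "udp", 1.0, 0.1, 120),
]
connections = group_into_connections(flows)
for key, group in connections.items():
    print(key, len(group))
```

Each value in `connections` is then one "connection" in the sense of the talk: an ordered list of flows from one host to one service.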
17:26
So, in the case of malware, and in the case of you when you are using, for example, any web page, you are going from one state to the other: like chatting, not chatting; like downloading stuff, not downloading stuff;
17:42
like putting information in a web page, not putting it; downloading a picture, looking at a picture, clicking on a picture. You are jumping from state to state, and that's what we want to model, right? So each flow is going to get its own state in our model. I want to give you one state for each flow you have, and
18:04
our model for the states is based on four features, and these are very simple, right? We are looking at the size of the flow, the duration of the flow, the periodicity of the flow, and the time between flows. I'm not going to get into the periodicity because it's quite an issue to have that information.
18:24
But you can see that this is very simple, right? It's like, why are you using this? You can have very, very amazing features in here. And the reason is that we tried those amazing features and they're not working. They're too complex, right? And when the model is too complex and
18:41
then you go and check it, if the model is not working, you don't know why. Or worse, when the model is working and you're detecting, you don't know why. So at some point it is very, very difficult to work with that, and that's why we have these four features in here. Okay? So what are we doing with these features? We are creating this table, this horrible table. The table
19:03
is saying: okay, you got one flow, and the flow has a small size, and the duration of the flow is maybe medium, and the periodicity of the flow is weak periodicity, so I want to give you a capital B. Or if the periodicity is weak
19:26
and your size is medium and your duration is long, I will give you a capital F. Or you have a weak non-periodicity or a strong non-periodicity. So we assign letters and numbers to each flow in the network based on these features. Okay, and finally we are using some symbols in here,
19:44
like the dot, comma, plus, star, and zero, and they indicate the amount of time between the flows, because having a periodicity of five minutes is not the same as having a periodicity of three days, right? Periodicity is periodicity, but it's completely different behavior.
20:03
So we are trying to get this information here, right? And if the flow has a timeout of one hour, we are putting a zero. Okay, a special symbol there. So let me show you how we can look at this.
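As a rough sketch of the letter table, the lookup below maps a flow's size, duration, and periodicity classes to one letter. Only one cell is taken directly from the talk (medium size, long duration, weak periodicity gives a capital F); the category names "large" and "short", the row layout, and which alphabet rows the other periodicity classes use are assumptions made for illustration, and no thresholds are given in the talk.

```python
# Hypothetical category names; the talk mentions small/medium sizes and
# medium/long durations but gives no thresholds.
SIZES = ("small", "medium", "large")
DURATIONS = ("short", "medium", "long")

# One nine-letter row per periodicity class. The weak-periodic row (A-I) agrees
# with the talk's "capital F" example; the other three rows are assumptions.
ROWS = {
    "strong_periodic": "abcdefghi",
    "weak_periodic": "ABCDEFGHI",
    "weak_non_periodic": "rstuvwxyz",
    "strong_non_periodic": "RSTUVWXYZ",
}


def flow_letter(size: str, duration: str, periodicity: str) -> str:
    """Return the state letter for one flow from its three class labels."""
    column = SIZES.index(size) * 3 + DURATIONS.index(duration)
    return ROWS[periodicity][column]


print(flow_letter("medium", "long", "weak_periodic"))  # F
```

Keeping the table as plain data makes it easy to check by eye, which fits the "less is more" argument: when a detection is wrong, you can see which cell produced the letter.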
20:22
Can we use it? Thank you very much. Okay, so, for example... I'm not sure if you can see this. I will see. Can you see that or not? It's very... Here, upstairs, can you see that? No? Completely no?
20:44
Maybe, can we turn down the lights a little bit in here? Can we try that? No? Okay, I will walk through it. Don't worry, it's horrible anyway. So each line here is a connection. It's one computer connecting to another computer,
21:06
connecting to some specific port. Each letter here identifies one flow, right? So here you can see that, for example, there is a connection to the DNS service, these red letters that I'm sure you cannot read from there.
21:22
So this is R dot R dot dot R plus, and if you see the letters, there is no periodicity here, because the periodicity was between the letter a and the letter i, right? So if you go to R, you are not periodic anymore.
21:40
So if you look at this, you can say, actually, this is not periodic, and this is port 80, port 80, and this is a normal connection. This is a normal computer doing everyday tasks. If you look at this specific connection, you will see that there is a very strange port,
22:00
9131, and you can see some periodicity here, h, meaning periodicity, and this is a Tor connection. So with the web service of Tor, when you are updating the Tor service, you get some periodicity in there, right? So here you can see a lot of letters, as you can imagine.
22:20
Most of the connections are just one or two flows, right? Because it's a normal web page: you go to a web page, download something, download an image, that's it. You are not accessing every web page for hours. So let me show you another one.
22:41
So I will show you, for example, this one. This is a malware that is called Flu, and we will use it later. You can see here that it's also making a lot of UDP and TCP connections. This is not periodic, not periodic at all.
23:00
And then here we have some periodicity: I comma I comma I comma H, I comma H, I comma H. You can see a pattern here, right? So this pattern is very, very characteristic of this command and control, right? And here there is another command and control. This is a very periodic connection, and it keeps going and going and going. So this is one malware that is called Flu.
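Reading these strings by eye can be mimicked with a small helper that reports what share of a connection's letters fall in the periodic range. This is a reading aid under assumptions, not the project's detector: it treats a through i in either case as periodic, as the talk says periodicity sits between a and i, and the letter case of the two example strings is guessed from how they were read aloud.

```python
# Letters treated as periodic: a-i in both cases (strong and weak periodicity).
# Using both cases is an assumption based on the spoken description.
PERIODIC_LETTERS = set("abcdefghiABCDEFGHI")


def periodic_fraction(states: str) -> float:
    """Fraction of flow letters in a state string that are periodic."""
    letters = [ch for ch in states if ch.isalpha()]  # skip . , + * and 0
    if not letters:
        return 0.0
    return sum(ch in PERIODIC_LETTERS for ch in letters) / len(letters)


print(periodic_fraction("R.R..R+"))  # DNS-like string from the demo
print(periodic_fraction("I,I,I,H"))  # command-and-control-like string
```

The first string contains no periodic letters and the second contains only periodic ones, which is the visual difference pointed out on screen.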
23:26
This is a real execution. So, another one I want to show you, for example... okay, the new ref botnet, right? A real execution of the new ref botnet. So we executed this in our lab, and
23:43
you can see here a lot of connections to port 80, and look at this. Well, this is a command and control, but this is not periodic, right? And this one, yes, this is periodic, and this is not, and this is not periodic at all. Right, so new ref has different command and controls, and each command and control has a different
24:02
behavioral pattern. Actually, we can know that this is one type of command and control and this is another type of command and control, and you can see here, yeah, it's completely sending, and now this one is periodic, and you can also see the pattern, right? You can also see some timeouts here. So this is how the letters look.
24:22
I want to show you, for example, this one here.
24:41
Okay, so this one is more difficult to see, but you know what it is? It's the Hacktivity traffic. So it's the traffic from your computers. So this is how the computers here are behaving, right? And you can see some people going to some web pages, UDP traffic, TCP traffic, and most of the people are just connecting to a web page and that's it.
25:03
Right, you can see that there is no periodicity, no command and control channel, nothing that behaves like something that looks malicious, right? So this is a very easy way to look at a lot of traffic. This here is for verification, right? But the tool is looking at it automatically, and you can see, like, okay,
25:21
there is no model here that looks like a command and control channel or some attack or something like that. Of course, we are not trying to detect a specific attack, like going to a web page, an exploit, and that's it, right? For that, you have antivirus, you have a lot of tools. We want to see what's going on in the network. We want to see the behavior here.
25:41
Right, that's why if you only have a very short attack, we are not going to get it, and actually we cannot; the tool is not for that. We want to see when you are being attacked, like in an APT, and your documents are being exfiltrated, for example. This we can capture. So I wanted to show you also... okay, I will show you a last one, the Zeus botnet.
26:07
Okay. So this botnet is a very, very large capture, like 25 days, and you can see a lot of traffic here.
26:25
Look at this, right? A lot of traffic, but you see strange stuff. I will stop it just so you can see. So you will see strange stuff like zero, zero, zeros, and some periodicity, but then this is not periodic, but then it's periodic, but then it's not again. Do you know what this type of traffic is?
26:43
This is the Zeus botnet connecting to Google. So these are all Google IP addresses, and Zeus is using Google for a lot of stuff. But you can see, if you remember the normal traffic, that even when you access Google, your traffic does not look like this. So we can differentiate between a normal Google connection and some malware abusing Google.
27:06
Right, and here, these are the command and controls, right? You can see a very periodic string in here, the behavior, even this one. Look at this: it's periodic, but we have nine zeros. That means nine hours between two flows.
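The time symbols can be sketched as a function from the gap between two flows to a separator string. The rule of one zero per hour follows the talk (a one-hour timeout gives a zero, nine zeros mean nine hours); the thresholds for dot, comma, plus, and star are hypothetical placeholders, since the talk does not state them.

```python
def gap_symbol(gap_seconds: float) -> str:
    """Return the separator written for the time gap between two flows."""
    hours = int(gap_seconds // 3600)
    if hours >= 1:
        # One zero per full hour without flows, as described in the talk.
        return "0" * hours
    # The sub-hour thresholds below are illustrative assumptions only.
    if gap_seconds <= 5:
        return "."
    if gap_seconds <= 60:
        return ","
    if gap_seconds <= 300:
        return "+"
    return "*"


print(gap_symbol(9 * 3600))  # 000000000
```

With this rule, a connection that sends a flow, waits nine hours, and sends another one shows up as two letters separated by nine zeros, which is the shape pointed out in the Zeus capture.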
27:22
So Zeus is sending a flow, waiting nine hours, and sending another one, and we can capture it like this. Right, you can see the pattern here, and we can create a model from that. So, going back to the presentation. Once we have these letters,
27:42
what's going on with the behaviors in here? This malware is generating the same behavioral patterns over and over again. When it's connecting with a command and control, it's the same behavior. Actually, we can see when the command and control is down, because the malware keeps connecting but the behavior is different, right?
28:01
So we can also distinguish these situations. Changing the behavior is very costly for the attacker, because if you want to connect to all your bots at the same time and you want to give orders at the same time, you need some type of synchronization in there, and if you lose the synchronization, it's more difficult for you. You cannot use all of them at the same time.
28:24
It's maybe more difficult to make a DoS attack, right? So at some point you can change the periodicity, it's okay, we can still capture the change. But you want to connect: if you don't connect to the command and control, you cannot control your bots. So you need a command and control, any command and control, right? And that's what is costly for the attacker.
28:45
So this behavior does not expire easily. Of course, the infections can go unnoticed for hours. So how much time are you willing to wait for a solution in your network? Usually we say: I want real-time detection, real time, I want to see the red light there, right, very, very quickly.
29:03
But actually the computers can be infected for hours or days and nobody knows and nobody cares, right? And you can tell the administrator, yeah, go there and clean that computer, and it's going to take hours. So there is enough time here to capture the behavior. This does not need to work in one minute, right? And
29:22
finally, we collect both normal and malware behavior. We want them both. We need to know what's normal, we need to know what normal looks like, and then we can implement this. So how can we implement the detection for this type of traffic? Okay, so this was the first part.
29:44
You remember that this association of letters is how you look at the models. We are not doing detection here so far. No fancy machine learning. So now that we want to implement this for detection, how are we going to do that? The Stratosphere project is now implementing two models, and two more are in the works.
30:02
I will talk about the first one, which is the interpretation of the transition from one letter to the next letter as a Markov chain. Okay, so how are we doing this? This is very, very easy stuff. Actually, if you have the letters in here, a comma a comma z plus d plus d plus,
30:25
we are looking at the transition from each letter to the next one, and this transition we model as a Markov chain. That means that we have a matrix in here that is saying: okay, so the probability
30:40
to go from the letter a to the letter comma is 1, or 100%, and the probability from comma to a is 0.5, and like that for everything. So we learn these transition probabilities and we create a matrix, and the matrix can look like this. This is the same, right, just a diagram. But it's like: okay, going from a to comma the probability is 1, from comma to z
31:05
it's 0.5, from z to plus, and so on. So we can model how the transitions were in the original malware, okay? And when we have these transitions, we create these Markov models of the known behavior.
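A minimal sketch of the transition matrix just described, assuming the letter string is exactly the one spelled out in the talk (`a,a,z+d+d+`) and that the probabilities are plain relative frequencies of consecutive letter pairs. The function name `transition_matrix` is hypothetical and not taken from the Stratosphere code.

```python
from collections import Counter, defaultdict


def transition_matrix(letters):
    """Estimate first-order Markov transition probabilities from a letter string."""
    counts = defaultdict(Counter)
    # Count every consecutive pair (current letter -> next letter).
    for cur, nxt in zip(letters, letters[1:]):
        counts[cur][nxt] += 1
    # Normalize each row so the outgoing probabilities of a letter sum to 1.
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }


matrix = transition_matrix("a,a,z+d+d+")
for state, row in matrix.items():
    print(repr(state), row)
```

With this string, the row for `a` gives probability 1 to the comma, and the row for the comma splits 0.5 / 0.5 between `a` and `z`, which matches the numbers quoted in the talk.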
31:20
So we can look at Zeus for a long time, or Muref, or any botnet, and we can capture this model, this behavior. We create the Markov chain, the matrix, everything we need, and we have the model ready. Okay. Now we have these models and I know what they are: this is a command and control that is down, this is a command and control that is working, this is another type of attack. Now
31:44
we can get unknown traffic from an unknown network and we can try to compare and say: okay, the question is, what is the probability that this traffic was generated by this model? That's why we are using the Markov chain. So I say: okay, well, the probability is actually very, very low.
32:05
Okay, and what is the probability of it being generated by the second model? Okay, it's like that. And the third one. And then we choose. I'll say: okay, from all these models, including the normal ones, the best probability is that you were generated by a command and control botnet. It's this one.
32:21
So I will say that you are a command and control. Okay, this is how it's being detected in the end. It's not perfect, of course, but so far it's working. So I want to show you some more stuff. The first thing I will show you is how to see the difference between two models. So
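The selection step can be sketched as follows: score the unknown letter string against every stored model and keep the most probable one. This is an illustration under assumptions, not the project's implementation. The two training strings, the model names `cc_periodic` and `normal_web`, and the `floor` probability used for transitions a model has never seen are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(letters):
    """Relative-frequency Markov model of consecutive letters."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(letters, letters[1:]):
        counts[cur][nxt] += 1
    return {c: {n: k / sum(r.values()) for n, k in r.items()} for c, r in counts.items()}


def log_likelihood(model, letters, floor=1e-6):
    """Log-probability that `model` generated `letters`.

    Transitions the model never saw get the small assumed probability `floor`.
    """
    total = 0.0
    for cur, nxt in zip(letters, letters[1:]):
        total += math.log(model.get(cur, {}).get(nxt, floor))
    return total


# Hypothetical stored models: one periodic C&C-like string, one normal-looking string.
models = {
    "cc_periodic": train("a,a,a,a,a,a,a,a,"),
    "normal_web": train("v.R.v.W.R.v.v.W."),
}

unknown = "a,a,a,a,"
scores = {name: log_likelihood(model, unknown) for name, model in models.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Working in log space avoids multiplying many small probabilities; the model with the highest log-likelihood is the label given to the unknown traffic.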
32:45
Show you... What is this? No, go away. Okay, so I will show you the difference between a malware that is called Zeus (you don't see it there, it doesn't matter, it's called Zeus) and a malware that is called VAVO.
33:04
Oh, I'm not sure of the name exactly. So the second malware, VAVO, was created by some people that may be here in the audience, I don't know: the CrySyS Lab people. So thank you very much, it was awesome. Go and see their talk later, because you will learn a lot. They created this amazing malware and they are trying to see how other tools detect it, right? So
33:26
this is amazing for us, because it's like: okay, something that is very real and very difficult and very well done. Most of the people that are trying to detect it are trying on the computer, on the host, but we are going to see... I have this one here. So how can we
33:42
see the traffic difference between VAVO and other malware like Zeus? So this detection that we are going to do... Sorry, I copied the wrong line. I'm going to show you what happens if we compare the model of VAVO with the model of
34:02
Zeus. And this is a comparison that is saying: okay, so the distance between the VAVO malware and the Zeus malware is actually very close to one here. These are the first ten flows, and this means that they are quite similar. The behavior of VAVO is quite similar to one of the behaviors of Zeus.
34:23
But if we keep looking, not at ten flows, but give me thirty flows, we can see the behavior starts to change, starts to diverge. And if we want more flows, like 50 flows, there is more difference, and if we want 100 flows, it's even more different.
34:43
So every time we put in more flows, we can see that the behavior of VAVO and Zeus diverges further. This means that the early behavior of VAVO and Zeus is similar, but later on in the network they grow apart.
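The talk does not say how the distance is computed, so the following is only a sketch of one way to get a number that equals 1 for identical behavior and grows as two behaviors diverge. It assumes the distance is a ratio of log-likelihoods with add-one smoothing; both letter strings are invented stand-ins and are not real Zeus or VAVO traffic.

```python
import math
from collections import Counter, defaultdict


def train(letters, alphabet):
    """Markov model with add-one smoothing over a fixed alphabet (an assumption)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(letters, letters[1:]):
        counts[cur][nxt] += 1
    k = len(alphabet)
    return {
        cur: {nxt: (counts[cur][nxt] + 1) / (sum(counts[cur].values()) + k) for nxt in alphabet}
        for cur in alphabet
    }


def log_likelihood(model, letters):
    return sum(math.log(model[cur][nxt]) for cur, nxt in zip(letters, letters[1:]))


def distance(own_letters, other_letters, alphabet):
    """Assumed distance: how much worse the other model explains our letters
    than our own model does. 1.0 means no difference; larger means more different."""
    own = train(own_letters, alphabet)
    other = train(other_letters, alphabet)
    return log_likelihood(other, own_letters) / log_likelihood(own, own_letters)


# Invented letter strings: identical for the first 10 letters, then diverging.
zeus_like = "a," * 20
vavo_like = "a," * 5 + "b." * 5 + "c+" * 10
alphabet = sorted(set(zeus_like + vavo_like))

distances = {n: distance(vavo_like[:n], zeus_like[:n], alphabet) for n in (10, 20, 40)}
for n, d in distances.items():
    print(f"first {n} letters: distance {d:.2f}")
```

Note that this particular ratio is not symmetric: swapping the two strings generally gives a different value. That is a property of the sketch, not a claim about the tool shown in the demo.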
35:03
So we are trying to use this Zeus behavior, which we have known for a long time, to detect the VAVO traffic in the network. So for doing that I will show you something like this. I hope you see something in there.
35:22
So I'm going to run an experiment in the tool. I didn't tell you, but this is the Stratosphere Testing Framework, one of the tools we have in the project for experimentation. So I'm going to run an experiment. I'm going to say: okay, use the Zeus model and get all the traffic from VAVO and tell me how you are detecting it. Tell me when you are detecting what, right?
35:46
So when you run this (I don't care about the description), it's saying: okay, I'm going to separate the traffic into time slots, like five minutes or ten minutes or 15 minutes, and in each of these time slots
36:02
I will tell you if the models are matching or not. I will tell you: okay, yeah, I detected something, or I didn't detect something. So I will go back here, or up, sorry. And you can see, very quickly, but you can see here that in the first time slot
36:22
it's saying: okay, starting the time slot here, and it's from minute zero to minute five. You can see that there are some IP addresses in the traffic and there is no ground-truth label. That means that when I was looking at the traffic, there was no indication that this is malware behavior yet.
36:42
It's just some packets in there, and also we didn't predict anything. So there are no detections, no known traffic in there. Nothing has happened so far. In the second time slot... I'm sorry, I used blue letters in here, which is horrible from a design point of view. I'm sorry, but you will have to believe me here. In this
37:05
blurry stuff it says botnet. So it means that the malware here is using this IP, and we know it's VAVO. It's using the command and control, and we put the label botnet in there: for sure, this is a botnet or malware. But we didn't detect it. So our model is not matching here.
37:24
So in the first two time slots there are no detections. That's why the type of error is false negative, because we missed it, we didn't capture it. But in the next time slot, that means 15 minutes later, we were able to detect the VAVO botnet with the Zeus
37:42
command and control model. So we have a true detection here, we have a true positive. This means that at this point the models matched and we were able to capture it, we were able to detect it. But if you keep looking at the next time slots, we didn't detect it, because you remember that I showed you that the models were diverging.
38:02
The models get more different over time. So after some point this is not similar anymore, but it was enough for the detection. Okay, so this is one way that we can experiment with this, and at the end of the experiment you have what's being detected or not. You have all the fancy measures: true positive rate, precision,
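The per-time-slot bookkeeping from the demo can be sketched like this. It is an assumed simplification, not the Stratosphere Testing Framework's code: a slot counts as botnet if any flow in it carries the ground-truth label `botnet`, and as detected if any model match falls inside it. The timestamps and labels below are invented.

```python
from collections import Counter


def evaluate(flows, detections, width_s=300):
    """Score detections per time slot.

    flows: (timestamp_seconds, ground_truth_label) pairs.
    detections: timestamps (seconds) at which a model matched.
    """
    truth = {}
    for ts, label in flows:
        slot = int(ts // width_s)
        truth[slot] = truth.get(slot, False) or label == "botnet"
    fired = {int(ts // width_s) for ts in detections}

    outcome = Counter()
    for slot, is_botnet in sorted(truth.items()):
        if is_botnet:
            outcome["TP" if slot in fired else "FN"] += 1
        else:
            outcome["FP" if slot in fired else "TN"] += 1

    positives = outcome["TP"] + outcome["FN"]
    flagged = outcome["TP"] + outcome["FP"]
    tpr = outcome["TP"] / positives if positives else 0.0
    precision = outcome["TP"] / flagged if flagged else 0.0
    return dict(outcome), tpr, precision


# Invented capture: unlabeled traffic first, then labeled botnet flows.
flows = [(30, "unlabeled"), (120, "unlabeled"), (400, "botnet"), (700, "botnet"), (1000, "botnet")]
detections = [710]  # the model matched once, in the third five-minute slot

results = {width: evaluate(flows, detections, width) for width in (300, 600)}
for width, (counts, tpr, precision) in results.items():
    print(f"{width // 60}-minute slots: {counts}, TPR={tpr:.2f}, precision={precision:.2f}")
```

The same capture and the same single detection give a true positive rate of one third with five-minute slots and one half with ten-minute slots, so the slot width has to be reported together with the metrics.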
38:21
you can have them all. And then you can see: okay, is this model enough for detection or not, or do we need more, right? So this was an example of using a Zeus model today for detection of VAVO. There are other models. We also tried with a model called... I don't remember now. It was called...
38:44
Votnet, okay. Another botnet that I don't recall now, which was also able to detect VAVO. And then we can now use the VAVO model for other stuff. Of course, if I'm using the VAVO model for detecting VAVO, it will detect it, right? But hey, that's cheating, because I don't know VAVO in advance. I should find VAVO in the network
39:04
using whichever tool I have. That's why we are using the models already in the database for detecting the new unknown traffic. So I will continue with the presentation. Yeah. So we can see the distance between models, we can experiment:
39:23
okay, use all these models on this traffic, tell me what you find, and especially tell me what and when you find it. And this detection is done by generalizing the models. I won't speak about that here, but the Markov chain models can be generalized in such a way that we are detecting similar traffic, not exactly the same traffic, right?
39:49
So I want... no, no, I want to say something here, sorry. Yeah, here: the verification. Usually people ask: okay, is it working or not? It's like if you go to any antivirus company or any protection company and you say: hey, is your product working or not?
40:05
How do you know? I don't know, maybe yes, maybe no. It depends on a lot of stuff. There is no easy answer here. If somebody is telling you, yeah, our product is working amazingly, I will doubt it a lot, because if I change the network, if I change the attack, if I change the timing,
40:23
if I change the normal people, maybe if I change the country, your detection is going to have some issues for sure, right? That's why nothing is working so well. So for us the verification is very important. So yeah, our model is working with this dataset, with these labels, with these people, with this
40:43
traffic and this way of verifying. Because you remember I showed you the experiment using five-minute time slots. If you are using ten minutes or you are using one hour, the results are completely different. If you are using one minute, it's completely different. So it also depends on how you
41:00
consider a detection successful. For example, you have a malware in there and you want to say: yeah, I can detect it. What can you detect? Can you detect the whole traffic? Can you detect each packet as malicious? And you say: no, well, each packet maybe not. Okay, can you detect each flow, each connection, each IP address? What can you detect?
41:21
So it depends on how you count the detections, and then you have the final statistics saying: yeah, we have a false positive rate or F-measure of 99.9999 percent, right? So be very careful when people give you this type of results and you say, yeah, it's amazing, or no, it's not working. Maybe it is. So
41:41
we already said it depends on the dataset, the time frame and the verification method. And that's why we are using and publishing a very large dataset of malware traffic. You can find it on the Stratosphere page, stratosphereips.org. You can go there and there is our dataset. You can download it. There are a lot of
42:00
labels in there. You can ask questions, ask for new datasets or whatever, because we need this to be verified, right? And having a malware dataset is very, very difficult, but having a normal dataset is far more difficult. In the dataset we have a lot of malware, but also normal ones. Who can have the traffic there with the labels that say: yes, this is normal?
42:26
So we are doing this, and it is very slow. We are going to each computer and checking: is this normal? Okay, show me the computer. Yeah, you're not infected, you're not doing something stupid, you're not attacking or whatever. So, okay, this is normal traffic. So we are verifying it host by host, and that's why this traffic is so important for us.
42:46
Finally, we want to compare approaches: what other tools are doing with this dataset, what they are detecting and what they are not detecting. And this is very, very important for predicting the performance. So I will stop here.
43:00
I want to say that network behaviors are very, very important for us. And we think that this type of work, using machine learning and artificial intelligence, especially on behavior, is going to give us very good tools in the future. So that's it. That's the web page of the project if you want to go. Sorry, the people upstairs,
43:22
you're not going to make it: stratosphereips.org. That's it. So, any questions? Oh. Yeah, yeah, okay.
43:49
So the question was: what about traffic like streaming or computer games? And this is a very nice question, because these specific types of traffic can be very, very tricky, right? For example, you have issues with tunneling protocols,
44:02
VPNs, NAT, when you have like 1,000 computers behind one NAT, or even with DNS. Just the simple model of DNS. Imagine this: you are on a computer using DNS traffic, right? Normally, because you are normal, I hope, people doing normal stuff. And then you're infected and the DNS traffic is mixed:
44:23
your traffic and the malware traffic are mixed in one connection. So you have two different behaviors generating similar packets, and they're very difficult to distinguish. So far, all the gaming and the streaming we saw, we can differentiate. For example, our worst enemy, I will say, is online music, like online radios.
44:45
This type of website generates a periodicity that is very hard to distinguish from some malware. So we have to be very careful with these models. That's why we are training them, and for each model we can put some thresholds here, saying:
45:02
okay, this model is very good, this model is not so good. So when we are using it, we know where to draw the line: okay, don't detect with this model so much, because it's matching a lot of false positives, for example, right? And then the other part of the question is that in the future we are going to get all the behaviors of your computer
45:23
and we are going to make a decision based on all of them. So I don't care if you are doing something malicious; what I want to know is whether you are also doing something normal. Are you doing something like command and control? How is the behavior of your whole computer at the same time?
45:40
So that's better for differentiating these very weird protocols. Yeah, but they are tricky, it's true. Sorry, another thing related to this is when malware starts mimicking the normal behavior of people. That's why it's very difficult to detect it, right?
46:04
The what, sorry? Servers? No, not so far. I have to say that if you care about... okay, the question is... In this sense I don't care, or people don't care so much, about being attacked, because you have a lot of ways of stopping that. But the issue is that you cannot detect when you were attacked successfully and you are
46:27
sending information and communicating after the attack was successful. So that's why we want to detect when the attack was successful and nobody is detecting it: not the antivirus, not the firewall, not the normal detection, and you don't know what's going on. So that's the place where we want to say: okay, we can detect that, right? The rest of the attack
46:45
we leave to the firewall or the administrator. Yes, another question? Yeah. The people that couldn't read the slides? No questions. Okay, thank you very much and enjoy the rest of the conference.