
Mining for Bugs with Graph Database Queries


Automated media analysis

Speech transcript
So welcome to this talk on mining for bugs with graph database queries. Just for your information, I did give a talk like this before: at last year's CCC you may have seen the talk of the same title. I could have just taken the slides I already had, but OK, I made some new ones as well, so hopefully it's even better this time. So this
talk is about vulnerability discovery, and if you have nothing to do with vulnerability discovery, just to give you a short working definition: vulnerability discovery is the art of navigating inside piles of junk. The goal is to uncover small problems that we can use to our advantage. I was searching for a very long time for a picture that communicates the feeling you have when you first see a new code base and somebody tells you, "please find something exploitable in here," and I think this picture fits it pretty well. Now, the idea of my
work is to apply pattern recognition to vulnerability discovery. In contrast to many approaches that you see, in particular in academic work, I don't try to focus on exact methods, where we reason about confined regions of a program in great detail and assume that you can exactly model all of the instructions of the language you're looking at. Instead, I'm taking a pattern recognition perspective, where I try to recognize, as opposed to detect, bugs. This is mostly inspired by the other books you see on the slide, which come more from the engineering field, where you are aware of the fact that all of the stuff you're dealing with is very imperfect, so you're looking at it in terms of signal and noise. The question is whether we can use these methods and apply them to what people have traditionally been doing with very exact methods, represented by the books that you see on the top. Now, an important thing is that I'm not trying to build something that automatically detects bugs. Instead, these are supposed to be tools that help auditors in their day-to-day work, so in a way they enable people to automate what can be automated, but also to make use of their ideas and their knowledge of the code base to spot vulnerabilities. Now,
I've been doing this for quite a while now, it's been three and a half years, and fortunately I finally finished my thesis, and it is going to be coming out very soon, on November 15. So if this stuff seems interesting to you, please take a look. You know, this is a document that you write for a very long time, but then nobody actually takes a look at it, so if you ever want to hide any secrets somewhere, a thesis is the best place. But hopefully maybe one person or two in the room finds this interesting and actually checks it out; that would be very, very cool. OK, so
we're going to be looking at essentially two chapters of this. One of them is the basis of all this work: it's a graph mining platform for vulnerability discovery. Building on that graph mining platform, I explore the three main directions of unsupervised machine learning and see whether those methods can be used to help us find bugs. I can't tell you about all of these in the talk, but I'm going to talk about one of them, which is how clustering can be used to help you identify vulnerabilities. The nice part here is that this method will again generate queries for the graph database system, so you actually see exactly what the system has learned. This is something that is very often criticized about machine learning: you can easily create a program that kind of classifies something, but you don't really know what it does, and in a security context that's not really good. But if it actually outputs a query in the end, then you can see what it has learned, and if it's rubbish, then you also see that, right? So
the first part of this is about the graph mining platform, and this combines two things which seem to have not so much to do with one another when you start off. On one hand we have compiler construction from computer science, so what compilers do to optimize and analyze your code, stuff like that. On the other hand you have the shiny new graph databases, and the question is how we can combine these two to actually get a good system for vulnerability discovery. And
I will start off with an example. This is an example of a bug found by the famous German hacker Stefan Esser, and it's a bug that he showed in his SyScan 2013 talk. He showed a lot of really cool bugs in that talk, and this was more of a side finding, you could say, but it's interesting because of the way he found it, and it's also interesting because it looks very much like a lot of other bugs that you see all over the place. So let's quickly go over this. There is a variable called namelen, and it's a 32-bit integer, and apparently it comes from the network, because we're using a function called ntohu32, which converts it from network byte order to host byte order, so this looks a lot like it might be from the network. What then happens is we allocate a buffer, and we add 1 before the allocation. So clearly, if we control this namelen field completely and we use the maximum integer, what will happen is we get an overflow, and it actually allocates 0 bytes, and then it copies namelen bytes into the buffer, so you get a heap-based buffer overflow. This is very classic; you see this all over the place.
The question is how he found this bug. In particular, when you talk to academics, they will expect something really involved, right? So did he use white-box fuzzing enhanced with symbolic execution, with some machine-learning-powered anomaly detector, or theorem proving, or model checking? It turns out that all he did was use grep, right? This is the regular expression that he used to uncover the bugs, and you can see he's looking for allocations, and inside those allocations there is supposed to be a summation or any sort of arithmetic operation. What he's encoding here is the fact that if people do arithmetic inside the allocation, that is often a source of heap-based buffer overflows triggered by integer overflows, because obviously there's no sort of check after the summation or whatever it is. There are going to be a lot of cases where this is not problematic, maybe the check was done before, but these are good points to start.
And I think we can learn a lot from this. First of all, even if you take something as primitive as grep, this can be a very powerful tool if you allow an analyst to actually guide the analysis, if you allow them to introduce some knowledge they
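
The bug and the grep-style search described in the transcript can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not the code or the regular expression shown on the speaker's slides: the pattern `ALLOC_WITH_ARITHMETIC`, the helper names, and the sample C lines (with `ntohl`, `hdr` and `payload` standing in for the real code) are all made up for the example. The first helper flags allocation calls that contain arithmetic; the second shows why `namelen + 1` computed in 32 bits can become 0.

```python
import re

# Hypothetical pattern (an assumption, not the one from the talk): a call
# whose name contains "alloc" and whose argument list contains +, * or a
# minus that is not part of "->". It is deliberately crude and will also
# flag harmless code such as sizeof(*p), which matches the point made in
# the talk that hits are only starting points for manual review.
ALLOC_WITH_ARITHMETIC = re.compile(
    r"\b\w*alloc\w*\s*\([^;]*(?:\+|\*|-(?!>))[^;]*\)",
    re.IGNORECASE,
)

# Made-up C fragment imitating the bug class described in the talk.
SAMPLE_SOURCE = """\
name_len = ntohl(hdr->len);
name = malloc(name_len + 1);
memcpy(name, payload, name_len);
other = malloc(sizeof(struct item));
"""


def find_candidates(source):
    """Return (line_number, line) pairs where an allocation contains arithmetic."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if ALLOC_WITH_ARITHMETIC.search(line)
    ]


def alloc_size_u32(name_len):
    """Size handed to the allocator when name_len + 1 wraps at 32 bits."""
    return (name_len + 1) & 0xFFFFFFFF


if __name__ == "__main__":
    for number, line in find_candidates(SAMPLE_SOURCE):
        print(f"candidate at line {number}: {line}")
    # With an attacker-chosen maximum value the requested size wraps to 0,
    # while the later copy would still use the full name_len.
    print("requested size:", alloc_size_u32(0xFFFFFFFF))
```

A text search like this knows nothing about where the length comes from or whether it was checked earlier, which is the kind of gap that motivates richer program representations.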

Metadata

Formal metadata

Title Mining for Bugs with Graph Database Queries
Alternative title Pattern-based Vulnerability Discovery
Series title Hacktivity 2015
Part 20
Number of parts 29
Author Yamaguchi, Fabian
License CC Attribution 3.0 Germany:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.
DOI 10.5446/18836
Publisher Hacktivity
Year of publication 2015
Language English

Content metadata

Subject area Computer science
Abstract While graph databases are primarily known as the backbone of the modern dating world, this nerd has found a much more interesting application for them: program analysis. This talk aims to demonstrate that graph databases and the typical program representations developed in compiler construction are a match made in heaven, allowing large code bases to be mined for vulnerabilities using complex bug descriptions encoded in simple, and not-so-simple, graph database queries. This talk will bring together two well-known but previously unrelated topics: static program analysis and graph databases. After briefly covering the "emerging graph landscape" and why it may be interesting for hackers, a graph representation of programs exposing syntax, control flow, data dependencies and type information is presented, designed specifically with bug hunting in mind. Our open-source program analysis platform Joern is then introduced, which implements these ideas and has been successfully used to uncover various vulnerabilities in the Linux kernel. Capabilities and limitations of the system will then be demonstrated live as we craft queries for buffer overflows, memory disclosure bugs and integer-related vulnerabilities.
