Panopticon

Video in TIB AV-Portal: Panopticon

Formal Metadata

Title
Panopticon
Subtitle
A Libre Cross-Platform Disassembler
Title of Series
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English
Production Year
2017

Content Metadata

Subject Area
Abstract
At: FOSDEM 2017

Panopticon is a graphical disassembler written in Rust that runs on GNU/Linux, Windows and OS X. It aims to create a free replacement for tools like IDA Pro and BinDiff.

Analysis of closed source applications is necessary for security practitioners and FLOSS developers. Roughly 1/3 of the FSF High Priority Projects will likely involve analysis of binary applications, on top of the dedicated reverse engineering projects. Yet, the majority of the tools to do this are closed source themselves. This problem extends to academia, where the lack of stable and extensible binary analysis tools forces scientists to implement their prototypes on top of proprietary software. This further hampers adoption of the publicly funded research by practitioners.

The Panopticon project aims to develop a tool to end the dominance of proprietary software for reverse engineering. What sets Panopticon apart from other free disassemblers is that we believe an intuitive GUI is paramount to aid human analysts in understanding as much of the binary as possible. As such, Panopticon comes with a Qt 5 UI written in QML that allows browsing and annotating control flow graphs. Panopticon implements semantics-based analysis to resolve dynamic jumps and calls. We believe that a disassembler that knows assembly code semantics allows automation of common reverse engineering tasks and provides aids for manual analysis.

The talk will touch on the vision and architecture of Panopticon. The target audience are reverse engineers interested in FLOSS tools and/or advanced static analysis, as well as developers who want to contribute to the project.

Room: H.1308 (Rolin). Scheduled start: 2017-02-05 09:30:00
[Music]

So, hello, welcome to the security devroom. Get a seat, and we'll start by introducing Kai from Germany, who is going to talk about Panopticon, a libre disassembler. Welcome.

Thank you. I'm Kai [unclear], and I'm here today to tell you about a project called Panopticon, which is a cross-platform libre disassembler. I will first talk about the goals of the project, then we will come back to reality and see how the project is actually implemented now, and if the time isn't up by then I will say a bit about the architecture.

But first I want to make the case for why we need such a tool. In security especially, we need a disassembler for analyzing proprietary software, for example for finding bugs in tools like Windows, and for analyzing malware, because most malware doesn't come with source code attached. And often, as free software developers, you want to implement a free software replacement for file systems or network protocols that are implemented only in proprietary tools, so we have to rip those tools apart; that's what we need a disassembler for, too. And most of the tools we use now, especially in security work, are proprietary. So the idea of the project is to build a replacement for these proprietary tools.

When I'm talking about reverse engineering, I'm talking about binary reverse engineering, so we are only concerned with ELF binaries or PE binaries that are implemented in machine code. What I'm also not talking about is automatic reverse engineering; what we are doing here is mostly about manual reverse engineering. So Panopticon takes the kitchen-sink approach, where you have one tool that does everything, everything is integrated and at your fingertips, and you have an integrated graphical user interface that allows you to surf the code and figure out what the application does.

So it always starts with disassembly, and this is where most of the tools stop. So we have binary code, and
we can reverse the last step of the compilation, the assembly, because it is more or less a one-to-one mapping between bits and the assembly code listing. Tools like objdump, for example, just dump the assembly code onto the console and you get to read it. But that's not something you can really use in reality, because most of these programs have millions of lines of assembly listing, and we are not interested in 99% of this code, because if you know a bit about programming you already know how it's implemented. What we're interested in is this little part of the application that implements the state machine for a network protocol or a file system, or that implements some kind of [unclear].

So what the better tools do is what's called static analysis. It takes that assembly code listing and cuts it down into chunks. The concept of functions, for example, exists on the assembly level, so we can separate the code into functions, and then separate functions into something called basic blocks, which are sequences of assembly code instructions that are executed without interruption. So we know that when the first instruction is executed, execution will continue on to the end of the basic block, and then we have a jump or a branch. Our tools try to recover this information and build a nice graph.

And then comes the last part, which is often overlooked, especially by open source tools, and I think this problem is mostly cultural. We have to get the information that's in the computer into the brain of the user, so you need a graphical user interface, and the interface part is the important one. What we essentially have to do is transform the information in the computer in a way that our brains can understand. So with Panopticon we take a step that's not often taken by most other tools, which is that the graphical user interface is an integral part of the system. So when you implement a feature, or you want me to implement a feature, you have to tell me not
only how it interacts with the disassembly part and the static analysis part, but also how we present the information we gather to the user in a way the user can actually use. So not just dumping it into a text file or something like that, but turning it into a picture or something similar that can actually help the user understand the binary. That's really important.

Another thing that most of the tools, even the proprietary ones, lack is analysis based on semantics. What most tools do is this: they know what assembly code looks like, so they know that this bit pattern turns into this mnemonic, and the mnemonic is a string; they know what the arguments look like, and the arguments are also strings; and that's pretty much it. It just gives the dead code to you to read.
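As a rough illustration of what such a purely syntactic disassembler does, the following minimal sketch maps a handful of one-byte x86-64 opcodes to mnemonic and operand strings. The `REGS` table and the `decode` function are assumptions made up for this illustration, not Panopticon's code, and only a few well-known encodings are covered.

```python
# Toy, purely syntactic decoder: bytes in, strings out.
# Hypothetical illustration; not taken from Panopticon.

REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode(code: bytes, addr: int = 0):
    """Yield (address, mnemonic, operand strings) for a few x86-64 opcodes."""
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x90:                            # nop
            yield addr + i, "nop", []
            i += 1
        elif op == 0xC3:                          # near return
            yield addr + i, "ret", []
            i += 1
        elif 0x50 <= op <= 0x57:                  # push r64
            yield addr + i, "push", [REGS[op - 0x50]]
            i += 1
        elif 0x58 <= op <= 0x5F:                  # pop r64
            yield addr + i, "pop", [REGS[op - 0x58]]
            i += 1
        elif op == 0xEB and i + 1 < len(code):    # jmp rel8
            rel = int.from_bytes(code[i + 1:i + 2], "little", signed=True)
            # The target is relative to the address of the next instruction.
            yield addr + i, "jmp", [hex(addr + i + 2 + rel)]
            i += 2
        else:                                     # anything this toy does not know
            yield addr + i, "(unknown)", [hex(op)]
            i += 1

listing = list(decode(bytes([0x55, 0x90, 0xEB, 0x01, 0x90, 0x5D, 0xC3]), addr=0x1000))
for address, mnemonic, operands in listing:
    print(f"{address:#06x}  {mnemonic} {', '.join(operands)}")
```

Everything in the result is a string: nothing in it records that `push` changes `rsp` or writes to memory, which is the limitation described above.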
But what's more interesting is a tool that actually understands the semantics of the code at run time. So what Panopticon does is implement an intermediate language, a bit like the ones used in compilers. For every mnemonic we recognize, we also generate a short sequence of intermediate language that is easy to analyze and that implements the semantics of this opcode at run time. And when we have the semantics, we can do analysis on the semantics instead of just on the syntax, on what it looks like. I will give you two examples of what we could do when we have the semantics.

The first one is called abstract interpretation. The basic idea here is that we have an analysis across all possible paths through the program, and instead of just looking at one path and one value at a time, we replace concrete values with sets of values, or abstractions of sets of values. What I mean by that is best explained with an example. We have the C code on the left, and it implements a switch statement. It's just a bunch of cases: for a certain set of values we print "prime" (these are of course all primes), and if the value isn't one of the cases we return false. What GCC makes from this code is something resembling a binary search tree. It first starts with the middle case, which is 11 I think, and looks whether the value is equal to 11. If it's equal, it of course jumps to the basic block that implements the print, and if it isn't, it compares whether the value is larger or smaller than 11 and branches according to that. So you have some kind of tree that unfolds towards the bottom, and at the bottom you have the false case, where everything comes together.

What we are often interested in is: okay, what are the values that cause the printf to fire? Of course, when you are an experienced reverse engineer you can see right away that this is a binary search tree and therefore a switch-case statement, so you read the code and check for the equality comparisons and the jump-if-equal instructions.
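The idea of replacing single values with sets of values can be sketched on such a compare tree. This is a minimal sketch, not Panopticon's abstract interpreter: the case constants, the `Cmp` node type and the helper names are assumptions for illustration, and the abstract domain is simply a finite Python set.

```python
# Minimal sketch of abstract interpretation with sets of values.
# The case constants and the tree shape are assumptions for illustration.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Cmp:
    """One 'compare value with const' node: equal goes to the print block."""
    const: int
    less: Optional["Cmp"] = None      # taken when value < const
    greater: Optional["Cmp"] = None   # taken when value > const

def build(cases):
    """Build a balanced compare tree from sorted case constants."""
    if not cases:
        return None
    mid = len(cases) // 2
    return Cmp(cases[mid], build(cases[:mid]), build(cases[mid + 1:]))

def reaching_print(node, values):
    """Return the subset of `values` that can reach the print block."""
    if node is None or not values:
        return set()                  # falls through to the 'return false' block
    hit = {v for v in values if v == node.const}
    lo = {v for v in values if v < node.const}
    hi = {v for v in values if v > node.const}
    # All three outgoing paths are followed at once and the results merged.
    return hit | reaching_print(node.less, lo) | reaching_print(node.greater, hi)

tree = build([2, 3, 5, 7, 11, 13, 17, 19, 23])
universe = set(range(256))            # abstract input: "any byte value"
result = reaching_print(tree, universe)
print(sorted(result))
```

A finite set is the simplest possible abstraction; a real analysis would more likely use intervals or a similar domain so that it does not have to enumerate every value.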
We then see that all the equal jumps flow into one basic block. What abstract interpretation can do is automate exactly that. It can execute the code and figure out: okay, when this jump is taken the value must be 11, and when this one is taken the value must be 19, and it can take the superset of all the possible values and show it to us [unclear]. Of course, doing abstract interpretation across the whole program is hard, especially when you have things like I/O. And again, you can do this manually, but having a machine do it for you and present it to you helps you to concentrate on the big picture and to do what the machine can't analyze, which is inferring what this means, or what it means when the value is 11, for example. So just giving hints to the user will, I believe, make reverse engineering way easier.

Another thing is called bounded model checking. As opposed to abstract interpretation, with bounded model checking we only care about one specific path through the program that is feasible under a set of constraints. One example where we could use this: okay, this code is a bit artificial, but it implements some kind of sanity check on a network protocol or a file system format. We have two inputs, a and b, let's just say integers. We first check that a is smaller than b, then a is checked against zero, then we multiply a by 3 and invert b, and then we add them both together and expect the result to be hexadecimal 42. When we compile this we get something that looks like what we have on the right. We have all the checks, and the true branches are [unclear] here, so we want to follow the red lines, and the last basic block is the one we're interested in, which again calls printf. So of course we are interested in what the inputs have to look like in order for printf to be executed. And of course, what we do as experienced reverse engineers is execute the code backwards in our mind: [unclear] okay, the addition has to be 0x42, and then we trace the code backwards.
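The check described above can be sketched as follows, together with a naive search for inputs that reach the final block. Several details are hard to make out in the recording, so this sketch assumes 8-bit unsigned inputs, that the zero check rejects `a == 0`, and that "invert" means bitwise NOT; the function names are hypothetical. The search is plain exhaustive enumeration, not a model checker.

```python
# Sketch of the artificial sanity check and a naive search for inputs
# that pass it. Assumptions: 8-bit unsigned inputs, the zero check
# rejects a == 0, and "invert" means bitwise NOT.

MASK = 0xFF

def check(a: int, b: int) -> bool:
    """Return True when the final block (the printf) would be reached."""
    if not (a < b):
        return False
    if a == 0:
        return False
    a = (a * 3) & MASK        # multiply a by 3, wrapping at 8 bits
    b = ~b & MASK             # invert b
    return ((a + b) & MASK) == 0x42

def find_inputs():
    """Try every pair of 8-bit inputs until one passes the check."""
    for a in range(MASK + 1):
        for b in range(MASK + 1):
            if check(a, b):
                return a, b
    return None

solution = find_inputs()
print(solution)
```

Enumeration only works here because the input space is tiny (65,536 pairs); with 32-bit or 64-bit inputs the same loop would not finish in any useful time.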
And what we do in reality is write a short program that just iterates over the cases until we find one. What bounded model checking can do is generate this more or less for me: we add a bunch of constraints, then we throw it into the magic bounded model checking algorithm, and it will give us a possible trace through the program that will hit that basic block. So what we do here is add a constraint that the last jump is taken, which just means that the zero flag has to be one, and then the model checking algorithm will look for a possible set of values that fulfills this constraint and give us the values, including the trace, up there. So we see at the top there that we need a to be a particular hexadecimal value [unclear] and b to be something else.

What's very nice about this is that you can add additional constraints. Maybe there are some checks before that which you already saw, which check that a isn't this value. So you can add another constraint: okay, we want the jump to be taken, but we don't want a to be that value. We can start the algorithm again, and it will find another solution, or tell us that there's no solution, or it could try to compute forever [unclear], but these are the three possibilities.

So, just as a reminder, the difference between abstract interpretation and bounded model checking: with abstract interpretation we are looking at all paths at the same time, whereas with bounded model checking we are looking at one path.

So there are some other features I would like to see, in ascending order of outrageousness. What would be really nice: Panopticon is meant as a
static analysis tool, but having dynamic information is always very helpful when you have huge applications. Of course, with the semantic information you could simulate the whole program, but this is very expensive, and especially with bounded model checking you can't do this on real-life applications; it's pretty much impossible to do bounded model checking on a whole Chromium instance, for example. So having the ability to include traces from Pin, for example, or DynamoRIO, or just GDB traces would be really helpful. What I would like to see is that we can match traces onto the control flow graphs, and that it can tell us: okay, when you have this input under this environment, control flow flows like that, and when I change that value or that part of the environment, control flow flows like that [unclear]. And the thing is, we already have the tracers: we have Pin, we have DynamoRIO; we just have to implement the matching and the reading of these traces.

Of course you always need scripting support. When you have a powerful tool you want to automate things, so embedding any type of scripting language would be really helpful. I would prefer to have only one, and I don't want to start a language war here [unclear]; I'm not a fan of this. In case you want [unclear], that may be a longer discussion, but I can live with everything.

And, well, when you want to replace IDA Pro you have to replace Hex-Rays, so a decompiler would be pretty nice, even though a decompiler doesn't really decompile: the C code you get out isn't really the C code as it was written, especially when the program wasn't written in C, but you only get that code. This kind of bounded model checking can be done on C, but there's no real use in doing it in C instead of in assembly code. But the control flow structures you have in C are easier to read than control flow graphs [unclear], and having high-level type information makes it easier
to read real-life applications. So maybe some kind of decompiler would be nice. This isn't as impossible as it looks, especially when you have semantic information: you can use abstract interpretation, for example, to recover stack layouts and the use of stack frames throughout the program, and then you only have to do type inference.

Now, back to reality. This is all nice, and part of it is implemented, especially the abstract interpretation part, but aside from that the program isn't as far along as I wanted it to be. So what does it look like? A bit like that. We have a graphical user interface; it's in Qt. You can open the application, open a file, and then start disassembly at the entry point, and you are given a list of functions. You can click in the list of functions and you get a control flow graph; you can pan around and zoom, click on one of the lines, add comments, save the whole thing. That's pretty much it. We can disassemble Intel architectures as well as two of the
smaller 8-bit microcontrollers. We have the semantic information for the 8-bit microcontrollers pretty much complete. Intel is another thing: we have well over 500 mnemonics there, so you have to write the semantic information for 500 or so opcodes. But this isn't as big as it looks, because when you look at real-life applications, once you implement around 100 to 120 of the most popular opcodes you already have 90% of everything that's in there. And as I said before, we are not really concerned with global program analysis; we just want local reasoning about that function or that set of basic blocks: what happens at run time, how do I get to that path there. So it isn't as important that we are 100% precise; we're not trying to do symbolic execution or automatic exploit generation.

We can open ELF files. I
actually have a pull request open right now that I will merge when I get home and get a bit of sleep, so we will support PE files too, and aside from that we can load raw flash dumps [unclear], which isn't that complicated either. The project is hosted on GitHub, and we have an open development model. We use the issue tracker there, so in case you have a question you can open an issue and we'll try to answer it, and if you have a patch you can send it as a pull request.

I have a bit of time left, so I will talk a bit about the architecture. The application is in one repository, but it has two parts. We have a library that does all the disassembly, the static analysis and the representation of the code, and it's written in Rust. In case you have never used Rust, it's not that much different from C++, for example. I started with Rust one and a half years ago, and it took me three months or so to understand it well enough that I can program tools like that, so it's not that complicated. Then we have the graphical front end, which is a bit of Rust to interact with the library, and on top of that QML, which is some kind of JavaScript derivative that's used by Qt to implement widgets.

So when you clone the repository you see something like that. The library consists of around 20 files that are more or less named after the thing they do. We have the abstract interpreter, and then we have the disassemblers for AMD64, AVR and MOS, and then we have some kind of tree representation of the program. At the lowest level we have the mnemonics; mnemonics are grouped into basic blocks, basic blocks are grouped into functions, functions are grouped into programs, and programs are grouped into a project, and the project is the top-level node of what is saved in the application. And we have the data definitions in there: there is il.rs, which is the definition of the intermediate language we use. In case you're more on the academic side, it's RREIL, [unclear], and we have some custom functions, custom operations, in there.

Well, the front end isn't that complicated. We have a bunch of Rust files to communicate with the library and to do the layouting for the control flow graphs, and aside from that there's a folder called qml where all the QML files live. Each file implements one widget. It isn't that complicated: in case you have never used JavaScript, it isn't as JavaScript-y as you would expect, and Qt has really nice documentation, so you could check that out. It's pretty straightforward.
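The hierarchy described above (mnemonics grouped into basic blocks, then functions, programs and finally a project) could be modeled roughly like this. It is a sketch only: every class and field name is hypothetical and does not mirror the actual Rust types in the library, and the sample instructions are made up.

```python
# Sketch of the hierarchy described in the talk; all class and field
# names are hypothetical and do not mirror Panopticon's Rust types.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Mnemonic:
    address: int
    opcode: str
    operands: List[str] = field(default_factory=list)

@dataclass
class BasicBlock:
    mnemonics: List[Mnemonic] = field(default_factory=list)

@dataclass
class Function:
    name: str
    basic_blocks: List[BasicBlock] = field(default_factory=list)

@dataclass
class Program:
    name: str
    functions: List[Function] = field(default_factory=list)

@dataclass
class Project:
    """Top-level node: the thing that would be saved to disk."""
    name: str
    programs: List[Program] = field(default_factory=list)

    def mnemonics(self):
        """Walk the tree from the top-level node down to every mnemonic."""
        for program in self.programs:
            for function in program.functions:
                for block in function.basic_blocks:
                    yield from block.mnemonics

project = Project("demo", [Program("firmware", [Function("main", [
    BasicBlock([Mnemonic(0, "ldi", ["r16", "1"]), Mnemonic(2, "rjmp", ["4"])]),
    BasicBlock([Mnemonic(4, "ret")]),
])])])
count = sum(1 for _ in project.mnemonics())
print(count)
```

Keeping the project as the single root means that saving and loading only ever has to deal with one object, which matches the description of the project as the top-level node of what is saved.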
In case you're interested and want to help me, or just want to check out the project: we have a website. On the website we have links to the API documentation and the user documentation, and you can also jump to the Git repository directly. If you have a question you can reach us on the Freenode channel, and we also have a Twitter account where we mostly post updates about the project.

Thank you, Kai. We have five minutes for questions. We have the first one.

I'm wondering, why Rust? I mean, I love Rust, but I wonder why.

The project started years ago and I used C++, because why not. And I'm sick of C++, and when I saw Rust: Rust solves the problems I have with real-life C++ applications, and this is my hobby project, so I just thought to myself, why not use Rust? So one year ago I just rewrote the application, which was at that time ten thousand lines of code, in Rust, and it turned out to be way easier than I thought. I actually got the line count down to eight thousand, I have fewer bugs, and Rust really helps me to avoid the kind of bugs that plague C++ code bases, like iterator invalidation and data races. It's way less painful to program in Rust than in C++.

I was wondering about obfuscated malware. In particular, there are some TTPs that you can recognize so easily that you could potentially build semantic information for that.

Yes. Of course, obfuscation is there to stop us, and there's only so much you can do. That's why it's interesting to have dynamic information: when you have something that unpacks itself at run time, you can take a snapshot and import it. When you have things like virtualized malware, where you have an interpreter in there, what you can do is use the scripting engine to implement some kind of lifter for that intermediate language after you have disassembled it, because you only have to generate the intermediate language, and then use all the code analysis features that are built in to do code analysis directly on the
obfuscated and virtualized malware. But of course this is a problem; it's there to stop us, and we can only do so much.

A follow-up question on that: you have decompilation to C, but were you thinking of decompiling to C++ or some other high-level language, or [unclear] compiled from C++, as well?

With C++ you have the advantage that you can try to pattern match certain parts of the C++ compiler's output to figure out, for example, what class hierarchies look like. But right now it's C only, for now. You can analyze assembly code listings, of course; you can even analyze Haskell, it just looks a bit crazy.

So we have two more questions planned; raise your hand if you want to ask more.

It's a simple question, but what is the logic behind the decompilation? When you decompile a listing of assembly into C or something C-like, what is the logic behind how you pick [unclear] for a listing of assembly rather than another?

What IDA Pro, for example, does is pattern matching: the compiler turns certain constructs into certain assembly code listings, and you try to recognize that and turn it back. The other way is to turn the code into some kind of expressions, and then you can turn those expressions into C expressions. Decompilation is [unclear] process: you have to recover the control flow constructs of C, and you can do this with pattern matching. You see: okay, whenever I have a block like this [unclear], that's a loop. What's more complicated is to recover the type information and to recover how the stack is used. That can be done with abstract interpretation, and for the type information you can use a type inference algorithm like in Haskell or Rust. For this to work you need initial typing information, so you need to encode in the disassembler that certain API
calls have a certain type signature. You can use this: when the assembly code calls the function, you know, okay, the arguments must have these types, and you can try to push that information down into the assembly code. So that's pretty much how decompilation works.

Thank you. More questions?

One question there, or two. First question: what was the reason not to use any of the existing disassembly libraries, which would give you access to more processor families? And the second question would be: is there an option for another type of syntax, like AT&T syntax? I noticed you use the Intel syntax for x86.

Let's start with the first question. The problem is that the libraries you have now don't really give you semantic information. There's Capstone, which can at least tell you which arguments are written and which are read, but it can't tell you what the function between those two arguments is, and doing this is most of the work. So I saw not much use in trying to wrap a library, because trying to wrap C libraries and having that compile [unclear] on most machines is very hard with Rust; when you only have Rust, it's easier. And of course we can generate AT&T syntax [unclear]; currently we have Intel hard-coded, but that's not much of a problem.

[unclear] OK.
And there is a five-minute break. Please open the door so we can get some air in. Thank you.
[Music]