
No source, no problem! High speed binary fuzzing


Formal Metadata

Title
No source, no problem! High speed binary fuzzing
Series Title
Number of Parts
254
Author
License
CC Attribution 4.0 International:
You are free to use, adapt, and copy, distribute, and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Modern grey-box fuzzers are the most effective way of finding bugs in complex code bases, and instrumentation is fundamental to their effectiveness. Existing instrumentation techniques either require source code (e.g., afl-gcc, ASan) or have a high runtime performance cost (roughly 10x slowdown for, e.g., afl-qemu). We introduce RetroWrite, a binary rewriting framework that enables direct static instrumentation for both user-mode binaries and Linux kernel modules. Unlike dynamic translation and trampolining, rewriting code with RetroWrite does not introduce a performance penalty. We show the effectiveness of RetroWrite for fuzzing by implementing binary-only coverage-tracking and ASan instrumentation passes. Our binary instrumentation achieves performance similar to compiler-based instrumentation.

Fuzzing is the method of choice for finding security vulnerabilities in software due to its simplicity and scalability, but it struggles to find deep paths in complex programs and only detects bugs when the target crashes. Instrumentation greatly helps with both issues by (i) collecting coverage feedback, which drives fuzzing deeper into the target, and (ii) crashing the target immediately when bugs are detected, which lets the fuzzer detect more bugs and produce more precise reports. One of the main difficulties of fuzzing closed-source software is that instrumenting compiled binaries comes at a huge performance cost. For example, simple coverage instrumentation through dynamic binary translation already incurs between 10x and 100x slowdown, which prevents the fuzzer from finding interesting inputs and bugs. In this talk we show how we used static binary rewriting for instrumentation: our approach has low overhead (comparable to compile-time instrumentation) but works on binaries.

There are three main techniques to rewrite binaries: recompilation, trampoline insertion, and reassembleable assembly. Recompilation is the most powerful, but it requires expensive analysis and type recovery, which is an open, unsolved problem. Trampolines add a level of indirection and increase the size of the code, both of which have a negative impact on performance. Reassembleable assembly, the technique that we use, suffers from neither problem. To produce reassembleable assembly, we first disassemble the binary and then symbolize all code and data references (replacing offsets and references with references to unique labels). The output can then be fed to a standard assembler to produce a binary. Because symbolization replaces all references with labels, we can insert instrumentation into the code and the assembler will fix up the references for us when reassembling the binary. Symbolization is possible because references to code and global data always use RIP-relative addressing in the class of binaries that we support (position-independent x86_64 binaries). This makes it easy to distinguish between references and integer constants in the disassembly.

We present RetroWrite, a framework for static binary rewriting that can add efficient instrumentation to compiled binaries and scales to real-world code. RetroWrite symbolizes x86_64 position-independent Linux binaries and emits reassembleable assembly, which can be fed to instrumentation passes. We implement a binary version of Address Sanitizer (ASan) that integrates with the source-based version. RetroWrite's output can also be fed directly to AFL's afl-gcc to produce a binary with coverage-tracking instrumentation.

RetroWrite is openly available for everyone to use, and we will demo it during the presentation. We also present kRetroWrite, which uses the same approach to instrument binary kernel modules for Linux. While many devices can be used with open-source drivers, some still require binary drivers. Device drivers are an inviting target for attackers because they run at the highest privilege level, and a buggy driver can result in full system compromise. kRetroWrite can instrument binary Linux modules to add KCOV-based coverage tracking and KASan instrumentation.
Keywords
Transcript: English (automatically generated)
high-speed binary fuzzing. We have two researchers who will be presenting the product of their latest work, which is a framework for static binary rewriting. Our speakers: the first one is a computer science master's student at EPFL, and the second one is a security researcher and assistant professor at EPFL. Please give a big round of applause to Nspace and gannimo.

Thanks for the introduction. It's a pleasure to be here, as always. We're going to talk about different ways to speed up your fuzzing, to find different kinds of vulnerabilities, and to tweak your binaries in somewhat unintended ways. I'm Mathias Payer, or I go by gannimo on Twitter, and I'm an assistant professor at EPFL working on different forms of software security: fuzzing, sanitization, but also different kinds of mitigations. Matteo over there is working on his master's thesis on different forms of binary rewriting for the kernel. Today we're going to take you on a journey on how to actually develop very fast and very efficient binary rewriting mechanisms that allow you to make unintended modifications to binaries and to explore different kinds of unintended features in binaries.
So, about this talk. What we discovered, and the reason why we set out on this journey, was that fuzzing binaries is really, really hard. There are very few tools in user space, it's extremely hard to set up, and it's extremely hard to set up in a performant way. The setup is complex, you have to compile different tools, you have to modify them, and the results are not really that satisfactory. As soon as you move to the kernel, fuzzing binaries is even harder: there is no tooling whatsoever, there are very few people actually working with binary code in the kernel or modifying binary code, and it's just a nightmare to work with. So what we are presenting today is a new approach that allows you to instrument any form of modern binary code based on static rewriting, which gives you full native performance. You only pay for the instrumentation that you add, and you can do very heavyweight transformations on top of it.
Picture a modern system. Let's say you're looking at cat pictures in your browser: Chrome plus the kernel plus the libc plus the graphical user interface clock in at about a hundred million lines of code. Instrumenting all of this for some form of security analysis is a nightmare. Along this large stack of software there are quite a few different compilers involved, there are different linkers, parts may be compiled on different systems with different settings, and so on; getting your instrumentation across all of this is pretty much impossible and extremely hard to work with. We want to enable you to select the parts that you're actually interested in, modify those, and then focus your fuzzing or analysis on those small subsets of the code, giving you a much better and stronger capability to test the parts of the system that you're really, really interested in.
Who has worked on fuzzing before? Quick show of hands. Wow, that's a bunch of you. Do you use AFL? Yeah, most of you use AFL. libFuzzer? Cool, about 10 to 15 percent libFuzzer and 30 percent AFL. So there's quite good knowledge of fuzzing in the room, and I'm not going to spend too much time on it, but for those who haven't run their own fuzzing campaigns yet: fuzzing is a very simple software testing technique. You run the target in some form of execution environment, and fuzzing consists of some form of input generation that creates new test cases, throws them at your program, and checks what happens with it. Either everything is okay, your code is executed, the program terminates, and everything is fine, or you get a bug report. If you have a bug report, you can use it to find the vulnerability, maybe develop a proof of concept, and then come up with some form of exploit or patch or anything else. So this is pretty much fuzzing in a nutshell. How do you get fuzzing to be effective? How can you cover large source bases, complex code, and complex environments? Well, there are a couple of simple steps that you can take, so let's walk quickly through effective fuzzing 101.
First, you want to be able to create test cases that actually trigger bugs. This is a very complicated part: we need some notion of the inputs that a program accepts, and some notion of how we can explore different parts of the program, different parts of its functionality. On one hand, we could have a developer write all the test cases by hand, but this would be kind of boring, and it would also require a lot of human effort to create all these different inputs. So coverage-guided fuzzing has evolved as a very simple way to guide the fuzzing process, leveraging information about which parts of the code have been executed by simply tracing the individual paths through the program along the execution flow. The fuzzer can then use this feedback to modify the inputs that are being thrown at the target.
The second step: the fuzzer must be able to detect bugs. If you've ever looked at memory corruption, if you're just writing one byte after the end of a buffer, it's highly likely that your software is not going to crash, but it's still a bug, and it may still be exploitable depending on the underlying conditions. So we want to detect violations as soon as they happen, for example through some form of sanitization, some form of instrumentation added to the binary that tells us, hey, there's a violation of the memory safety property, and terminates the application right away as feedback to the fuzzer. Third, and last but not least, speed is key. If you're running a fuzzing campaign, you have a fixed resource budget: you have a couple of cores and you want to run for 24 hours, 48 hours, a couple of days, but whatever your constraints are, you have a fixed amount of instructions that you can actually execute, and you have to decide: am I spending my instructions on generating new inputs, tracking constraints, finding bugs, running sanitization, or executing the program? You need to find a balance between all of them, as it is a zero-sum game: you have a fixed amount of resources and you're trying to make the best of them. Any overhead slows you down, and in the end this becomes an optimization problem: how can you most effectively use the resources that you have available?
When you're fuzzing with source code, it's quite easy to leverage existing mechanisms, and we add all the instrumentation at compile time: we take source code and pipe it through the compiler, and modern compiler platforms allow you to insert little code snippets during the compilation process that carry out all these tasks that are useful for fuzzing. For example, modern compilers can add short snippets of code for coverage tracking, which record which parts of the code you have executed, or for sanitization, which record and check every single memory access to see whether it is safe or not. When you run the instrumented binary, everything works as before, and you can detect the policy violations as you go along.
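To make the compiler side concrete, here is a minimal sketch (our illustration, not from the talk) using clang's SanitizerCoverage hooks: building a target with -fsanitize-coverage=trace-pc-guard makes the compiler insert a call to the callback below at every basic block.

    #include <stdint.h>
    #include <stdio.h>

    /* Called once at startup; gives every instrumentation point a unique ID. */
    void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
        static uint32_t id;
        for (uint32_t *g = start; g < stop; g++)
            if (!*g) *g = ++id;
    }

    /* Called by compiler-inserted code at every instrumented basic block. */
    void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
        if (*guard)
            fprintf(stderr, "coverage: block %u reached\n", *guard);
        *guard = 0; /* report each block only once */
    }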
Now, if we had source code for everything, this would be amazing, but it's often not the case. On Linux we may be able to cover a large part of the software stack by focusing only on source-code-based approaches, but there may be applications where no source code is available. If we move to Android or other mobile systems, there are many drivers that are not open source or only available as binary blobs, or the full software stack may be closed source and we only get the binaries. We still want to find vulnerabilities in these complex software stacks that span hundreds of millions of lines of code, and the only efficient way to cover this part of the massive code base is to rewrite and focus on binaries. A very simple approach would be black-box fuzzing, but this doesn't really get you anywhere, because you don't get any feedback, you don't get any information on whether you're triggering bugs. So one simple approach, and this is the approach that is most dominantly used today, is to rewrite the program or the binary dynamically.
You take the binary and, during execution, use some form of dynamic binary instrumentation, based on Pin, angr, or some other binary rewriting tool, and translate the target at runtime, adding the binary instrumentation on top of it as you're executing it. It's simple, it's straightforward, but it comes at a terrible performance cost of 10x to 100x slowdown, which is not really effective: you're spending all your cores and your cycles on just executing the binary instrumentation. We don't really want to do this; we want something more effective. So what we are focusing on is static rewriting. It involves a much more complex analysis, as we are rewriting the binary before it is executed and we have to recover all of the control flow and all of the different mechanisms, but it results in much better performance, and we can get more bang for our buck.
So why is static rewriting so challenging? Well, first, simply adding code will break the target. If you disassemble this piece of code here, a simple loop that loads data, decrements the register, and then jumps back if you're not yet at the end of the array, iterating through it: look at the jump-not-zero instruction, the last instruction of the snippet. It uses a relative offset; it jumps backwards seven bytes. That's fine if you just execute the code as is, but as soon as you want to insert new code, you change the offsets in the program, you're modifying all these different offsets, and simply adding new code somewhere in between will break the target. So a core property that we need to enforce is that we must find all the references and properly adjust them, both relative offsets and absolute offsets; getting a single one wrong will break everything. What makes this problem really, really hard is that, looking at a byte in the binary, there is no way for us to distinguish between scalars and references; in fact, they are indistinguishable, and getting a single reference wrong breaks the target and introduces arbitrary crashes.
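To illustrate the relative-offset problem, here is a reconstructed sketch in the spirit of the slide (not its exact bytes):

    loop:
        mov (%rsi,%rcx,4), %eax   # load an array element (3-byte instruction)
        dec %ecx                  # decrement the counter (2 bytes)
        jnz loop                  # 2 bytes, encoded as "jump back 7 bytes"

If instrumentation is inserted inside the loop, the encoded backward offset of the jnz no longer lands on the first instruction, so the rewriter has to turn every such offset into a label before any code can be added.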
So we have to come up with ways that allow us to distinguish between the two. For example, take this code here: it takes a value and stores it somewhere on the stack. This could come from two different kinds of high-level constructs. On one hand, it could be taking the address of a function and storing that function address in a stack variable; on the other hand, it could be storing a plain scalar in a stack variable. These two are indistinguishable, and when rewriting them, as soon as we add new code, the offsets will change: if the value is a function address, we would have to modify it; if it is a scalar, we have to keep the same value. So how can we come up with a way that allows us to distinguish between the two and rewrite binaries by recovering this missing information?
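In instruction form, the ambiguity looks like this (a sketch with a hypothetical address):

    movq $0x401060, -8(%rbp)   # taking a function's address: must be fixed up
                               # when code moves during rewriting
    movq $0x401060, -8(%rbp)   # ...or storing the integer 4198496: must stay
                               # exactly as it is

The two encodings are byte-for-byte identical; only the programmer's intent differs.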
So let us take you on a journey towards instrumenting binaries in the kernel; this is what we aim for. We'll start with the simple case of instrumenting binaries in user land, talk about different kinds of coverage-guided fuzzing, what kind of instrumentation and what kind of sanitization we can add, and then bring it all together and apply it to kernel binaries to see what falls out of it. Let's start with instrumenting binaries. First, I will talk a little bit about RetroWrite, our mechanism and our tool that enables static binary instrumentation by symbolizing existing binaries. We recover the information and translate relative offsets and absolute offsets into actual labels that are added to the assembly file. The instrumentation can then work on the recovered assembly file, which can then be reassembled into a binary that can be executed for fuzzing. We implement coverage tracking and a binary Address Sanitizer on top of this, leveraging the abstraction as we go forward. The key to enabling this kind of binary rewriting is position-independent code.
Position-independent code has become the de facto standard for any code that is executed on a modern system. It is, effectively, code that can be loaded at any arbitrary address in your address space. It is essential, and a requirement, if you want address space layout randomization or if you want to use shared libraries, which you de facto want on all these systems. So, for a couple of years now, all the code that you're executing on your phones, on your desktops, on your laptops has been position-independent code. The idea behind position-independent code is that since it can be loaded anywhere in your address space, it cannot use any hard-coded static addresses; you have to inform the system through relocations, or use PC-relative addresses, so that the system can relocate these different mechanisms. On x86_64, position-independent code leverages addressing that is relative to the instruction pointer: it uses the current instruction pointer plus a relative offset to reference global variables, other functions, and so on. And this gives us a very easy way to distinguish references from constants, especially in PIE binaries: if it is RIP-relative, it is a reference; everything else is a constant. We can build our translation algorithm and our translation mechanism on this fundamental finding and remove any form of heuristic by focusing specifically on position-independent code. So we do not support non-position-independent code, but in exchange we give you the guarantee that we can rewrite all the different code that is out there.
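For instance (a hand-written sketch with hypothetical offsets), the same value becomes unambiguous once you look at the addressing mode:

    lea 0x2f5d(%rip), %rdi   # RIP-relative: always a reference (some global)
    mov $0x2f5d, %edi        # plain immediate: always a constant in a PIE binary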
Symbolization works as follows. If you take the little bit of code on the lower right, symbolization first replaces all the references with assembler labels. Look at the call instruction and the jump-not-zero instruction: the call instruction references an absolute address, and the jump-not-zero instruction jumps backward 15 bytes, relative to the current position. By focusing on these relative jumps and calls, we can replace them with actual labels and rewrite the binary as follows: for the call of a function, we replace the target with the function's label, and for the jump-not-zero, we insert a label into the assembly code and add a backward reference to it. For PC-relative addresses, for example the data load, we replace the operand with the name of the actual data that we have recovered, and we can then add all the different relocations and use them as auxiliary information on top. After these three steps we can insert any new code in between, and therefore add different forms of instrumentation or run some higher-level analysis on top of it, and then reassemble the file for fuzzing, coverage-guided tracking, address sanitization, or whatever else you want to do.
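Schematically, symbolization turns raw disassembly into label-based assembly (a sketch with hypothetical addresses and names, not the talk's exact listing):

    # before (raw disassembly):
      401125: call 0x401340            # absolute target
      40112a: jnz  0x40111d            # encoded as relative offset -15
      40112c: lea  0x2ecd(%rip), %rsi  # RIP-relative data reference
    # after symbolization (reassembleable assembly):
      call process_input               # label of the function at 0x401340
      jnz  .Lbb_40111d                 # label emitted at the jump target
      lea  data_404000(%rip), %rsi     # label of the global at 0x404000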
I will now hand over to Matteo, who will cover coverage-guided fuzzing and sanitization, and then instrumenting binaries in the kernel. Go ahead.

So, now that we have this really nice framework to rewrite binaries, one of the things that we want to add to actually get to fuzzing is coverage tracking instrumentation. Coverage-guided fuzzing is a method for letting the fuzzer discover interesting inputs and interesting paths through the target by itself. The basic idea is that the fuzzer tracks the parts of the program that are covered by different inputs, by inserting some kind of instrumentation. For example, here we have a target program that checks whether the input contains the string PNG at the beginning; if it does, it does something interesting, otherwise it just bails out and fails. If we track the parts of the program that each input executes, the fuzzer can figure out that an input that contains P has discovered a different path through the program than an input that doesn't contain it, and so on: one byte at a time, it can discover that this program expects the magic sequence PNG at the start of the input. The way the fuzzer does this is that every time a new input discovers a new path through the target, it is considered interesting and added to a corpus of interesting inputs; and every time the fuzzer needs to generate a new input, it selects something from the corpus, mutates it randomly, and then uses it as the new input. This is conceptually pretty simple, but in practice it works really well, and it really lets the fuzzer discover the format that the target expects in an unsupervised way.
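The loop Matteo describes can be summarized in a few lines of C (a sketch with hypothetical helper functions, not AFL's actual code):

    #include <stdbool.h>

    typedef struct { unsigned char data[4096]; unsigned len; } input_t;

    /* Hypothetical helpers: run the instrumented target, expose its feedback. */
    bool run_and_trace(const input_t *in, unsigned char cov_map[65536]);
    bool has_new_coverage(const unsigned char cov_map[65536]);
    input_t pick_from_corpus(void);
    input_t mutate(input_t in);
    void add_to_corpus(const input_t *in);
    void save_crash(const input_t *in);

    void fuzz_loop(void) {
        for (;;) {
            input_t in = mutate(pick_from_corpus()); /* derive a new test case */
            unsigned char cov_map[65536] = {0};
            if (run_and_trace(&in, cov_map))         /* true means it crashed */
                save_crash(&in);                     /* keep the crashing input */
            else if (has_new_coverage(cov_map))
                add_to_corpus(&in);                  /* new path: worth mutating */
        }
    }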
As an example, this is an experiment that was run by the author of AFL, the fuzzer that sort of popularized this technique, where he was fuzzing a JPEG parsing library starting from a corpus that only contained the string "hello". Now, clearly "hello" is not a valid JPEG image, but the fuzzer was still able to discover the correct format: after a while it started generating some grayscale images, and as it generated more and more inputs it started generating more interesting images, such as some grayscale gradients, and later on even some color images. As you can see, this really works, and it allows us to fuzz a program without really teaching the fuzzer what the input should look like. So that's it for coverage-guided fuzzing; now we'll talk a bit about sanitization. As a reminder, the core idea behind sanitization is that just looking for crashes is likely to miss some of the bugs.
For example, an out-of-bounds read of a single byte will probably not crash the target, but we would still like to catch it, because it could be used for an info leak. One of the most popular sanitizers is Address Sanitizer: ASan instruments all the memory accesses in your program and checks for memory corruption. Memory corruption is a pretty dangerous class of bugs that unfortunately still plagues C and C++ programs, and unsafe languages in general, and ASan tries to catch it by instrumenting the target. It is very popular and has been used to find thousands of bugs in complex software like Chrome and Linux, and even though it has a bit of a slowdown, about 2x, it is still really popular because it lets you find many, many more bugs. So how does it work? The basic idea is that ASan inserts special regions of memory, called red zones, around every object in memory.
like Chrome and Linux and even though it has like a bit of a slowdown like about 2x it is still really popular because it lets you find many many more bugs so how does it work the basic idea is that ASEN will insert some special regions of memory called red zones around every object in memory
so we have a small example here where we declare a four byte array on the stack so ASEN will allocate the array buff and then add a red zone before it and a red zone after it whenever the program accesses the red zones it is terminated with a security violation so the instrumentation just prints a bug
report and then crashes the target this is very useful for detecting for example buffer overflows or under flows or even other kinds of bias let's just use of the free and so on so as an example here we're trying to copy 5 bytes into a 4 byte buffer and ASEN will check each of the accesses one by
one and when it sees that the last byte writes a red zone it detects the violation and crashes the program so this is good for us because this bug might have not been found by simply looking for crashes but it's definitely found if you use ASEN so this is something we want for fuzzing so now we'll cover briefly cover ASEN we can talk about instrumenting binary in the
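A minimal stand-alone version of the slide's example (our reconstruction; compile with -fsanitize=address to see the behavior described above):

    #include <string.h>

    int main(void) {
        char buf[4];              /* ASan places red zones before and after buf */
        memcpy(buf, "AAAAA", 5);  /* the 5th byte lands in the red zone: ASan   */
                                  /* prints a stack-buffer-overflow report and  */
                                  /* aborts instead of silently corrupting data */
        return 0;
    }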
Now that we've briefly covered ASan, we can talk about instrumenting binaries in the kernel. Mathias left us with RetroWrite, and with RetroWrite we can add both coverage tracking and ASan to binaries. It's a really simple idea: now that we can rewrite the binary and add instructions wherever we want, we can implement both coverage tracking and ASan. To implement coverage tracking, we simply have to identify the start of every basic block and add a little piece of instrumentation at the start of each basic block that tells the fuzzer: hey, we've reached this part of the program; hey, we've reached this other part of the program. The fuzzer can then figure out whether that's a new part or not.
where the red zones are and detecting whether a memory access is invalid so how can we apply all of this in the kernel well first of all fuzzing the kernel is not as easy as fuzzing some user space program there's some issues here so first of all this crash handling so whenever you're fuzzing a user space
program you expect crashes well because that's what we're after and if I use a space program crashes then the US simply terminates gracefully and so the fuzzer can detect this and save the input as a crashing input and so on and this this is all fine but when you're fuzzing the kernel so if you were fuzzing the kernel of the machine that you were using for fuzzing after a
while the machine would just go down because after all the kernel runs the machine and if it starts misbehaving then all of it can go wrong and more importantly you can lose your crashes because if the machine crashes then the state of the fuzzer is lost and you have no idea what your crashing input was so what most kernel fuzzers have to do is that they resort to some
So what most kernel fuzzers do is resort to some kind of VM to keep the system stable: they fuzz the kernel in a VM and run the fuzzing agent outside of the VM. On top of that, there is tooling. If you want to fuzz a user-space program, you can just download AFL or use libFuzzer; there are plenty of tutorials online, it's really easy to set up, you just compile your program, start fuzzing, and you're good to go. If you want to fuzz the kernel, it's already much more complicated. For example, if you want to fuzz Linux with syzkaller, which is a popular kernel fuzzer, you have to compile the kernel yourself, you have to use a special config that supports syzkaller, you have way fewer guides available than for user-space fuzzing, and in general it is much more complex and less intuitive than just fuzzing user space. And lastly, we have the issue of determinism. In general, a single-threaded user-space program, unless it uses some kind of random number generator, is more or less deterministic: there is nothing else that affects the execution of the program. This is really nice when you want to reproduce a test case, because if you have a non-deterministic test case, it's really hard to know whether it is really a crash or just something that you should ignore. In the kernel this is harder, because you don't only have concurrency, multiprocessing; you also have interrupts, and interrupts can happen at any time.
If you got an interrupt while executing a test case one time and not the second time, then maybe it only crashes one of the times; you don't really know, and it's not pretty. So, again, we have several approaches to fuzzing binaries in the kernel. The first one is black-box fuzzing. We don't really like this because it doesn't find much, especially in something complex like a kernel. The next approach is dynamic translation, using something like QEMU. This works, and people have used it successfully; the problem is that it is really, really slow, we're talking about 10x-plus overhead, and as we said before, the more test cases you can execute in the same amount of time, the better, because you find more bugs. On top of that, there is currently no sanitizer for kernel binaries that is based on this approach: in user space you have something like Valgrind; in the kernel you don't have anything, at least that we know of. There's another approach, which is to use Intel Processor Trace. There have been some research papers on this recently, and it is nice because it allows you to collect coverage at nearly zero overhead; it's really fast. The problem is that it requires hardware support, a fairly new x86 CPU, and if you want to fuzz something on ARM, say your Android driver, or if you want to use an older CPU, then you're out of luck. What's worse, you cannot really use it for sanitization, or at least not the kind of sanitization that ASan does, because it just traces the execution; it doesn't allow you to do checks on memory accesses.
So the last approach, which is what we will use, is static rewriting. We had this very nice framework for rewriting user-space binaries, and we asked ourselves: can we make this work in the kernel? So we took the original RetroWrite, modified it, implemented support for Linux modules, and it works: we have used it to fuzz some kernel modules, and it really shows that this approach doesn't only work for user space; it can also be applied to the kernel. As for the implementation: the nice thing about kernel modules is that they are always position-independent. You cannot have fixed-position kernel modules, because Linux just doesn't allow it, so we sort of get that for free, which is nice. Linux modules are also a special class of ELF files, which means that even though the format is not the same as for user-space binaries, it is still somewhat similar, so we didn't have to change the symbolizer that much, which is also nice.
We implemented symbolization for this format and used it to implement both code coverage and binary ASan for binary kernel modules. For coverage, the idea behind the whole RetroWrite project was that we wanted to integrate with existing tools, with existing fuzzers: we didn't want to force our users to write their own fuzzer that is compatible with RetroWrite. In user space we had AFL-style coverage tracking and a binary ASan that is compatible with source-based ASan, and we wanted to follow the same principle in the kernel. It turns out that Linux has a built-in coverage tracking framework called KCOV, which is used by several popular kernel fuzzers like syzkaller, and we wanted to reuse it ourselves, so we designed our coverage instrumentation to integrate with KCOV. The downside is that you need to compile the kernel with KCOV, but then again, Linux is open source, so you can sort of always do that; usually it's not the kernel that is a binary blob, it's only the modules, so that's still fine. The way you implement KCOV for binary modules is that you just have to find the start of every basic block and add a call to a function that stores the collected coverage. Here's an example: we have a short snippet of code with three basic blocks, and all we have to do is add a call to trace_pc at the start of each basic block. trace_pc is a function that is part of the main kernel image; it collects the coverage and makes it available to a user-space fuzzing agent. So this is all really easy, and it works.
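Concretely, in a kernel built with CONFIG_KCOV, the hook that compiler-instrumented code calls is __sanitizer_cov_trace_pc, and a rewritten basic block can do the same (a sketch, not kRetroWrite's literal output; a real rewriter must also preserve any live caller-saved registers around the call):

    .Lbb_1:
        call __sanitizer_cov_trace_pc  # KCOV hook in the main kernel image; it
                                       # records the caller's PC in the per-task
                                       # coverage buffer read by the fuzzing agent
        test %rdi, %rdi                # original instructions of the block
        jz   .Lbb_3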
Now let's see how we implemented binary ASan. As I mentioned before, when we instrument a program with binary ASan in user space, we link with libASan, which takes care of setting up the metadata, putting the red zones around allocations, and so on. We have to do something similar in the kernel. Of course, you cannot link with libASan in the kernel, that doesn't work, but what we can do instead is, again, compile the kernel with KASan support. This instruments the allocator to add red zones, allocates space for the metadata, and keeps this metadata around; it does all of this for us, which is really nice. And again, the big advantage of this approach is that we can integrate seamlessly with a KASan-instrumented kernel and with fuzzers that rely on KASan, such as syzkaller, so we see this as more of a plus than a limitation. And how do you implement ASan itself? Well, you have to find every memory access and instrument it to check whether it is accessing a red zone; if it is, you just call a bug-report function that produces a stack trace and a bug report and crashes the kernel, so that the fuzzer can detect it. Again, this is compatible with source-based KASan. So we're happy: we can simply load a rewritten module with the added instrumentation into a kernel, as long as the kernel was compiled with the right flags, and we can use a standard kernel fuzzer.
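The inserted check follows ASan's shadow-memory scheme, where one shadow byte describes eight bytes of memory (a simplified sketch with hypothetical symbol names; real KASan also handles accesses that cover only part of an 8-byte granule):

    # before an 8-byte load from the address in %rdi:
    mov  %rdi, %r11
    shr  $3, %r11                    # one shadow byte covers 8 data bytes
    movb shadow_offset(%r11), %r11b  # load the shadow byte for this address
    test %r11b, %r11b                # zero means all 8 bytes are valid
    jz   1f
    call asan_report_load8           # red zone hit: print report, crash kernel
    1:
    mov  (%rdi), %rax                # the original memory access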
Here, for our evaluation, we used syzkaller, a popular kernel fuzzer by some folks at Google, and it works really well. So we finally reached the end of our journey, and now we want to present some experiments to see if this really works. For user space, we wanted to compare the performance of our binary ASan with source-based ASan and with existing solutions that also work on binaries. In user space you can use Valgrind memcheck, a memory sanitizer based on dynamic binary translation that works on binaries: we compared it with source-based ASan and RetroWrite ASan on the SPEC CPU benchmark and measured how fast each one is. For the kernel, we decided to fuzz some file systems and some drivers with syzkaller, using both source-based KASan with KCOV and kRetroWrite-based binary KASan with KCOV.
These are our results for user space. The red bar is Valgrind: we can see that its execution time is the highest; it is really, really slow, around 30x overhead, way too slow for fuzzing. Then, in green, we have our binary ASan, which is already a large improvement; in orange we have source-based ASan; and finally, in blue, we have the original code without any instrumentation whatsoever. We can see that source-based ASan has about 2x or 3x overhead, and binary ASan is a bit higher, a bit less efficient, but still somewhat close. So that's it for user space. For the kernel, these are some preliminary results: I'm doing this work as part of a master's thesis, so I'm still running the evaluation. Here we can see that the overhead is already a bit lower.
The reason for this is that SPEC is a pure CPU benchmark that doesn't interact with the system that much, so any instrumentation you add is going to considerably slow down the execution. By contrast, when you fuzz a file system with syzkaller, not only does every test case have to go from the host to the guest and then issue multiple syscalls, but every system call also has to go through several layers of abstraction before it gets to the actual file system. All of this takes a lot of time, so in practice the overhead of our instrumentation seems to be pretty reasonable. Now, since we know that you like demos, we prepared a small demo of kRetroWrite. Let's see.
All right. We prepared a small kernel module, and this module is really simple; it contains a vulnerability. What it does is create a character device. If you're not familiar with this, a character device is like a fake file that is exposed by a kernel driver, and you can read from it and write to it; instead of going to a file, the data that you write to the fake file goes to the driver and is handled by this demo_write function. As we can see, this function allocates a 16-byte buffer on the heap and then copies the data into it. Then it checks whether the data contains the string 1337: if it does, it accesses the buffer out of bounds. You can see that it reads element 16, and the buffer is 16 bytes, so this is an out-of-bounds read by one byte. If it doesn't, it just accesses the buffer in bounds, which is fine and is not the vulnerability. So we can compile this driver.
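The module's exact source was only shown on screen; a minimal reconstruction of what the transcript describes might look like this (hypothetical names, simplified error handling):

    #include <linux/module.h>
    #include <linux/miscdevice.h>
    #include <linux/fs.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/uaccess.h>

    static ssize_t demo_write(struct file *f, const char __user *ubuf,
                              size_t len, loff_t *off)
    {
        char *buf = kmalloc(16, GFP_KERNEL);   /* 16-byte heap buffer */
        if (!buf)
            return -ENOMEM;
        if (copy_from_user(buf, ubuf, len < 16 ? len : 16)) {
            kfree(buf);
            return -EFAULT;
        }
        if (!strncmp(buf, "1337", 4))
            pr_info("demo: %d\n", buf[16]);    /* out-of-bounds read by 1 byte */
        else
            pr_info("demo: %d\n", buf[0]);     /* in-bounds access: fine */
        kfree(buf);
        return len;
    }

    static const struct file_operations demo_fops = {
        .owner = THIS_MODULE,
        .write = demo_write,
    };

    static struct miscdevice demo_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "demo",                       /* appears as /dev/demo */
        .fops  = &demo_fops,
    };

    module_misc_device(demo_dev);
    MODULE_LICENSE("GPL");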
Then we have our module, and we instrument it using kRetroWrite. kRetroWrite does some processing, and it produces an instrumented module, with ASan, or KASan in this case, and a symbolized assembly file. We can actually have a look at the symbolized assembly file to see what it looks like. Is it big enough? Yeah. As you can see here, this is the ASan instrumentation: the original code loads some data from this address, and the instrumentation first computes the actual address and then does some checking; basically, it checks some metadata that ASan stores, to see whether the address is in a red zone or not, and if the check fails, it calls this ASan report function, which produces a stack trace and crashes the kernel. So this is fine. We can even look at the disassembly of both modules, with objdump.
On the left we have the original module without any instrumentation; on the right we have the module instrumented with ASan. As you can see, the original module has a push %r13 and then this memory load here; on the right, in the instrumented module, kRetroWrite inserted the ASan instrumentation, so the original load is still down here, but between the first instruction and this one we now have the KASan instrumentation that checks the access. So this is all fine. Now we can actually test it and see what it does. We will boot a very minimal Linux system and try to trigger the vulnerability, first with the non-instrumented module and then with the instrumented module, and we will see that with the non-instrumented module the kernel will not crash, but with the instrumented module it will crash and produce a bug report. Let's see. This is a QEMU VM; I have no idea why it's taking so long to boot.
I'll blame the demo gods for not being kind to us. I guess we just have to wait. Okay, all right. We loaded the module, and we can see that it has created a fake-file character device at /dev/demo. We can write to this file: this accesses the array in bounds, and so this is fine. What we can also do is write 1337 to it, so that it accesses the array out of bounds. This is the non-instrumented module, so this will not crash; it will just read some garbage value. Okay, that's it. Now we can load the instrumented module instead and do the same experiment again.
All right, we can see that /dev/demo is still here, so the module still works. Let's try to write 1234 into it: this, again, doesn't crash. But when we write 1337, this produces a bug report. The report contains quite a lot of information: we can see where the memory was allocated, as a stack trace; it wasn't freed, so there's no stack trace for the free; and we can see the cache size of the allocation, it was a 16-byte allocation. We can see the shape of the memory: these two zeros mean that there are two 8-byte chunks of valid memory, and these fc fc fc bytes are the red zones that I was talking about before. All right, that's it for the demo; we will now switch back to our presentation.
I hope you enjoyed it. After applying this to the demo module, we also wanted to see what happens if you apply it to a real file system. When we came back after a couple of hours and checked the results, we saw a couple of issues popping up, including a nice set of use-after-free reads and a set of use-after-free writes. When we checked the bug reports, we saw a whole bunch of Linux kernel issues popping up one after the other in this, for now, nondescript module. We are in the process of reporting them, and it will take some time until they are fixed; that's why you see the blurry lines. But as you can see, there is still quite a bit of opportunity in the Linux kernel: you can apply different forms of targeted fuzzing to different modules, run these modules on top of a KASan-instrumented kernel, and leverage this as part of your fuzzing toolchain to find interesting kernel 0-days that you can then develop further, report, or do whatever you want with.
So now we've shown you how you can take existing binary-only modules, think of different binary-only drivers, or even existing modules where you don't want to instrument the full Linux kernel but want to focus fuzzing and exploration on a small, limited piece of code, and then do security tests on those. We've shown you how you can do coverage-based tracking and address sanitization, but what other kinds of instrumentation you add is up to you: this is just a tool, a framework that allows you to do arbitrary forms of instrumentation. So we've taken you on a journey from instrumenting binaries, over coverage-guided fuzzing and sanitization, to instrumenting modules in the kernel, and then finding crashes in the kernel. Let me wrap up the talk. This is one of the fun pieces of work that we do in the HexHive lab at EPFL.
If you're looking for postdoc opportunities, or if you're thinking about a PhD, come talk to us; we're always hiring. The tools will be released as open source; a large chunk of the user-space work is already open source, and we are working on a set of additional demos and so on, so you can get started faster by leveraging the different instrumentation that is already out there. The user-space work is already available; the kernel work will be available in a couple of weeks. This allows you to instrument real-world binaries for fuzzing, leveraging existing transformations for coverage tracking to enable fast and effective fuzzing, and memory checking to detect the actual bugs that exist there. The key takeaway from this talk is that RetroWrite and kRetroWrite enable static binary rewriting at zero instrumentation cost. We take the limitation of focusing only on position-independent code, which is not a real limitation in practice, but we get the advantage of being able to symbolize without actually relying on heuristics, so we can symbolize even large, complex applications and effectively rewrite them, and you can then focus your fuzzing on these parts. Another point I want to mention is that this enables you to reuse existing tooling: you can take a binary blob, instrument it, and then reuse, for example, Address Sanitizer or existing fuzzing tools, as it all integrates really nicely. As I said, all the code is open source; check it out, try it, and let us know if it breaks, we're happy to fix it, we're committed to open source. And let us know if there are any questions. Thank you.
Thanks for an interesting talk; we have some time for questions. There are microphones along the aisles, and we'll start with a question from microphone number two.

Hi, thanks for your talk and for the demo. I'm not sure about the use case you showed for the kernel RetroWrite: you are usually interested in fuzzing binaries in kernel space when you don't have source code for the kernel, for example on IoT or Android, but you just reuse kcov and KASan in the kernel, and on IoT or Android you never have a kernel compiled with those. So do you have any plans to binary-instrument the kernel itself, not just the modules?

We thought about that. There are some additional problems we would have to solve to be able to instrument the full kernel. Beyond the compatibility with existing tools, the reason we decided to go with a kernel compiled with KASan and kcov is that doing everything binary-only is involved: you have to instrument the memory allocator to add red zones, which is already somewhat complex; you have to instrument the exception handlers to catch any faults the instrumentation detects; and you have to set up memory for the ASan shadow. I think you should be able to do it, but it would require a lot of additional work. This was a four-month thesis, so we decided to start small, prove that the approach works in the kernel for modules, and leave extending it to the full kernel to future work.
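For readers unfamiliar with the shadow-memory part mentioned above: ASan keeps one shadow byte per eight bytes of application memory and poisons the shadow of red zones around allocations. Here is a minimal sketch of that arithmetic. The 0x7fff8000 offset is the usual x86_64 user-space ASan constant (KASan uses a different, kernel-specific mapping), the poison values and helper names are simplified stand-ins, and allocations are assumed 8-byte aligned.

    # Minimal sketch of ASan-style shadow mapping and red-zone poisoning.
    # One shadow byte describes 8 bytes of application memory:
    #   0    -> all 8 bytes addressable
    #   0xfa / 0xfb -> red zone, any access is a bug
    SHADOW_OFFSET = 0x7FFF8000   # typical x86_64 user-space ASan offset
    SHADOW_SCALE = 3             # 8-byte granularity (1 << 3)

    def shadow_addr(addr: int) -> int:
        return (addr >> SHADOW_SCALE) + SHADOW_OFFSET

    def poison_redzones(shadow: dict, base: int, size: int, redzone: int = 16):
        """Mark `redzone` bytes before and after [base, base+size) poisoned."""
        for a in range(base - redzone, base, 8):
            shadow[shadow_addr(a)] = 0xFA      # left red zone
        for a in range(base + size, base + size + redzone, 8):
            shadow[shadow_addr(a)] = 0xFB      # right red zone

    def check_access(shadow: dict, addr: int):
        if shadow.get(shadow_addr(addr), 0) != 0:
            raise MemoryError(f"ASan-style violation at {addr:#x}")

    shadow = {}
    poison_redzones(shadow, base=0x1000, size=64)
    check_access(shadow, 0x1000 + 10)       # fine: inside the allocation
    try:
        check_access(shadow, 0x1000 + 64)   # first byte of the right red zone
    except MemoryError as e:
        print(e)

In the real ASan, the compiler (or, here, the rewriter) emits this check inline before every memory access, and partial-granule shadow values handle sizes that are not multiples of eight.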
Also, about Android: in the case of Linux the kernel is GPL, right? So if the manufacturer ships a custom kernel, they have to release the source code.

They never do.

Well, that's a different issue.

That's why I ask: I don't see how this can be used in the real world right now.

Let me try to put this into perspective a little bit. What we did so far is leverage existing tools like KASan or kcov and integrate with them. Now, instrumenting heap allocations is fairly simple: you can add the red-zone instrumentation yourself by focusing on the different allocators. Second, simply oopsing the kernel and printing a stack trace is also fairly straightforward, so that is not a lot of additional effort. It does involve some engineering effort to port this to kernels compiled without KASan, but we think it is very feasible; in the interest of time we focused on KASan-enabled kernels, where some form of ASan support is already present. This is additional engineering work, but there is also a community out there that can help with these kinds of changes. KRetroWrite and RetroWrite themselves are the binary-rewriting platform that lets you turn a binary into an assembly file that you can then instrument and run different passes on. Another pass could be a full ASan or KASan pass that somebody could add and contribute back to the community; that would be really useful.

Thanks.
Cool, next question from the Internet.

There is a question regarding the slide with the SPEC CPU benchmarks: the second or third graph from the right had an instrumented version that was faster than the original program. Why is that?

A cache effect.

Thank you. Microphone number one.
Thank you for the presentation. I have a question: how many architectures do you support? Do you support more than x86_64?

So, no plans for ARM or MIPS?

There are plans. Again, there is a finite amount of time and we focused on the core technology first. ARM is high up on the list; if somebody is interested in working on it and contributing, we're happy to hear from them. Our list of targets is ARM first and then maybe something else, but I think with x86_64 and ARM we cover the majority of the interesting platforms.

And a second question: did you try to fuzz any real closed-source programs? As I understand from the presentation, you fuzzed just file systems, which we can compile and fuzz with syzkaller anyway.

For the evaluation we wanted to be able to compare source-based instrumentation against binary-based instrumentation, so we focused mostly on open-source file systems and drivers, because then we could also instrument them with a compiler. We haven't tried closed-source targets yet, but this is also pretty high up on the list: we want to find some closed-source drivers (there are lots of them, for GPUs and other devices), give it a try, and perhaps find some more 0-days.

But with syzkaller you still have a problem: you have to write rules, or dictionaries. I mean, you have to understand the format, how to communicate with the driver.

Yes, but there are, for example, closed-source file systems that we are looking at.

Okay, thank you.

Microphone number two.
Hi, thank you for the talk. I don't know whether there is any kcov or KASan equivalent for Windows, but I was wondering whether you have tried, or are planning, to port the framework to Windows. I know it might be challenging because of driver signature enforcement and PatchGuard, but have you tried it, or thought about it?

Yes, we thought about it, and we decided against it. Windows is incredibly hard, and we are academics. The research we do in my lab focuses predominantly on open-source software and on empowering open-source software; full support for Microsoft Windows is somewhat out of scope. If somebody wants to support these tools on Windows, we are happy to hear from them and work with them, but it is a lot of additional engineering effort with very low additional research value, so we would have to find some form of compromise. If you were willing to fund us, we would go ahead, but it is a cost question.

And you're referring both to kernel and user space?

Yes.
Okay, thank you. Microphone number five.

This seems most interesting if you're looking for vulnerabilities in closed-source kernel modules, but, not having given it too much thought, it seems really trivial to prevent this if you're writing a closed-source module.

Well, how would you prevent it?

For starters, you could just take the difference between the addresses of two functions; that's not going to be RIP-relative.

Right. So, even in the original RetroWrite paper, we explicitly decided not to deal with obfuscated code, or code that purposefully tries to defeat this kind of rewriting. The assumption is that, first of all, there are techniques to deobfuscate code or remove such checks in some way, but that is orthogonal work; and at the same time, most drivers are not compiled with this sort of obfuscation: they are just compiled with a regular compiler. Of course, this is a limitation. Drivers are likely stripped, but not necessarily obfuscated, at least from what we've seen when we looked at binary-only drivers.

Microphone number two.
How do you decide where to place the red zones? From what I heard, you talked about instrumenting the allocators, but there are a lot of variables on the stack, so how do you deal with those?

Oh yeah, that's actually super cool; I'll refer you to the paper, which is on the GitHub repo as well, for the details. Modern compilers use canaries to protect buffers. Are you aware of stack canaries and how they work? If the compiler sees a buffer that may be overflowed, it places a stack canary between the buffer and any other data. As part of our analysis, we find these stack canaries, remove the code that handles them, and use that space to place our red zones. So we actually hijack the stack canaries: we remove the canary code and put ASan red zones into the now-empty canary slots. It's a really nice optimization, because we piggyback on work the compiler already did for us and can leverage it to gain additional benefits and protect the stack as well.
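As a rough illustration of that canary trick, here is a hypothetical sketch, assuming gcc's usual x86_64 idiom where the prologue loads the canary from %fs:40 and stores it next to the buffer. The pattern and the __asan_poison_slot helper are invented, not RetroWrite's actual pass, and a real pass must also remove the matching epilogue check (the %fs:40 compare and the __stack_chk_fail call).

    import re

    # gcc's stack-protector prologue (AT&T syntax) looks like:
    #     movq  %fs:40, %rax
    #     movq  %rax, -8(%rbp)
    # We repurpose that 8-byte canary slot as an ASan red zone: instead of
    # storing the canary, poison the slot in shadow memory.
    PROLOGUE = re.compile(
        r"[ \t]*movq[ \t]+%fs:40, (%r[a-z0-9]+)\n"
        r"[ \t]*movq[ \t]+\1, (-\d+\(%rbp\))\n"
    )

    def repurpose_canary(asm: str) -> str:
        poison = (
            "\tleaq\t\\2, %rdi             # address of the old canary slot\n"
            "\tcallq\t__asan_poison_slot   # invented runtime helper\n"
        )
        return PROLOGUE.sub(poison, asm)

    sample = "\tmovq\t%fs:40, %rax\n\tmovq\t%rax, -8(%rbp)\n\tmovl\t$0, %eax\n"
    print(repurpose_canary(sample))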
Thanks.

Another question from the Internet: did you consider lifting the binary code to LLVM IR instead of generating assembler source?

The slightly longer answer: yes, we did consider it, and yes, it would be super nice to lift to LLVM IR. We actually looked into this, and it is incredibly hard and incredibly complex. There is no direct mapping between the machine code and the LLVM IR; you would still need to recover all the types. It's this magic dream: recover full LLVM IR and then do heavyweight transformations on top of it. But when you compile from LLVM IR down to machine code you lose a massive amount of information, and we would have to find a way to recover all of it, which is pretty much impossible, and undecidable in many cases. Just as a note: we only recover control flow, and we only symbolize control-flow and code references; we don't support instrumentation of data references yet, because there we still face an undecidable problem. I can talk more about this offline, and there is a note in the paper as well. And this is the small version of the problem, the one you face when lifting to assembly files; if you lift to LLVM IR, you have to do full end-to-end type recovery, which is massively more complicated. It would be super nice; unfortunately it is undecidable and really, really hard. You can come up with heuristics, but no solution will be correct 100% of the time.
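To see why data references are the hard part, consider an immediate that happens to equal a mapped address: nothing in the instruction says whether it is a pointer or a plain integer. A small invented illustration (the addresses and the helper are made up):

    # An immediate that equals a valid address is ambiguous:
    #     movq $0x404040, %rdi   # pointer to a global? must become a label
    #     movq $0x404040, %rax   # or the integer 4210752? must stay a number
    # Getting this wrong either way produces a broken binary, and deciding
    # it statically is not possible in general. RIP-relative references do
    # not have this problem, which is why code references can be symbolized
    # exactly while data-reference rewriting is left out for now.

    def may_be_pointer(imm, sections):
        """Necessary but NOT sufficient: does imm land in a mapped section?"""
        return any(lo <= imm < hi for lo, hi in sections)

    print(may_be_pointer(0x404040, [(0x400000, 0x405000)]))  # True, yet inconclusive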
I'll take one more question, from microphone number six.

Thank you for your talk. What kind of disassembler did you use for RetroWrite, and did you have problems with wrong disassembly? If so, how did you handle it?

We used Capstone for the disassembly; an amazing tool, by the way. The idea is that we need some information about where the functions are. For kernel modules this is actually fine, because modules ship with this sort of information (the kernel needs it to build stack traces, for example); for user-space binaries it is somewhat less common, but you can use another tool to do function identification, and we then disassemble the entire function. We did run into some issues with AT&T syntax, because we wanted to use GAS, the GNU assembler, for reassembly: two different instructions, for example a five-byte NOP and a six-byte NOP, can be expressed by the same mnemonic-and-operand text. The problem is that the kernel doesn't like that, and crashes; this took me about two days to debug. The kernel does dynamic binary patching at runtime using fixed offsets, so if you replace a five-byte NOP with a six-byte NOP, or vice versa, your offsets change and your kernel just blows up in your face.
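For the curious, here is a small Capstone snippet showing the kind of encoding hazard described here: the five- and six-byte multi-byte NOPs (the six-byte form is the five-byte form with an extra 0x66 prefix) disassemble to nearly the same AT&T text, so a rewriter that round-trips text through the assembler can silently change instruction lengths. The byte sequences are the standard Intel-recommended NOPs; whether this exact pair was the one that crashed the kernel is not something the talk specifies.

    from capstone import Cs, CS_ARCH_X86, CS_MODE_64, CS_OPT_SYNTAX_ATT

    md = Cs(CS_ARCH_X86, CS_MODE_64)
    md.syntax = CS_OPT_SYNTAX_ATT

    nop5 = b"\x0f\x1f\x44\x00\x00"        # 5-byte NOP
    nop6 = b"\x66\x0f\x1f\x44\x00\x00"    # 6-byte NOP (0x66 prefix added)

    for code in (nop5, nop6):
        for insn in md.disasm(code, 0x1000):
            print(f"{insn.bytes.hex():14} {insn.size} bytes  "
                  f"{insn.mnemonic} {insn.op_str}")

    # A rewriter can pin the original encoding with explicit bytes, e.g.
    #     .byte 0x0f, 0x1f, 0x44, 0x00, 0x00
    # so the reassembled module keeps the offsets that the kernel's
    # runtime patching expects.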
Was it a case-by-case basis, where you saw the errors coming out of the disassembly and had to fix them?

Sorry, can you repeat the question?

For example, if some instruction is not supported by the disassembler: did you see that something was wrong because it crashed, and then fix it by hand?

I don't recall having any unknown instructions in the disassembler; I don't think we ever had a problem with that. But yes, this was a lot of engineering work.

So, let me repeat: the problem was not a bug in the disassembler but an issue with the instruction encoding, where the same mnemonic can be translated into two different instructions, one five bytes long and the other six bytes long, both using the exact same mnemonic. And you had no problems with unsupported instructions that could not be disassembled?

No, not as far as I know, at least.

We have one more minute, so a very short question from microphone number two.
Is your binary instrumentation as powerful as the kernel address sanitizer, I mean KASan? Does it detect all the memory corruptions on the stack, the heap, and globals?

Not globals, but on the heap it detects all of them. On the stack there is some variation, because we have to piggyback on the canaries, as I mentioned before: there is no reflowing and full recovery of data layouts, so to get anything on the stack we have to piggyback on existing compiler mitigations like stack canaries. That means we don't support intra-object overflows on the stack, but by leveraging the stack canaries we do get most of the stack benefits, I don't know, 90 to 95 percent of the way there, because the stack canaries are pretty good. For the heap we get the same precision; for globals we have very limited support.

Thanks. That's all the time we have for this talk; you can find the speakers afterwards offline. Please give them a big round of applause.