SaBRe: Load-time selective binary rewriting
Formal Metadata
Number of Parts | 490
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/47456 (DOI)
Transcript: English (auto-generated)
00:05
So hello everybody, we're going to start the next talk, which is presented by Paul-Antoine Arras, who is a low-level specialist engineer, and he will talk about SaBRe.
00:23
Thank you, thank you for the introduction. So yes indeed, I'm going to tell you about software that I developed with a bunch of students under the supervision of Cristian Cadar when I was a postdoc at the Software Reliability Group at Imperial College London
00:45
until last year. So before going into detail on what exactly load-time selective binary rewriting is, let's start with a quite simple use case, just to see why we need such a low-level system.
01:03
So for instance, let's say that you are a developer. You've developed a brand new piece of software and you want to test it to see how resilient it is. Well, you basically want to see if you can find some tricky bugs and fix them
01:21
easily with some tools. So basically you want to assess the fault tolerance of your software. As a particular example, well, you can have a disk full, or RAM exhausted, or whatever kind of problem you can run into, and you want to check that your software is able to cope appropriately with this kind of
01:44
issue. So the problem is that it's hard to reproduce on a real system. Well, if you want to provoke a disk full or a lack of RAM on a real system, that's quite painful. Usually what you want to do is to simulate this.
02:01
So the question is: how could you achieve this? It is possible; you have several options. One of the options is to tinker with the kernel, for instance. Well, for kernel hackers that's great, but for most people it's quite painful and dangerous, so you don't want to do that.
02:24
You can also try to modify, to adapt, or to intercept the libraries that support your software. But once again, it's quite painful and not really reliable. So actually there is an intermediate solution,
02:41
which is trying to see what's between the libraries and the operating system. So let's have a look at a very simple Python program, for instance. Well, here the example is in Python, but you can do that with whatever language you want. With the basic hello world, what you want to do is just print the string "Hello world"
03:03
onto the screen, the terminal here. So basically what happens is that you have your user code, so basically this line of code, which calls into some function from the library. In this case it will be the Python runtime, and at some point the Python runtime will call into the operating system to actually print the string onto the screen.
03:23
And what we're interested in here is the interface between the library and the operating system. This interface is called the system call interface, and it is the single interface that's available to communicate between user space, your program, and the operating system.
03:43
There is no other way. So you are absolutely sure that any call that you make to the operating system will go through this interface, and that's very important for us. What's next? Let's take a quick look at what exactly this system call interface is.
04:04
How does it work? What is it for? So actually it's a set of low-level operations, a kind of primitives, relatively low level: for instance, if you want to write characters on the screen, as I said previously, or to send packets
04:21
to the network, or communicate with some random peripheral. Well, whatever you want to do, especially in terms of input or output, you have to go through this and use one of the functions that are available. So basically system calls are really like
04:41
regular functions, that is, you have a set of arguments that you pass to the system call and you get one result in return. The number of arguments varies, usually between zero and six on Linux. So depending on what you want to do, you will end up using one of these functions.
05:01
So for instance, if we go back to our Python hello world, the print of "Hello world" will translate into the write system call, which is what you use to write some characters on the terminal, for instance, or also to write some data into a file. In this case the write call takes three arguments. The first one is the file handle. As
05:28
most people know, under Linux the standard output is associated with file descriptor number one; that's the reason why we have one as the first argument. Then comes the string that you actually want to print, and finally the size of the string, which is here.
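The three arguments described above can be seen in a minimal sketch using Python's `os.write`, which is a thin wrapper over the write system call. The names `STDOUT_FD` and `message` are chosen here for illustration only; Python derives the size argument from the length of the bytes object.

```python
import os

# File descriptor 1 is standard output on Linux and other POSIX systems.
STDOUT_FD = 1
message = b"Hello world\n"

# Conceptually write(fd, buffer, size): the descriptor, the data to print,
# and the size, which Python takes from len(message).
written = os.write(STDOUT_FD, message)
```

The value stored in `written` is the system call's return value, that is, the number of bytes actually written.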
05:43
So far so good. Now, as I said, what we want to achieve is to simulate a fault. So, in terms of this write call for instance, what exactly happens? We want to take a look at the return value of the system call,
06:03
because this is what will tell us if the system call has succeeded or if there was an error. This return value can be either positive, if it's successful, or negative, in which case the exact value will indicate the type of error. So in the case of the write system call, if it is successful you get the size that was actually written.
06:23
And if it's not, well, you get some error code that will describe the error that you got. For instance, for permission denied the code is EPERM, or for disk full you get ENOSPC. So for instance, let's say that we have this Python program, slightly more complex than the previous one. This time
06:42
we want to write into a file, so we use the open function, and we want to write into the file whose path is /tmp/hello, in write mode. And then, yes, we write the string "hello" into this file. In terms of system calls, we have two system calls. The first one is, well, as you can guess,
07:01
the open system call, which takes, well, basically the same arguments as the Python function, and once the file has been opened, if it succeeds, it returns the number of the file descriptor, which is 8 here. And then we just have to get this number, this descriptor, and pass it to the write system call in turn, so that's what we do.
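The open-then-write sequence can be sketched at the system call level with Python's `os` module. This is an illustration only: the path is built from the temporary directory rather than hard-coded, the descriptor number you get will generally differ from the 8 shown on the slide, and Python reports a failed system call by raising `OSError` instead of handing back a negative number.

```python
import errno
import os
import tempfile

# Illustrative path; the talk writes to /tmp/hello.
path = os.path.join(tempfile.gettempdir(), "hello")

# open system call: on success it returns a small integer, the file descriptor.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    # write system call: the descriptor, the data, and implicitly its size.
    written = os.write(fd, b"hello\n")
except OSError as err:
    # err.errno carries the error code, for example errno.EPERM or errno.ENOSPC.
    print("write failed:", errno.errorcode.get(err.errno, err.errno))
    written = -err.errno
finally:
    os.close(fd)
```

On success, `written` holds the number of bytes written; on failure, this sketch stores the negated error code to mirror the convention described in the talk.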
07:22
We are calling the write system call with 8 as the first argument, then the string and its length. So here we have two possible outcomes. Either it's successful, in which case it returns the number of characters written, six, or if it's not, well, it will return some negative value, which is an error code that describes the problem. So for instance you could get an EPERM code,
07:44
which tells you that you don't have the necessary permission to write into this file. Right, so let's go back to our original example of fault injection. We want to simulate some fault, for instance simulate the fact that the disk is full so that you can't write to a file.
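One way to picture such a simulated fault is a wrapper that lets the real write happen and then reports an error code in place of the real result. The sketch below works at the Python level only and is not SaBRe's API; `write_with_fault` and `injected_errno` are hypothetical names, and a pipe stands in for the file.

```python
import errno
import os


def write_with_fault(fd, data, injected_errno=None):
    """Perform the real write, then optionally report a simulated failure.

    Hypothetical helper that mimics an interceptor sitting on the
    system call interface; it is an illustration, not SaBRe code.
    """
    result = os.write(fd, data)   # the real operation still takes place
    if injected_errno is not None:
        return -injected_errno    # swap the real result for an error code
    return result


read_end, write_end = os.pipe()
ok = write_with_fault(write_end, b"hello\n")
failed = write_with_fault(write_end, b"hello\n", errno.ENOSPC)
os.close(read_end)
os.close(write_end)
```

Here `ok` is the real byte count, while `failed` is the negated `ENOSPC` code even though the data was written, which is the situation an application under test would have to cope with.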
08:02
So the question is, in terms of system calls, how does it translate? What we want to do, actually, is to take the value that is really returned by the system call and swap it with an error code, so that even though the operation has been successful, your application will get an error code and will think that
08:24
there was the error encoded by the error code, and at that point you'll be able to assess exactly how your application responds to it. So yes, let's do this. For instance, let's take this write call and
08:40
replace the real return value, which is six, with an EPERM code. So how to achieve that? Well, that's where binary rewriting comes into play. So what exactly is binary rewriting? In simple terms, it's modifying a program not at the source code level, but at the machine code level, that is, the binary code that is actually
09:05
read and executed by the CPU. So the good point is you don't need the source code. Well, sometimes you don't have access to the source code at all, but most of the time, even if you have the source code, you don't necessarily want to recompile it every time,
09:21
especially if you want to deal with fault injection. So one good thing with binary rewriting is that you don't have to modify the source code. You don't need to recompile anything. You just have to get your binary and work directly on it. So the only requirement for binary rewriting is
09:44
disassembling. For now, let's keep disassembling aside, and let's just say that what we need is the representation of the program in terms of assembly instructions. It basically means that you have your binary code,
10:03
which is just a sequence of zeros and ones; that's really what CPUs see and what they are able to execute. But we need to be able to have a slightly more human-readable version of this, and this is where disassembly comes into play. It is basically
10:23
translating what we have on the left, a sequence of bits, into something that is more or less, I say more or less, readable by a human, which is just a sequence of instructions. So that's really a translation, a perfect translation. There is no loss of information, and you can
10:40
then reconvert the assembly version into the original bit sequence without any loss of data. So once we have this disassembly, we are able to do interesting stuff. Let's take this representation. So,
11:00
so far So I've been showing you just a small snippet of assembly And the rest of this talk anytime I will show you Some assembly it will look like this there will be three columns The first one is your set from the very beginning of your of your program say so here Let's say that just an except so the very beginning of your program. Just the first the first couple of instruction
11:27
So we start with zero and then we increment. The second column is the size; I'll explain later why I show the size, because that's really important when we do binary rewriting. And finally the
11:40
human-readable version, which is just what the instruction does, in a way more readable by a human. Okay, so let's take this and try to see what operations we can do on instructions. We can do exactly three operations: we can remove instructions, we can replace instructions, or we can add instructions.
12:03
Those are the three types of operation that we can do in binary rewriting, and I'm going to show you how we can do each of them. So let's start with the simplest thing that we can do, which is removing an instruction. Let's say that we have this program on the left-hand side.
12:23
So here we want to remove the instruction in red, okay, the load of something. So what do we do? Well, we remove it, but we don't just remove it, because it's not like in a classic text editor. You can't just remove the part that you're not interested in and have everything else
12:44
fit back together. Here you have to do an additional operation, which is to pad the gap that you've just created by adding some instructions that do basically nothing. The instruction that does nothing here is NOP, no operation.
13:01
Since each of these instructions is just one byte long, you put two, and then you end up having filled the space, which is two bytes long, left by the load here, right? So you're going to pad. But why do we need to do that? Why not just remove the load operation and then shift everything
13:21
so that we end up with no lost space? Well, the problem is that if you do that, you change the offsets of the instructions. For instance, here you can see that the load instruction has offset one, and everything after has offset three, eight, etc. Here, if you remove
13:42
the load instruction without padding the space, you will end up having the call instruction at offset one, the or instruction at offset six. Well, everything will be shifted, and the thing is, in assembly, in machine code, you need to have the offsets right to be able to jump from one point of the code to another.
14:03
So basically, in a high-level language, when you have an if, a while, a loop or any conditional structure, it's basically saying, in terms of assembly, that we are jumping from one point in the program to another, and the jump in assembly code is achieved by just recording the offset where we want to jump.
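The removal step can be sketched as overwriting the unwanted instruction with one-byte NOPs so that every later offset stays where it was. In this illustration `remove_instruction` is a hypothetical helper, `0x90` is the x86 NOP opcode, and the other bytes in `code` are placeholders laid out like the slide: a 1-byte instruction at offset 0, the 2-byte load at offset 1, and a 5-byte call at offset 3.

```python
NOP = 0x90  # one-byte x86 "no operation" instruction


def remove_instruction(code, offset, size):
    """Overwrite `size` bytes at `offset` with NOPs, keeping the total length."""
    patched = bytearray(code)
    patched[offset:offset + size] = bytes([NOP]) * size
    return bytes(patched)


# Placeholder program: 1-byte instruction, 2-byte load, 5-byte call, 1-byte tail.
code = bytes.fromhex("55" "8b07" "e800000000" "c3")
patched = remove_instruction(code, offset=1, size=2)

# Naive alternative: deleting the bytes shifts everything that follows.
shifted = code[:1] + code[3:]
```

In `patched` the call opcode is still at offset 3, whereas in `shifted` it has moved to offset 1, which is exactly the shift that would break recorded jump offsets.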
14:22
So that's the reason why we don't do that. We don't just remove the instruction that we're not interested in; we need to pad to keep the offsets the same, right? So that was removing. Now let's see how we can replace some instruction. It's slightly more complex, but let's start with a simple case.
14:42
So let's say that we want to replace a call. A call is the way that you call a routine, a subroutine or a function. So let's say for instance that we have this snippet of assembly, and we have this call instruction that we just want to replace with a jump to somewhere else. Okay, so here is how we translate.
15:05
So we just replace the call with a jump, and we end up with this assembly sequence. But now that we have this, we have to ask ourselves the question: what exactly is the size of this jump instruction? Because, as I just said, you have to ensure that you keep the offsets of the surrounding
15:24
code the same. So the question is what the size of the jump is, because if the size is five, well, you will end up with the same offset for the next instruction, which would be eight, but if it's different, you will shift everything. So you have to be careful with the size of the new instruction.
15:41
So basically you need to compare the relative sizes of the original instruction and the instruction with which you want to replace it. So yeah, that's what I just said. Let's introduce a bit of notation.
16:01
So I note S(o) the size of the original instruction and S(r) the size of the rewritten, new instruction, and then we compare them. In this case we compare the size of the red instruction, the one that we remove, with the size of this new jump instruction, which is in green. And in this case, okay, let's say that it's quite simple. We have the same size: five for the call, five for the jump.
16:26
That's the same thing. In this case it's quite simple: we just replace, and we do nothing else. Nothing to pad, nothing to shift; it just fits, that's perfect. But as you can guess, most of the time it's not that simple.
16:43
So for instance, let's imagine that the jump is shorter: it's three instead of five. In this case, you can guess what we do: we pad, just like we did when we removed an instruction. That's more or less equivalent. But then, obviously, we can get into cases where the jump, so the new instruction,
17:02
is larger than the original one. In this case, what do we do? That's quite a basic problem in terms of binary rewriting, and there are several ways to deal with it. I'm going to show you the way that we use in SaBRe, which is called a detour.
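The three cases of the S(o) versus S(r) comparison can be written down as a small decision function. This is a sketch for illustration; the function name and the returned strings are assumptions, not part of SaBRe.

```python
def plan_rewrite(size_original: int, size_rewritten: int) -> str:
    """Decide how to substitute an instruction from S(o) and S(r)."""
    if size_rewritten == size_original:
        return "replace in place"
    if size_rewritten < size_original:
        padding = size_original - size_rewritten
        return f"replace and pad with {padding} nop byte(s)"
    return "detour through out-of-line scratch space"


print(plan_rewrite(5, 5))  # call (5) replaced by jump (5)
print(plan_rewrite(5, 3))  # shorter jump: pad the difference
print(plan_rewrite(2, 5))  # jump larger than the original instruction
```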
17:23
So a detour, or detouring, is just taking the instructions that do not fit anymore in the original code space. Where we added an instruction, or replaced an original instruction with a larger one, there are some instructions that can't fit anymore.
17:44
So we have to take them out and move them somewhere else. That somewhere else is what we call an out-of-line scratch space. Out of line because it's not in line anymore: it's not part of the straight instruction stream, it's somewhere else in memory.
18:01
So the way we do that is, basically, we allocate memory somewhere else and we move the instructions that do not fit anymore in the original mainline stream to this out-of-line scratch space. And then, obviously, we need to connect this out-of-line space to the original mainline, and we do that by inserting some jumps. So let's take it
18:26
on a simple example. Let's say that we want to add an instruction with such a detour. In this case, let's say that we want to insert some instruction;
18:42
it can be one instruction or a sequence of instructions; no matter the length of the sequence you want to insert, the principle is the same. So let's say that we want to insert at the point where you see the green arrow, right after the load and before the call. How can we do that? First, we allocate the out-of-line space, which is at the bottom
19:05
right corner, and we move the call instruction there, because it can't fit anymore once we have rewritten this part of the code. So we move it to a different spot: from the mainline to the out-of-line scratch space.
19:22
Then we insert the added instructions. Here I didn't represent them, but you can think of it as just one instruction or a snippet of as many instructions as you want; that doesn't matter. And then the last part that I mentioned: you have to connect this out-of-line scratch space with the rest of the code, and to do that you use jumps.
19:44
So first we insert a jump: that's the first green instruction that you can see at offset three. You jump to some label that represents the beginning of the out-of-line scratch space; the label here is d0. So when you're here,
20:01
which is the place where you had the call to the function in the original code, instead of directly executing the call you jump to the beginning of the out-of-line scratch space. At that point you can execute the original instruction, the call, as if it were still at the same place. The semantics doesn't change; you have this guarantee. And then
20:23
you can just execute whatever arbitrary instructions you want. And finally, let's not forget to jump back to the mainline with this jump. So this time we've inserted another label, called l0, which is right after the first jump, so that when we have executed the added instructions we just come back and then continue.
20:45
I was about to say as if nothing had happened. Well, something has happened, but the rest of the code, the surrounding code, remains unchanged. This is very important; this is really the basics of how everything works.
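The detour walkthrough can be modelled with a toy program in Python. Everything here is an assumption made for illustration: the addresses, the instruction sizes, the `SCRATCH` location, and the order inside the scratch space (relocated call first, then the added instruction, as in the spoken walkthrough). The call is treated as an opaque instruction, and only `jmp` changes control flow.

```python
SCRATCH = 0x1000  # hypothetical address of the out-of-line scratch space (label d0)


def run(memory, pc=0):
    """Execute a toy program and return the non-jump instructions in order."""
    trace = []
    while pc in memory:
        mnemonic, size, target = memory[pc]
        if mnemonic == "jmp":
            pc = target
        else:
            trace.append(mnemonic)
            pc += size
    return trace


original = {
    0: ("load", 3, None),
    3: ("call f", 5, None),
    8: ("store", 2, None),
}

rewritten = {
    0: ("load", 3, None),
    3: ("jmp", 5, SCRATCH),           # jmp d0, same size as the call it replaces
    8: ("store", 2, None),            # l0: mainline continues here
    SCRATCH: ("call f", 5, None),     # d0: relocated original instruction
    SCRATCH + 5: ("add", 3, None),    # the newly inserted instruction
    SCRATCH + 8: ("jmp", 5, 8),       # jmp l0, back to the mainline
}

print(run(original))
print(run(rewritten))
```

The mainline instructions keep their offsets (the store is still at 8), and the rewritten trace is the original trace plus the inserted instruction.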
21:01
So in this case, you can notice that the size of the original instruction, the call here in orange, is the same as the jump: both are five, so that's fine. But as I mentioned before, when you replace one instruction with another you have to compare the relative sizes and see whether it fits or whether you have to adapt.
21:23
So in this case we do exactly what we've done before: this time we have to compare the jump that we insert with the original instruction. We have basically two cases. Either the size of the jump is less than or equal to the size of the original instruction, in which case
21:43
you just insert it, if need be you pad with nops, and everybody is happy. Or, if that's not the case, if the jump itself is larger than the original instruction, then you have to relocate a number of instructions. So for instance, let's go back to our previous example.
22:02
We want to replace this load instruction, and we want to insert a jump. Let's say that the original instruction, the load that we want to replace, is only two bytes, which is quite short; in comparison the jump is five. So in this case the jump is quite a bit larger than the original instruction.
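How many instructions a 5-byte jump swallows can be computed from the instruction sizes alone. The sketch below is an illustration under assumed sizes (a 2-byte load followed by a 5-byte call); the helper name is hypothetical. It returns how many instructions the jump overwrites, starting with the one being replaced, and how many nop bytes are needed to fill the leftover space.

```python
JMP_SIZE = 5  # x86 jmp rel32: one opcode byte plus a 4-byte displacement


def instructions_overwritten(sizes, index, jump_size=JMP_SIZE):
    """Return (count, nop_padding) for a jump written at instruction `index`."""
    covered = 0
    count = 0
    while covered < jump_size:
        if index + count >= len(sizes):
            raise ValueError("not enough bytes after the patch site")
        covered += sizes[index + count]
        count += 1
    return count, covered - jump_size


# Assumed layout: load (2 bytes), call (5 bytes), another instruction (3 bytes).
print(instructions_overwritten([2, 5, 3], index=0))
```

With these sizes the jump covers the load and the following call, so the call has to move to the scratch space and two bytes are left over for nops.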
22:24
So you need to make space even to insert the jump. What you do is just, as I mentioned before, take the call instruction that is right next to the instruction that you want to replace and move it
22:41
to the out-of-line scratch space. The rest is the same: you still need to allocate the space, so I won't go back to that. But once you have your out-of-line scratch space, you just move the surrounding instruction to the right place, so here after the instruction that you want to substitute, since you are relocating the instruction that comes after
23:02
the instruction that you replace, and you end up with this result: still the jump to the out-of-line space and then the jump back to the mainline. So I've basically covered all the possible outcomes depending on the relative sizes of the original instruction and the rewritten instruction, and the same for the jump instruction
23:24
in case you need this out-of-line space. So basically the most complex case that you can face is that you have a couple of instructions around the jump that you insert that you need to relocate. So now the question is: can instructions always be
23:44
relocated? Sorry, can we always relocate arbitrarily selected instructions around the spot where we want to replace another instruction? Well, you can guess that if I'm asking, the answer is no. So I'm going to show you some typical examples of
24:03
cases where you can't, or where you have to be very careful, when you relocate instructions. So for instance, PC-relative addressing. What is PC-relative addressing? I gave you the example of a jump: when you want to jump to some part of the code, for instance if you have a loop, a loop is basically a jump backwards
24:24
into the instruction stream, so that you execute this sequence several times. And usually the most efficient way to do that is, instead of encoding the full address of the place where we want to end up, the target of the jump, which would be 32 to 64 bits,
24:45
which is quite long, you just use the displacement, which is the distance, the offset, between the source of the jump, where you're jumping from, and the target, where you're jumping to. The difference between the two is usually a couple of bytes,
25:01
because it's quite rare, especially when you have a typical control structure, that you have to jump to a place very far away in the code. So usually you just have to encode the displacement, the distance between source and target, and that's much more efficient. So in this case, for instance, let's say that
25:21
we want to load some data that is located 48 bytes from the current value of the PC, which is basically where we are, the instruction that is currently being executed. So 48 bytes from the place where we are currently executing.
25:40
Right, and we want to insert an instruction where we have the green arrow there, so right before the load instruction that uses displacement addressing. What do we do? We just apply the detour mechanism that I presented earlier, and we end up with the same kind of thing, with our jump.
26:03
We need to insert some nops, because we are relocating the load instruction to make the jump fit. And so here, in the out-of-line scratch space, we end up with this displacement, 40a from PC.
26:21
The problem is that we moved the instruction somewhere else. You can see here that the offset is quite different: it's ffea, and the original offset was two, so we are very far away from the original place. So this displacement is not correct anymore, because we didn't move the data that it's pointing to.
26:40
So we actually end up with quite a significant mismatch between the right displacement, which is 49 in the original instruction, and what you get after relocating the instruction, this displacement, which is 10032. That's completely wrong. So you have to be careful with that, and you basically have to
27:05
just recompute the actual displacement after having relocated the instruction. You just compute the difference between the original displacement and the new offset, so here 49 minus ffe6, and you end up with the new
27:24
displacement once you have relocated the instruction, which is minus ff90. So that's feasible, but usually when we do binary rewriting we try to avoid it, because it can be quite painful. Now for the second problem that can arise when you want to relocate an instruction.
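The displacement fix-up for a relocated PC-relative instruction can be sketched as follows. This is a simplified model with illustrative numbers: it assumes the displacement is measured from the instruction's own address, whereas real x86-64 RIP-relative addressing measures from the end of the instruction. The function name is hypothetical.

```python
def retarget(old_addr: int, old_disp: int, new_addr: int) -> int:
    """Return the displacement that still reaches the same data after the move."""
    target = old_addr + old_disp      # where the data really lives
    return target - new_addr          # same target, seen from the new location


old_addr, old_disp = 0x2, 0x48        # instruction at offset 2, data 0x48 bytes away
new_addr = 0xFFEA                     # assumed location in the scratch space

stale_target = new_addr + old_disp    # what the unpatched copy would point to
new_disp = retarget(old_addr, old_disp, new_addr)

print(hex(stale_target))              # far from the real data at 0x4a
print(hex(new_disp))                  # negative: the data is behind the scratch space
```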
27:47
So I told you about jumps, I told you about sources and targets. Here, imagine that we have a jump backwards, like in a loop, and we want to
28:00
insert an instruction right before the target of the loop, the target of the jump, which is represented here by the l0 label. So what do we do? Once again, we do the same kind of thing with the detour: we move the original surrounding instruction, which in this case is this load,
28:22
we add the instruction, et cetera, as usual. The problem is: when you end up trying to execute the jump l0 instruction after rewriting, what happens? You've lost l0, because you rewrote the sequence of instructions,
28:40
which is load then call, and you can see that the target, the l0 label, is pointing right in the middle of it. Since you've completely overwritten this, if you still jump to this address you end up jumping right into the middle of another instruction, so this is completely wrong. And in this case,
29:01
there is no simple way to solve it. So what we do is scan the code before even starting to rewrite. We record all the branch targets and we say: okay, these instructions, we don't do anything with them. We just keep them as is and we won't even try to relocate them. We need to do that, otherwise we end up
29:23
trying to decode some non-existent, illegal instruction, and that can be quite problematic. So I just showed you two examples of typical problems that you run into when you want to relocate instructions. There is a third one, which is side effects.
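The branch-target scan can be sketched like this. The instruction tuples, the mnemonics, and the convention that a displacement is relative to the end of the branch instruction are assumptions made for the example; this is not SaBRe's data structure.

```python
def branch_targets(instructions):
    """Collect the addresses that direct branches jump to.

    `instructions` is a list of (address, size, mnemonic, displacement),
    with displacement None for non-branches.
    """
    targets = set()
    for addr, size, mnemonic, disp in instructions:
        if mnemonic in {"jmp", "jcc"} and disp is not None:
            targets.add(addr + size + disp)
    return targets


def relocatable(instructions):
    """Addresses of instructions that are not pinned by an incoming branch."""
    pinned = branch_targets(instructions)
    return [addr for addr, *_ in instructions if addr not in pinned]


# Toy loop: the jump at 11 goes back to the call at 3 (the l0 label).
loop = [
    (0, 3, "load", None),
    (3, 5, "call", None),
    (8, 3, "add", None),
    (11, 2, "jmp", -10),
]

print(branch_targets(loop))
print(relocatable(loop))
```

The instruction at offset 3 is left alone because a branch lands on it; everything else remains a candidate for relocation.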
29:40
I won't go into the details of side effects, but keep in mind that you have three types of problems or challenges when you rewrite instructions. So now, what if we end up, for instance, with a target for a jump right before the instruction that we want to insert, and also some instruction with side effects right after?
30:06
Which means that you are right in between two instructions that you can't relocate. So what happens in this case? Because, for very valid reasons, you may still want to do it.
30:21
If you end up with surrounding instructions that you can't relocate, well, you're stuck. Well, no, not completely. You can still do something which is not, strictly speaking, binary rewriting, but which allows you to do what you want. So actually what you can do is just
30:42
replace the instruction that you're interested in with an illegal one, so an instruction that is known to encode an illegal operation. You replace it with this instruction, and what you do is set up a signal handler that will catch the signal raised by trying to decode an illegal instruction.
31:07
Once you have set this up, you just have to catch the signal. Obviously you need to record the source address of the instruction that triggered the signal.
31:21
But then you can do whatever you want inside the signal handler. As you can imagine, that's quite a significant overhead, having to go through all the mechanism of the signal handler. But since, in our experience, that happens really rarely,
31:42
the overhead is acceptable. So that's really the last-resort solution, but at least we have a solution when that happens, which means that we can rewrite virtually any arbitrary instruction in the program.
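The dispatch logic of this last-resort path can be modelled in Python. This only simulates the decision made inside the handler; a real implementation would install a SIGILL handler and read the faulting address from the signal context. The table, the address, and the function names are hypothetical.

```python
# Hypothetical table built at rewrite time: trap address -> original instruction.
REWRITTEN_SITES = {0x401000: "syscall"}


def on_illegal_instruction(fault_address, handle_original):
    """Decide what to do when an illegal instruction is reported at `fault_address`."""
    original = REWRITTEN_SITES.get(fault_address)
    if original is None:
        # Not one of our traps: a genuine illegal instruction, left to default handling.
        return "fall through"
    return handle_original(fault_address, original)


def emulate(addr, insn):
    return f"emulated {insn} at {addr:#x}"


print(on_illegal_instruction(0x401000, emulate))
print(on_illegal_instruction(0x402000, emulate))
```

Recording the source address at rewrite time is what lets the handler tell its own traps apart from unrelated faults.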
32:02
So I promised you that I would go back to the disassembling problem. Well, at the time I didn't tell you that there was a problem, but actually there is one. So let's take one step backward and look at what exactly we have when we actually look at
32:22
some machine code, some program code in binary form. So we have, on the left, this binary sequence, some arbitrary one, and we want to turn it into something that is understandable by a human. And especially, we want to know exactly which bytes correspond to which instruction,
32:44
because the instruction is the granularity that we are interested in. So you can see here that what disassembly is really meant for is being able to match a sequence of bytes with some specific instruction.
33:02
So for instance, here you have the yellow sequence of bytes, which represents a push instruction, the orange one represents the load, et cetera. That's exactly what we want to achieve. The problem is that it's not always easy; you don't always get it for free. So you have to have some
33:26
algorithm or some technique that will guarantee that you have the right disassembly. There are different techniques that allow you to get the disassembly, the dynamic ones and the static ones.
33:42
Both have advantages, but for the purpose of binary rewriting in SaBRe we use the static one. In terms of static disassembly there are mainly two ways of doing this. One is what you have on the left, which is linear sweep.
34:00
So linear sweep is basically: we start at the very beginning of the program and we scan it instruction by instruction. We know where the program starts, because otherwise the operating system would not even be able to load it, so we know that the point where we start is correct. But then we can end up at some point with some garbage instructions, for instance if you have some padding bytes
34:26
that are just there to ensure alignment in the code. Linear sweep is not able to detect that. On the other hand, you have what is called recursive traversal, which is something more clever: it really tries to do what the CPU would do, that is, following
34:43
the control flow, and when you have a jump, jumping, which enables it to skip the garbage bytes, for instance. But at the same time you can miss some instructions, because you don't necessarily know under which condition a branch will be taken.
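The difference between the two static strategies shows up on a tiny made-up instruction set. The encoding below is invented purely for the example (0x02 is a 2-byte load, 0x03 a 2-byte unconditional jump with a signed 8-bit displacement from the end of the instruction, 0x04 a 1-byte ret); it has nothing to do with x86 or with SaBRe's disassembler.

```python
LENGTHS = {0x02: 2, 0x03: 2, 0x04: 1}
NAMES = {0x02: "load", 0x03: "jmp", 0x04: "ret"}


def decode(code, addr):
    """Return (name, size) of the toy instruction at `addr`; unknown bytes are '(bad)'."""
    op = code[addr]
    return NAMES.get(op, "(bad)"), LENGTHS.get(op, 1)


def linear_sweep(code):
    """Decode byte after byte from offset 0, ignoring control flow."""
    out, addr = [], 0
    while addr < len(code):
        name, size = decode(code, addr)
        out.append((addr, name))
        addr += size
    return out


def recursive_traversal(code, entry=0):
    """Decode only what is reachable by following control flow from `entry`."""
    seen, work = {}, [entry]
    while work:
        addr = work.pop()
        while 0 <= addr < len(code) and addr not in seen:
            name, size = decode(code, addr)
            seen[addr] = name
            if name == "jmp":
                rel = int.from_bytes(code[addr + 1:addr + 2], "big", signed=True)
                work.append(addr + size + rel)
                break
            if name == "ret":
                break
            addr += size
    return sorted(seen.items())


# jmp over two padding bytes (0xFF, 0x02), then load, then ret.
code = bytes([0x03, 0x02, 0xFF, 0x02, 0x02, 0x07, 0x04])

print(linear_sweep(code))
print(recursive_traversal(code))
```

Linear sweep decodes the padding as instructions and loses the real load at offset 4, while the traversal follows the jump and skips the padding. The toy has no conditional branches, so it does not show the instructions a traversal can miss.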
35:01
Just to say, in SaBRe we are using the static version, because it's really efficient in terms of cost, and previous research has shown that it works very well, at least with code produced by most compilers, GCC, LLVM, and for most targets.
35:21
So x86, for instance. It's known not to work very well for ARM, but at least for our purpose it works quite well. So, okay, let's put it together and try to see what we can achieve with all of this. Just as a reminder, this is what we want to do: we want to inject a fault,
35:41
replacing the real return code that is returned by a system call with some error. So what we do is intercept those system calls: we just rewrite the instruction that actually executes the system call. For instance, on x86 you have one syscall instruction, so what SaBRe does is just rewrite all the syscall instructions
36:08
at once, at the very beginning of the execution. What happens is that when the operating system loads the binary, the program, into memory,
36:21
SaBRe catches it and rewrites everything, well, all the system calls, even before the execution actually starts. It does that at the very beginning, so you pay a small price at load time, but then at run time you don't have to do any disassembly and you don't have to do any actual rewriting.
36:42
The only cost that you pay is just what I showed you: taking the detour, which for each system call is a couple of additional instructions. So in terms of overhead, as I'll show you in a minute, it's really good. So yes, that's basically what we do.
37:00
So, how to use it in concrete terms. SaBRe itself, what I call the backbone of SaBRe, is just a loader that will load your application, rewrite the system calls, and then execute the program as normal. If you want to actually do something with those system calls,
37:22
you have to write a small plugin using an API that is really simple. So just an example of how that works. Let's say that we want, as I was saying before, to just inject faults. We just have to implement, I say we, but the user just has to implement, this function,
37:41
whatever its name. It is like a hook that is called by SaBRe every time execution reaches a system call. So at run time, execution goes through the out-of-line scratch space, then it jumps, like with a trampoline, into this function, and this function
38:02
is just usual C code, so you can do whatever you like. You can change the arguments of the system call. You can change the return value, which is what we want to do. You can even end up not calling the system call at all. Basically, that allows you to do what you want.
38:21
So let's see what it would look like if we want to achieve this basic fault injection. We have this function, which takes a number of arguments. The first argument is the number of the system call. Here we want to catch one specific system call, which is the write system call.
38:46
All the other arguments are just the real arguments to the system call, so we just pass them through directly. But in the very specific case when we get the write system call, what do we do?
39:00
In this case we can call it, or even decide not to call the system call at all, but in any case what we want to do is return some arbitrary error code. So here, for instance, we return the EPERM error code, which is a permission error, and then your application is going to see this instead of the expected successful return code.
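The control flow of such a hook can be modelled in Python. This is not SaBRe's plugin API, which is C: the function name, its parameters, and the `real_syscall` callback are assumptions made for the sketch. It follows the Linux convention that a failing raw system call returns a negated errno value, and uses 1 as the number of write on x86-64 Linux.

```python
import errno

SYS_WRITE = 1  # system call number of write on x86-64 Linux


def handle_syscall(number, args, real_syscall):
    """Hook invoked for every intercepted system call (hypothetical signature)."""
    if number == SYS_WRITE:
        # Inject a fault: skip the real call and report a permission error.
        return -errno.EPERM
    # Every other system call is passed through with its real arguments.
    return real_syscall(number, *args)


def fake_kernel(number, *args):
    """Stand-in for the real system call, used only to exercise the hook."""
    return 42


print(handle_syscall(SYS_WRITE, (1, b"hi", 2), fake_kernel))
print(handle_syscall(39, (), fake_kernel))
```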
39:24
It will see this, and at that point it has to decide what to do, and you end up seeing what happens. It depends: maybe it will behave very nicely, or you get some error message, or a complete crash, or whatever. So that's how you end up
39:42
seeing what can happen when you have some random bug or some random problem with the operation you're doing with the operating system. So just a few words to tell you that currently there are three plugins that are open source and available for you.
40:01
The first one basically just reissues the system call without changing anything; it's basically for testing purposes. You have a fault injector that is a bit more sophisticated than the basic one that I showed you. And you have a tracer: if you know strace, it's basically doing the same, just recording and printing on the screen all the system calls that are
40:20
called, with their arguments and return values, but it's much more efficient, as I'm going to show you in a couple of minutes. For now we support two targets, as I mentioned before: x86-64 and RISC-V. And, oh yes, the current implementation targets exclusively Linux
40:41
as an operating system. I guess it would not be too difficult to extend it to other Unices, but so far that has not been done. It's written mainly in C with some assembly snippets. Very important: it doesn't use any third-party library, not even for disassembling. If you look at the other software that does binary rewriting, usually it relies on
41:04
a bunch of third-party libraries, especially for disassembly. We don't do that, and that's why we end up with something really tiny: 47 kilobytes for the executable for x86, 40 kilobytes for RISC-V. So you can even use it for small embedded types of targets; it will work very well.
41:24
Just to show you, because I mentioned earlier that SaBRe is able to be very efficient, the kind of overhead that we get. We measured it with some well-known server applications like memcached.
41:42
So we end up with a load-time overhead, not average, sorry, but the maximum measured load-time overhead, of 60 milliseconds. That is well below the perceived response time for the user, which is 100 milliseconds. The run-time overhead is very low, as I mentioned before,
42:04
usually below three percent, in most cases around one percent. We also compared our system call tracer with the original strace, trying to mimic its output, and we got very good results,
42:22
as I can show you here. So, native is the original execution without tracing, without instrumentation or anything. We compare the original strace with our implementation of the plugin, sbr-trace, and also with a Pin tool doing the equivalent. As you can see, obviously there is an overhead
42:42
compared to the native implementation, but compared to strace it's really efficient. And sorry, the first one was on dd, the GNU tool, basically reading and writing in a loop. We also did some experiments on du, trying to estimate the disk usage of a big
43:03
directory tree, and we end up with more or less the same results as with dd. And finally, yes, the fault injector. We found some bugs in the coreutils tools. The coreutils are basically all of the basic tools that you have and use all the time
43:23
in your terminal. So yes, we found some bugs, different kinds of bugs, thanks to the random probability tool that we have in our fault injector, so it's quite efficient. So, to sum up:
43:40
What SaBRe does is selective. I didn't define selective before, but as you have understood, it's selective because it targets mainly system calls. By the way, it's also able to rewrite function prologues, if you want to intercept some specific function. It's also able to rewrite the RDTSC instruction on x86.
44:01
RDTSC just reads the timestamp counter. So it works at load time, just ahead of time as we call it, with very low overhead, so it's suitable for embedded devices. The API is quite simple, so you can build your plugin basically by writing one function in a couple of lines of code. We have available plugins, open source, on GitHub, and everything is GPLv3.
44:28
So yes, feel free to play with it. If you have any questions, I'll be happy to take them. Thank you.
44:43
Okay, so we have around four minutes for questions. Thank you for this talk. Who has a question? Thank you for your presentation, most interesting technology.
45:01
You mentioned a case where you do part of the rewriting by modifying an error handler. Yes, sorry, I can't hear very well. Okay, you mentioned a case where you did part of the rewriting using an error handler. Yes. So now I'm curious:
45:21
how do you make sure that this same error handler is not invoked from an entirely different place, which would disturb the meaning of your system? So if I understand your question correctly, you are asking about when we relocate instructions into an error handler,
45:41
which might also be called from a different place. Okay, so when we don't have space to insert the jump, we use a signal handler to catch the signal. And so your question is: when we do that, how do we ensure that we are actually handling an instruction that we have genuinely
46:02
rewritten, rather than some real SIGILL, some real illegal instruction that we weren't aware of? Is that your question? Yes. So actually, what you do is, when you're inside your signal handler, you have to check the address of the instruction that
46:25
goes through the signal handler. You look at this address and you see what exactly the operation is. In this case we are mostly interested in system calls, so if it's a system call,
46:41
that's fine. If it's something else, then you just fall through and let it be handled by the operating system. Does that answer your question? Okay, if you have more questions you can take it offline if you want. Another question?
47:08
With your approach, have you encountered any counter-mechanism which is provided in... I mean, this way you do not really need the source code, right? You can do this on arbitrary binaries.
47:22
We don't need the source code, yes. So I assume you have experimented with third-party software as well, using your approach here? Well, we experimented with, not all possible options, but for instance, I showed the Pin
47:41
tool doing the same. My question is whether you have any algorithm or any procedure in place if the binary that you are injecting into has precautions against injection, if you have stumbled across something like this.
48:02
I'm not sure I understand what you're asking. What do you want to do, check against the code that is injected? Yeah, I'm not sure this is my question, actually. So, have you stumbled across any situations where this approach does not work? Oh, the approach with the detour?
48:21
Well, I can think of mainly two limitations, two problems. One is when we can't relocate anything surrounding, and then you have to use a SIGILL. It mostly works, but you can end up with problems with that.
48:42
The other issue is that, as I mentioned, you can't relocate instructions that are the target of jumps. For direct jumps, I mean jumps with the target encoded as an offset, as I showed you, that works. But if you have a target of a jump that is
49:02
dynamically computed, for instance as the result of a computation that depends on the data, then, since the analysis is static, you can't really do anything about that. So yes, that would be the main problem: the problem with indirect jumps.
49:20
But if I wanted to make my binary resilient against this approach, I would just add a custom SIGILL handler that I need to invoke, and if you then override it, I know, okay, someone is fiddling, I'd better quit, I'd better die. So, as I mentioned, this approach works well with mainstream compilers, GCC,
49:44
LLVM, that do sane and sound stuff. But if you end up with some code that has obfuscation mechanisms, because that happens, yes, with obfuscation
50:00
that wouldn't work, because with obfuscation you can have overlapping instructions, you can have embedded instructions, you can even have self-rewriting code, self-modifying code. In this case we can't do anything about that. But I would say for 99.99999 of the code that works quite well.