Minemu: Protecting buggy software from memory corruption attacks
Formal Metadata
Number of Parts | 84
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/40035 (DOI)
Production Year | 2012
FOSDEM 2012, 34 / 84
Transcript: English (auto-generated)
00:00
My name is Erik Bosman. I'm here to talk about Minemu. Minemu is an emulator for x86, a process-level emulator designed to protect binaries from memory corruption attacks.
00:20
So traditional stack smashing has become quite a bit more difficult in recent years, luckily. Normally, you used to inject your own code into the program,
00:40
and then through a buffer overflow, overwrite a return pointer on the stack to point to your code instead of the code it would execute normally. And then it would execute your code, and done. Luckily, some things have changed. So address space layout randomization
01:00
has made it more difficult to find your own code inside the address space. And the NX bit on modern processors has allowed operating systems to separate writable data from executable data
01:25
so that you can't actually execute your own code even if you jump to it. And also, there are some compiler flags
01:40
in recent times, such as FORTIFY_SOURCE and the stack protector, which try to protect against buffer overflows. And if there are any package maintainers over here, please fortify all your packages, because it really helps.
02:03
But yeah, so this is still not enough. Address space layout randomization is often not random enough to resist brute forcing.
02:20
And heap overflows, which are another type of memory corruption, are very difficult to protect against. And the NX bit can be circumvented
02:43
by, instead of jumping to your own code, jumping to useful code inside the original program itself. So you insert pointers to useful code in the application itself, and then use the code of the original application against itself.
03:05
But the situation is even worse, because all these protection mechanisms need to be enabled at compile time, and there is a lot of old code out there.
03:25
And even new packages sometimes don't apply these mechanisms. And there are also still some flaws
03:43
in how the mechanisms are implemented on some systems. So the question arises, can we do more? So if we see that data execution prevention prevents
04:00
untrusted data from being run as code, and the counterattack is that instead of injecting code, we inject pointers to trusted code.
04:21
These are untrusted pointers to trusted code, so we might ask: can we prevent untrusted pointers from being used as jump addresses? The answer is that we can do this through a technique called taint analysis.
04:43
And in taint analysis, we track what data is trusted or untrusted. And then whenever data gets copied,
05:00
you also copy the information about whether it's trusted or not. And if you combine two pieces of information, for example, when you add two integers, you can OR the trust information so that the least trusted piece of information
05:27
decides whether it's trusted or not. And when the code actually tries
05:40
to jump to an untrusted pointer, a check is done and an error is raised. So you can corrupt anything you like, but when you try to use it, it will blow up in your face.
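The three rules just described (copy the trust along with the data, let the least trusted operand decide when values are combined, and check only at the jump) can be sketched in a few lines of Python. This is an illustrative model, not Minemu's implementation: the `Value` class, the `TaintError` exception and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


class TaintError(Exception):
    """Raised when control flow would follow an untrusted value."""


@dataclass(frozen=True)
class Value:
    data: int
    tainted: bool = False  # True means the value came from untrusted input


def copy(src: Value) -> Value:
    # Copying data copies its taint along with it.
    return Value(src.data, src.tainted)


def add(a: Value, b: Value) -> Value:
    # Combining two values ORs their taint: the least trusted operand decides.
    return Value((a.data + b.data) & 0xFFFFFFFF, a.tainted or b.tainted)


def jump(target: Value) -> int:
    # The check happens at the point of use, not at the point of corruption.
    if target.tainted:
        raise TaintError(f"jump to untrusted address {target.data:#x}")
    return target.data
```

In this model an attacker can overwrite as many values as they like; nothing is raised until one of the tainted values reaches `jump`.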
06:02
And these systems already existed, but they're slow as hell, often by an order of magnitude, because you have to do all the copying and maintaining of whether it's trusted or not. So we tried to see whether we could make it faster. So in Minemu, we use two tricks
06:24
to try to make the propagation of taints faster. First, we have a novel memory layout. So this is how you can view the Linux kernel memory layout on 32-bit.
06:41
So the high quarter of the address space is in use by the kernel. If you read or write from it, it will generate a fault. And then the rest is used by the user process. So we changed this layout a bit.
07:01
So we have two equally sized areas of memory. And one of the pieces can be used by the user process.
07:26
And the other one is used to hold the taint information. And then there's a third piece, which is used by the emulator itself and for the JIT-translated code, because everything is translated.
07:41
So during execution, only the user and the taint pieces of memory can be accessed by the program. And because they are of equal size,
08:02
if we want to propagate a write to an address in the original memory, we can just add a constant to it. And that gives us the address in taint memory where the trust value for this address is stored.
08:21
And because of the way the memory is laid out, if the process tries to write into the taint memory, it will generate a fault. So it will detect it, at least. So if we want to access the taint memory,
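A small Python model of this layout may help. It assumes two equally sized areas with the taint area placed directly after the user area; the concrete sizes and the `TAINT_OFFSET` value are hypothetical, since the talk does not give the real numbers, and the fault is simulated with a simple range check.

```python
USER_SIZE = 0x4000_0000     # hypothetical size of the user area
TAINT_OFFSET = 0x4000_0000  # hypothetical constant between an address and its taint


class Fault(Exception):
    """Stands in for the fault raised on an access outside the user area."""


def taint_address(addr: int) -> int:
    # Only addresses inside the user area have a shadow location.
    if not 0 <= addr < USER_SIZE:
        raise Fault(f"access outside user memory: {addr:#x}")
    return addr + TAINT_OFFSET


class Memory:
    def __init__(self) -> None:
        self.data: dict[int, int] = {}
        self.taint: dict[int, int] = {}  # keyed by shadow address

    def write(self, addr: int, byte: int, tainted: bool) -> None:
        shadow = taint_address(addr)  # faults if addr lies in taint memory
        self.data[addr] = byte & 0xFF
        self.taint[shadow] = 0xFF if tainted else 0x00

    def is_tainted(self, addr: int) -> bool:
        return self.taint.get(taint_address(addr), 0) != 0
```

Because the user program can only name addresses below `USER_SIZE` in this model, a write aimed at the taint area is rejected before it can clear any taint.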
08:41
we only have to add a constant. So the nice thing is that on Intel, you can add a constant to any addressing mode in the machine code instructions and have another addressing mode, which
09:01
accesses the same memory location but with an offset of a certain constant. So for example, for an instruction which accesses the memory at the address in the edx register,
09:23
we can have an addressing mode which accesses edx plus some constant. And that holds for all memory addressing modes, even the more exotic ones.
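To illustrate why this holds for every addressing mode, here is a sketch of the 32-bit x86 effective address calculation (base + index * scale + displacement). Folding the constant into the displacement shifts the result by exactly that constant, whichever of the other fields are in use. The `TAINT_OFFSET` value and the function names are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF
TAINT_OFFSET = 0x4000_0000  # hypothetical constant


def effective_address(base=0, index=0, scale=1, disp=0):
    """x86-style effective address: base + index * scale + disp (mod 2**32)."""
    return (base + index * scale + disp) & MASK32


def shadow_operand(base=0, index=0, scale=1, disp=0):
    """The same operand with the constant folded into the displacement."""
    return {"base": base, "index": index, "scale": scale,
            "disp": (disp + TAINT_OFFSET) & MASK32}
```

For instance, an operand that reads from the address in edx becomes an operand that reads from edx plus the constant, and a scaled-index operand with its own displacement simply gets a larger displacement.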
09:41
So the second innovation is that we use SSE registers. So we emulate a processor without SSE registers. And then we use these registers to keep the taint value for the general purpose registers
10:02
so that we don't have to store them in memory. So it's much faster. So this is how it's done. We use two SSE registers to hold the taint. And we separate each of them into four columns. And yeah, so we can store the taint for all eight registers.
10:26
So if we want to, for example, add the taint from a memory location x to the edx register, we can do that very simply. We just identify the column where it's in.
10:43
And then through the SSE instructions, which are quite powerful, we can first load the taint from memory at location x plus the constant into a register, and then do an OR.
11:03
And then, with just two instructions, we have done the propagation for this instruction. And most instructions will need one or two extra instructions to propagate taint.
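The register trick can be modelled in Python by treating each 128-bit SSE register as four 32-bit columns. This is only a simulation of the idea: the assignment of general purpose registers to columns and the scratch-register step are assumptions, not Minemu's actual register allocation.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


class RegisterTaint:
    def __init__(self) -> None:
        # Two simulated 128-bit registers, four 32-bit columns each.
        self.xmm = [[0, 0, 0, 0], [0, 0, 0, 0]]

    @staticmethod
    def slot(reg: str) -> tuple[int, int]:
        # Hypothetical mapping: (which SSE register, which column).
        return divmod(REGS.index(reg), 4)

    def or_from_memory(self, reg: str, shadow_word: int) -> None:
        which, col = self.slot(reg)
        # Step 1: load the taint word into the right column of a scratch register.
        scratch = [0, 0, 0, 0]
        scratch[col] = shadow_word & 0xFFFFFFFF
        # Step 2: OR the scratch register into the taint register.
        self.xmm[which] = [a | b for a, b in zip(self.xmm[which], scratch)]

    def get(self, reg: str) -> int:
        which, col = self.slot(reg)
        return self.xmm[which][col]
```

The OR leaves the other three columns untouched, so propagating taint into one register does not disturb the taint of its neighbours.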
11:22
So we tested the effectiveness of this method against a lot of exploits. And in all of them, the exploits were caught. And it's important to know that the vulnerabilities that are caught don't have to be known beforehand.
11:42
So if there's an unknown vulnerability, you can catch it with this tool. We also did performance tests. So for IO-bound applications, the performance is quite reasonable, because there's not
12:01
a lot of number crunching. So most of the time, it's just copying data. And for that, the overhead isn't that much. If you want to do number crunching, such as encryption, it seems to be almost three times as slow.
12:20
But that might still be useful for some applications. Your server might not be loaded at all. And you really want your data to be secure. And so you have to make a trade-off. We also tested it on very CPU-bound applications,
12:44
so ones that are very difficult for the system. And overall, it's 2.4 times as slow. And we also tested it against some real-life
13:01
applications. There are some limitations to the system. We don't prevent memory corruption. We only prevent the corrupted data from being used. So that's the best we can do with this technology. But it's still quite useful, I think.
13:23
And there are also some corner cases where before you can execute your own code, it can find a point.
13:45
So you can write something to arbitrary memory. And if something is trusted, then you also have a problem. But this is just a very small subset of vulnerabilities
14:01
that are discovered. And there's also a limitation if you don't overwrite a function pointer, but just something else. For example, does this user have the right to read this file? Then it's also not caught because you never
14:37
check whether a function pointer is trusted.
14:44
In some cases, we can also catch other vulnerabilities if we add hooks to certain functions. For example, to a MySQL query, we can have detailed information about which
15:00
data in the MySQL query is trusted or not. So we could have more information about SQL injections this way, because if you
15:20
find an AND or another SQL keyword which is untrusted, then something is wrong. I don't think there's time, but the source is available on the website. We don't have time for questions,
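The query check can be sketched as follows, assuming a hook that sees the query string together with one taint flag per character. The keyword list, the `build` helper and the function names are hypothetical, and the matching is deliberately naive: it does not parse SQL, so a keyword inside a quoted string would also be flagged.

```python
import re

KEYWORDS = {"AND", "OR", "UNION", "SELECT", "DROP"}  # illustrative subset


def build(parts):
    """Join (text, tainted) pieces into a query plus a per-character taint list."""
    query = "".join(text for text, _ in parts)
    taint = [tainted for text, tainted in parts for _ in text]
    return query, taint


def tainted_keywords(query, taint):
    """Return the SQL keywords that contain at least one tainted character."""
    if len(query) != len(taint):
        raise ValueError("taint list must match query length")
    hits = []
    for m in re.finditer(r"[A-Za-z_]+", query):
        if m.group().upper() in KEYWORDS and any(taint[m.start():m.end()]):
            hits.append(m.group())
    return hits
```

A query built from trusted text plus the untrusted input `x' OR '1'='1` reports the `OR`, while the same query built with the untrusted input `alice` reports nothing.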
15:41
but if anyone has questions.