Minemu: Protecting buggy software from memory corruption attacks

Formal Metadata

Title
Minemu: Protecting buggy software from memory corruption attacks
Title of Series
Number of Parts
84
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year: 2012

Content Metadata

Subject Area
Genre
Abstract
Dynamic taint analysis is a powerful technique to detect memory corruption attacks. Yet with typical overheads of an order of magnitude, it is not something you would choose to deploy in any production environment. Minemu is a fast, process-based taint-tracking emulator for Linux (x86, 32-bit) which aims to be fast enough to run on production systems. By keeping track of where untrusted data (such as data from the network) is copied to inside your program, and by subsequently checking whether this data is used to take control of the program, Minemu effectively protects against most memory corruption attacks, for both known and unknown vulnerabilities. Tracking the flow of untrusted data during the execution of a program is slow because we effectively have to do an extra memory operation for each original memory operation. However, by using a special memory layout and utilizing SSE registers, Minemu tries to keep the overhead to a minimum.
Transcript: English (auto-generated)
My name is Erik Bosman. I'm here to talk about Minemu. Minemu is an emulator for x86, a process-level emulator designed to protect binaries from memory corruption attacks.
So traditional stack smashing has luckily become quite a bit more difficult in recent years. You used to inject your own code into the program,
and then, through a buffer overflow, overwrite a return pointer on the stack so that it pointed to your code instead of the code the program would normally execute. Then it would execute your code, and you were done. Luckily, some things have changed. Address space layout randomization
has made it more difficult to find your own code inside the address space. And the NX bit on modern processors has allowed operating systems to separate writable data from executable data,
so that you can't actually execute your own code even if you jump to it. And also, there are some compiler flags
that have appeared in recent times, such as FORTIFY_SOURCE and the stack protector, which try to protect against buffer overflows. And if there are any package maintainers here, please fortify all your packages, because it really helps.
But yeah, this is still not enough. Address space layout randomization is often not random enough to resist brute forcing.
And heap overflows, which are another type of memory corruption, are very difficult to protect against. And the NX bit can be circumvented
by, instead of jumping to your own code, jumping to useful code inside the original program itself. So you insert pointers to useful code in the application, and then use the code of the original application against itself.
But the situation is even worse, because all these protection mechanisms need to be enabled at compile time, and there is a lot of old code out there.
And even new packages sometimes don't apply these mechanisms. And there are also still some flaws
in how the mechanisms are implemented on some systems. So the question arises: can we do more? If we see that data execution prevention prevents
untrusted data from being run as code, and the counterattack is that instead of injecting code, we inject pointers to trusted code,
that is, untrusted pointers to trusted code, then we might ask: can we prevent untrusted pointers to code from being used as jump addresses? The answer is that we can do this through a technique called taint analysis.
And in taint analysis, we track what data is trusted or untrusted. And then whenever data gets copied,
you also copy the information about whether it's trusted or not. And if you combine two pieces of information, for example, when you add two integers, you can OR the trust information so that the least trusted piece of information
decides whether the result is trusted or not. And when the code actually tries
to jump to an untrusted pointer, a check is done and an error is raised. So you can corrupt anything you like, but when you try to use it, it will blow up in your face.
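The three rules just described (copy the taint along with the data, OR the taint when values are combined, and check the taint when a value is used as a jump target) can be illustrated with a small sketch. This is a toy model written for illustration, not Minemu's implementation; the names `Value`, `TaintError`, `copy`, `add` and `jump` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A 32-bit value plus one taint flag (hypothetical toy model)."""
    data: int
    tainted: bool = False  # True means the value derives from untrusted input


class TaintError(Exception):
    """Raised when untrusted data is about to be used as a jump target."""


def copy(v: Value) -> Value:
    # Copying data also copies its taint.
    return Value(v.data, v.tainted)


def add(a: Value, b: Value) -> Value:
    # Combining two values ORs their taint: the least trusted input decides.
    return Value((a.data + b.data) & 0xFFFFFFFF, a.tainted or b.tainted)


def jump(target: Value) -> int:
    # The check happens at the point of use, not at the point of corruption.
    if target.tainted:
        raise TaintError(f"jump to untrusted address {target.data:#010x}")
    return target.data
```

In this model, an overflow may overwrite a return address with a tainted value without anything happening; the error is only raised once `jump` is called on it.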
And these systems already existed, but they're slow as hell, often by an order of magnitude, because you have to do all the copying and maintain whether data is trusted or not. So we tried to see whether we could make it faster. In Minemu, we used two tricks
to try to make the propagation of taint faster. First, we have a novel memory layout. This is how you can view the Linux memory layout on 32-bit.
The high quarter of the address space is in use by the kernel. If you read or write from it, it will generate a fault. And the rest is used by the user process. So we changed this layout a bit.
So we have two equally sized areas of memory. And one of the pieces can be used by the user process.
And the other one is used to hold the taint information. And then there's a third piece, which is used by the emulator itself and for the JIT-translated code, because everything is translated.
So during execution, only the user and the taint pieces of memory can be accessed by the program. And because they are of equal size,
if we want to propagate a write to an address in the original memory, we can just add a constant to that address. And there we have the address where the trust value for this address is kept in taint memory.
And because of the way the memory is laid out, if the process tries to write into the taint memory, it will generate a fault. So it will be detected, at least. So if we want to access the taint memory,
we only have to add a constant. The nice thing is that on Intel, you can add a constant to any addressing mode in the machine code instructions and get another addressing mode, which
accesses the same memory location but with an offset of a certain constant. So for example, for an instruction which accesses the memory at the address in the edx register,
we can have an addressing mode which accesses edx plus some constant. And that holds for all memory addressing modes, even the more exotic ones.
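The address arithmetic behind this layout can be sketched as follows. This is a toy model under stated assumptions: the region size `USER_SIZE`, the offset `TAINT_OFFSET` and the helper names are hypothetical and are not Minemu's real constants, and a Python exception stands in for the hardware fault.

```python
USER_SIZE = 0x1000        # toy size of the user region (assumption)
TAINT_OFFSET = USER_SIZE  # taint region sits directly above the user region

# One flat "address space": user region followed by an equally sized taint region.
memory = bytearray(2 * USER_SIZE)


class Fault(Exception):
    """Stands in for the fault raised on an out-of-range access."""


def taint_address(addr: int) -> int:
    # The taint for a user address lives at a fixed offset from it.
    if not 0 <= addr < USER_SIZE:
        # An address already inside the taint region would map past the end.
        raise Fault(f"access outside user memory: {addr:#x}")
    return addr + TAINT_OFFSET


def store(addr: int, byte: int, taint: int) -> None:
    shadow = taint_address(addr)  # range check happens before any write
    memory[addr] = byte
    memory[shadow] = taint


def load(addr: int) -> tuple[int, int]:
    shadow = taint_address(addr)
    return memory[addr], memory[shadow]
```

For example, `store(0x10, 0x41, 0xFF)` writes the byte at `0x10` and its taint at `0x1010`, while `store(0x1010, 0, 0)` raises `Fault` because the target lies in the taint region.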
So the second innovation is that we use SSE registers. We emulate a processor without SSE registers, and then we use these registers to keep the taint values for the general-purpose registers,
so that we don't have to store them in memory. So it's much faster. This is how it's done. We use two SSE registers to hold the taint, and each is separated into four columns. So we can store the taint for all eight registers.
So if we want to, for example, add the taint from a memory location x to the edx register, we can do that very simply. We just identify the column where it's in.
And then through the SSE instructions, which are quite powerful, we can first load the taint from memory at location x plus the constant into an SSE register, and then do an OR.
And then, with just two instructions, we have done the propagation for this instruction. And most instructions will need one or two extra instructions to propagate taint.
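The two-step propagation (load the taint into the right column, then OR it into the register that holds the taint) can be modelled roughly like this. It is a simulation for illustration only: real SSE instructions are not used, and the assignment of general-purpose registers to columns, as well as all names, are assumptions.

```python
# The eight 32-bit general-purpose registers, in x86 encoding order.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Two simulated 128-bit "SSE registers", each split into four 32-bit columns.
# Which register lands in which column is a hypothetical choice.
taint_regs = [[0, 0, 0, 0], [0, 0, 0, 0]]


def slot(reg: str) -> tuple[int, int]:
    """Return (which taint register, which column) for a register name."""
    return divmod(REGS.index(reg), 4)


def get_taint(reg: str) -> int:
    r, col = slot(reg)
    return taint_regs[r][col]


def propagate_add_from_memory(reg: str, taint_memory: dict[int, int], addr: int) -> None:
    """Propagate taint for an instruction like `add reg, [addr]`."""
    r, col = slot(reg)
    # Step 1: load the taint word into the matching column of a scratch register.
    scratch = [0, 0, 0, 0]
    scratch[col] = taint_memory.get(addr, 0)
    # Step 2: OR the scratch register into the taint register, all columns at once.
    taint_regs[r] = [a | b for a, b in zip(taint_regs[r], scratch)]
```

Because the other columns of the scratch register are zero, the OR leaves the taint of the other three registers in that group unchanged.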
So we tested the effectiveness of this method against a lot of exploits. And in all of them, the exploits were caught. And it's important to know that the vulnerabilities that are caught don't have to be known beforehand.
So if there's an unknown vulnerability, you can catch it with this tool. We also did performance tests. So for IO-bound applications, the performance is quite reasonable, because there's not
a lot of number crunching. Most of the time, it's just copying data, and for that, the overhead isn't that much. If you want to do number crunching, such as encryption, it seems to be almost three times as slow.
But that might still be useful for some applications. Your server might not be loaded at all, and you really want your data to be secure. So you have to make a trade-off. We also tested it with very CPU-bound applications,
which are very difficult for the system. And overall, it's 2.4 times as slow. And we also tested it against some real-life
applications. There are some limitations to the system. We don't prevent memory corruption. We only prevent the corrupted data from being used. So that's the best we can do with this technology. But it's still quite useful, I think.
And there are also some corner cases where before you can execute your own code, it can find a point.
So you can write something to arbitrary memory. And if something is trusted, then you also have a problem. But this is just a very small subset of vulnerabilities
that are discovered. And there's also a limitation if you don't overwrite a function pointer, but something else, for example, the answer to "does this user have the right to read this file?" Then it's also not caught, because a check is only
done when a function pointer is used.
In some cases, we can also catch other vulnerabilities if we add hooks to certain functions. For example, to a MySQL query, we can have detailed information about which
data in the MySQL query is trusted or not. So we could have more information about SQL injections this way, because if you
find an AND or another SQL keyword which is untrusted, then something is wrong. I don't think there's time, but the source is available at the website. We don't have time for questions,
but if anyone has questions.
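The SQL idea mentioned near the end of the talk, where a hook inspects which parts of a query are untrusted, could look roughly like the sketch below. The talk does not describe how such a hook is implemented, so everything here is a hypothetical illustration: per-character taint flags, a small keyword list, and the helper names `build_query` and `tainted_keywords`.

```python
import re

# A deliberately small keyword list, for illustration only.
SQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "UNION", "DROP"}


def build_query(parts: list[tuple[str, bool]]) -> tuple[str, list[bool]]:
    """Join (text, tainted) pieces into a query plus one taint flag per character."""
    query = "".join(text for text, _ in parts)
    taint = [flag for text, flag in parts for _ in text]
    return query, taint


def tainted_keywords(query: str, taint: list[bool]) -> list[str]:
    """Return SQL keywords that contain at least one untrusted character."""
    found = []
    for match in re.finditer(r"[A-Za-z_]+", query):
        word = match.group()
        if word.upper() in SQL_KEYWORDS and any(taint[match.start():match.end()]):
            found.append(word)
    return found
```

A query whose untrusted part is just a name yields an empty list, while untrusted input such as `x' OR '1'='1` makes `tainted_keywords` report `OR`.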