Insecure coding in C (and C++)
Formal metadata

Title: Insecure coding in C (and C++)
Number of parts: 170
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, change and copy the work or its content, and distribute and make it publicly available in unchanged or changed form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in changed form, only under the terms of this license.
Identifiers: 10.5446/50854 (DOI)
Transcript: English (automatically generated)
00:03
So welcome to this session about insecure coding in C and C++. First of all, I have to make a small disclaimer, because there's not that much C++ here. Everything I'm talking about is also relevant for C++, but there is little or actually
00:27
nothing that is specific to C++ per se, and that's quite typical when you walk down the stack, close to the hardware. The other thing is that I kind of promised lots of assembler, but when I compare it to
00:44
a few other talks that I've done, for example, test-driven development in assembler, and another talk I did last year, a 90-minute talk where I think 60-70% of the slides were in assembler. I have to admit it's not a lot of assembler, but it's enough.
01:03
So some assembler in this talk. But perhaps more important, in the program it's impossible to see what kind of level talks are at, because it's not written in the program. And when I sent in this proposal, I deliberately set it to introduction and a general kind
01:23
of overview of how to do insecure coding in C and C++. So if there are any deep experts in kind of hacking programs on machine code level, you might be disappointed. It's not an expert talk. It's not even advanced.
01:40
It is a general introduction to a lot of concepts that are useful to know, both if you want to write insecure code, but also if you want to write secure code. So it's an introduction talk. But just checking. Is there any kind of hackers in the room that really know this stuff?
02:04
Okay. Just anyone that knows about code injection? Yeah. A few people. Anyone that has done it in the last ten years? Because everybody did it when they were kids, but in the last ten years? No? Okay. Not so many.
02:22
But that's good, because then it's nice to have this introduction level. So being stupid is a privilege to some extent, but you can also do dumb things intentionally. And you have to admit that these guys, these guys, they do actually know a few things in
02:46
order to do so many stupid things that they do. And I hope this talk will also focus on knowledge-based ways of doing stupid things in C++. So there is nothing special with my machine when I'm referring to it, apart from it's
03:06
just a 32-bit Linux kernel with a fairly recent compiler and a fairly recent Ubuntu distro. I wrote the main part of this talk in March.
03:20
So if I wrote the main part of this talk now, I would have just upgraded it. But there's nothing special in there, so it's nothing rare you need in order to look for these faults, et cetera. This is something that you can do on all types of machines and operating systems and compilers that you find out there.
03:41
So in this talk, I will briefly discuss the following topics. Stack buffer overflow, also called stack smashing. How the call stack and activation frames are working. I just have to go through that, even if most people have a reasonable understanding of how it's working.
04:00
Most C programmers have a reasonable understanding of how it's working. I need to go through it because I will refer to a few of those things later when I explain how to write exploits, how to do arc injection and code injection, and how a few protection mechanisms are working. There is some feedback in the audio.
04:23
I'm getting some feedback in the audio. OK. Probably good now. And then some protection mechanisms like ASLR, stack canaries, and a fancy mechanism called return-oriented programming. Who knows about return-oriented programming?
04:43
OK, three, four people who have not only heard about it but actually read about it and studied it a bit. Nobody. Good. Because I don't know very much about it. I wasn't sure if I was going to include it because while I think I understand it, I don't understand it well enough to explain it really well.
05:02
So I've just done a very brief explanation of return-oriented programming. Then I'll show some things about how to write code with surprising behavior, talk about layered security, information leakage, how to patch binaries, and in the end I will do a summary where I summarize a few tricks for writing insecure code.
05:24
All of this in 60 minutes, or actually 55. So we better get started. But now we understand it's an overview. It's not going deep into anything, really. But take a look at this code first.
05:44
It's a small program. Of course, it's a contrived example. I'm using it just to illustrate a few other concepts. So I had to make some stupid stuff in there. And of course, the key thing here is the use of gets and the buffer, the response buffer.
06:06
That is what we're going to have fun with. And I guess we all know that gets is a function that you should never use. And it has been removed from the language now. But it's still a nice function to have there when you want to kind of explain in
06:25
a simple way how stack-smashing can be done. There are so many other ways you can do it. So don't be confused by me using gets in this example, because there are hundreds of other ways you can do exactly the same thing.
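The code on the slide is not reproduced in the transcript, but from the description it looks roughly like the following. This is a hypothetical reconstruction: the names (`response`, `allow_access`, `n_missiles`) and the structure are assumptions, and since `gets` has been removed from the language, `strcpy` stands in for it here; both copy with no bounds check, which is the deliberate bug.

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Hypothetical reconstruction of the talk's contrived example.
   strcpy plays the role of gets: neither checks bounds. */
int authenticate_and_launch(const char *input)
{
    bool allow_access = false;
    int  n_missiles   = 2;
    char response[8];

    strcpy(response, input);              /* the deliberate bug: no bounds check */
    if (strcmp(response, "Joshua") == 0)
        allow_access = true;

    if (allow_access) {
        printf("Access granted: launching %d missiles\n", n_missiles);
        return n_missiles;
    }
    puts("Access denied");
    return 0;
}
```

Any input of eight characters or more overruns `response` and starts overwriting whatever the compiler happened to place next to it.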
06:42
Of course, modern code doesn't use gets, and that's good, because it has been removed from the language. But for now, we are going to use it. Let's see what happens when you try to execute this code. So it was supposed to work like this.
07:01
When you run the program, it's asking you for a secret. And if you type in your name, which name should I type in? Anyone remember the movie? Oh, you're close.
07:21
But the hacker was named David. So you try with David first. And David is not the keyword. So access is denied. And the operation completes, nothing has happened, okay? So that was how it was supposed to work if you typed the wrong secret. But if you went into it and typed the right secret, Joshua, the name of the professor,
07:48
then access was granted to the system, and two missiles were launched, operation complete. And now we are going to look at kind of the simplest way of exploiting this.
08:02
Because if we type in a very long string, bad things will happen. And in this particular case, on my machine, I got access granted, even though through
08:21
strcmp didn't match, and it was launching a few missiles, and operation was complete. Did anyone see something strange here?
08:42
Yeah? We got both access granted and access denied at the same time. So not only have we launched who knows how many millions of missiles, but we have also messed up the program in such a way that it becomes
09:04
unstable and seems to pick both this path and that path at the same time. Now, for those of you who have seen some of my previous talks last year, you might remember why it is like that; I will come back to it and explain it.
09:24
But first of all, I will look at this one, why did we get so many missiles. And in order to understand, to kind of explain that phenomenon, we need to understand how the stack is working and what is happening when the program is executing.
09:45
So due to an overflow, what has happened is that we have messed up and changed the value of allow access, and we have changed the value of n missiles. And this is the strange thing that we will look at later.
10:05
So what we just saw now is what is typically called stack buffer overflow, and sometimes stack smashing. There was a famous paper that came out in the 90s called Smashing the Stack for Fun and Profit, and that in some way changed the computer industry.
10:23
Because suddenly a lot of attention came into this point that we need to kind of write bug free code and certainly not allow stack buffer overflows because there are so many things we can do.
10:42
And while it is common to hear C and C++ programmers discuss with each other and say, oh, this is a stack variable, and this is on the stack, and here's a call stack and activation frame, et cetera, it's important to know that the standard never says anything about the stack. So although it's very common for a C program and C++ program to actually use a call stack
11:06
and activation frames when executing, this is not something that is mandated by the standard. So if the compiler or the optimizer can create a similar behavior without using a stack, it is allowed to do so.
11:22
So this depends very much on the optimization level. But from a conceptual point of view, it's reasonable to think about an execution stack that works approximately like I'm just going to describe now. And while I've been doing this on Linux on an Intel CPU, this is approximately
11:43
the same way all typical CPUs are working. So what happens here when we start a program is that the operating system loads the program into memory, and then it passes control to usually something called start,
12:04
which is often in a C runtime library. And inside of start, it's not doing very much. It's setting up the call stack, maybe initializing some variables, preparing for a dynamic memory allocation, et cetera. But very soon, it will jump into main, and once inside of main, it will start executing
12:34
the assembly instructions that are a representation of this code.
12:42
So the call stack has been set up before we enter main, and it's useful to draw it like this, with high addresses at the bottom and low addresses on top, because nearly all call stacks grow from high addresses up towards low addresses. So if you try to draw it the other way around, it's difficult to reason about it,
13:05
at least sometimes. So the first thing that happens when it's jumping into main is that the start, the runtime start library has pushed the return address, kind of the next instruction that
13:24
is supposed to be executed when main is finished. And so this is coming in here, the return address, the next instruction in start. And once inside of main, main will then set up its own activation frame by putting
13:45
a pointer to the previous stack frame so that it can be restored. And now it has its own activation frame that it can use. So when it's supposed to execute puts, it will first put the pointer to the string
14:07
onto the stack, and then the return address, the next instruction in main that should be executed, which is this one. And then it will make the jump into puts, which can do whatever it wants, but it's
14:27
usually just building another activation frame and behaving like others, but it doesn't have to do that. And when puts is finished, these things go away, kind of like garbage collection, not
14:43
in use anymore. And before calling authenticate and launch, the same procedure happens again. We push the return address of the next instruction, which is here, and then it jumps into authenticate and launch, which then stores the pointer to the previous
15:05
stack frame, and then allocates space for the local variables. And local variables, or stack variables, are not very exact names. So if you wonder how this is working and you use those inexact names, you typically
15:22
end up in discussions between people that are not necessarily experts in the topic. So if you really want to go to the places where they discuss how this is really working, you should talk about variables or objects with automatic storage duration, because that is the wording used in the standard.
15:41
And it's in some way a much better word, because it doesn't indicate that this is going on the stack. But now you see we have good stuff here, allow access and missile response. Anyone see something that might be a bit strange there?
16:00
The order? Yeah. So I'm not trying to pick on you there, but thanks for playing. It's very common for programmers to believe that they have a correct idea about how things are laid out on the stack. But there is no reason why programmers should be able to reason about that, because as
16:26
soon as you increase the optimization level, for example, things will be rearranged, some things will be stored on the stack, some things will never actually get physical memory, or whatever. And this kind of rearranging of the order is very typical for all compilers; they
16:43
do that all the time. So there is no correct order in this particular case. But this is exactly what happened on my machine when I was executing it. And then we go into printf. We just follow the same thing: a pointer to the string, save the return address, save
17:07
the previous stack frame, printf can do whatever it wants, and then we get into gets, which is the culprit here. Now, the reason why we get this problem is, of course, that gets doesn't have any idea
17:24
about how many characters it's allowed to write. So depending on how many you're writing, it's just going to continue poking stuff into memory. And any inputs, eight characters or more, will cause a problem. And this is exactly the stack data that I got when I executed this on my machine.
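One way to check what your own compiler did is to print the addresses of the locals. This little probe is entirely implementation-dependent, which is exactly the point: the order can change with the compiler and the optimization level.

```c
#include <stdio.h>
#include <stdbool.h>

/* Print where the compiler placed each local. The layout is
   unspecified and varies between compilers and optimization levels. */
int print_layout(void)
{
    bool allow_access = false;
    int  n_missiles   = 2;
    char response[8]  = "";

    printf("response     at %p\n", (void *)response);
    printf("n_missiles   at %p\n", (void *)&n_missiles);
    printf("allow_access at %p\n", (void *)&allow_access);
    (void)n_missiles;
    return allow_access ? 1 : 0;   /* 0: nothing has been overwritten yet */
}
```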
17:46
And most of this stack is basically just padding to make sure that things are aligned properly in memory, so it doesn't really matter what kind of values are there. But we recognize, with some training at least, you look at the stack frame and the stack
18:04
data, and you start recognizing that this is the return address, this is the pointer to the stack frame, these are allow access and n missiles, and the pointer to the response buffer. So by focusing on the stack variables per se, this is what happened when I typed in global
18:25
thermonuclear war, okay? And now we partly have an explanation for why we got, how much is it? One billion missiles.
18:42
Because this number is 1.8 billion. But we are also close to seeing why this happened: access granted and access denied.
19:01
And it's because of this one, the 6C here. And the first time I saw this I was kind of surprised, I think it was two years ago when GCC 4.7.1 came out, it was the first time I saw this particular issue.
19:30
But it was blogged about by Mark Schroyer first, and I continued in his direction and studied the assembler code, and what I saw was that GCC, my
19:45
compiler in this particular case basically said that a bool is always either 0 or 1 and never anything else. Internally in memory.
20:01
So if someone messes up so that it's neither 0 nor 1, you get this behavior. This is pseudo-assembler of the C code, because when you read the assembler, you see that in order to decide whether you should grant access or not, it basically says: if allow access is not 0, then I'm going to
20:26
grant access, and the next thing in the assembler code is: if allow access is not 1, then I'm going to deny access. And if for some reason allow access becomes anything but 0 or 1, you get this kind of
20:43
quantum behavior in your code. And this is not limited to bool. This happens all the time when you mess up the internal data structures inside of your program. So by allowing in some way memory overwrites, you can also get this very, very strange
21:05
and surprising behavior like this. So going back to our understanding of how the stack is working on my particular machine,
21:20
we also see that now we can write a small script using the printf Unix command, where we basically just send in eight characters, because we don't care about those anyway, and then we type in 2A hex and put that into n missiles, and then we put in 1 to allow ourselves access.
21:44
And that gives us launching 42 missiles and operation complete. And now we have a way of controlling the program. But using scripts like this is not a very effective way of exploiting. So it's perhaps more common to see exploits, for example, written in C or Python, for
22:08
example, that are in this case building up a structure that is exactly the same as the stack on that particular machine. And now we can programmatically say I want allow access to be true, I want n missiles
22:24
to be 42, and then I'm just writing this into the place where I can kind of put the payload or this is not really a payload, but where I can put the data and really control how the program is working.
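A sketch of such an exploit in C, under the assumptions from the talk: a 32-bit target, the field order observed on the speaker's machine, and no extra padding. None of that can be relied on in general.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the target's stack layout; field order and
   absence of padding are assumptions taken from the talk. */
struct payload {
    char     response[8];    /* fills the response buffer            */
    uint32_t n_missiles;     /* lands on the n_missiles variable     */
    uint32_t allow_access;   /* lands on allow_access (any nonzero)  */
};

/* Build the bytes we would pipe into the target's stdin. */
void build_payload(struct payload *p)
{
    memset(p->response, 'A', sizeof p->response);  /* content is irrelevant */
    p->n_missiles   = 42;
    p->allow_access = 1;
}
```

Writing these `sizeof(struct payload)` bytes into the target's input would reproduce the "launching 42 missiles" run from the printf script, but programmatically.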
22:44
Yeah. So now we see we have a programmatic way to do it. But perhaps we also see a new opportunity now.
23:01
What about this one? This was the return address. This is the place where it's supposed to jump when it's finished with a function. Hmm. What would happen now if we write an exploit more like this? We increase our struct because we are going to map a bigger part of the stack
23:25
on the target. So we still allow access. We launch three missiles. But now we are also poking in the particular address where this function is supposed to return. So we're saying: I don't necessarily want you to go to print operation
23:46
complete when you're finished. I want you to do something else. And in this case, this particular address happened to be pointing to the beginning
24:02
of authenticate and launch again. And now I have changed the execution path of the program in such a way that it's basically launching more and more missiles. And just for fun, I'm increasing the number of missiles every time.
24:21
So four, five, six, seven, eight, nine missiles should be launched. And this is called arc injection. It's not very often used like this going back to your own function. What it's typically more used for is to jump into one of the library routines.
24:41
So, for example, first you push the address to a string that you have poked into memory somewhere. And then you change the return address and jump into a libc function, for example, system. And suddenly you can invoke Unix commands inside of the program.
25:02
And that's the reason why arc injection is often called return to libc, because that's where it's mostly used to jump into libc. Any questions around that? No? Feel free to stop me during the talk.
25:25
So now we have looked at the arc injection and return to libc, which is a very common way of exploiting a program. But we are not limited to that, because we can also let the compiler generate another
25:47
program, or perhaps some values, that we can put on the stack. And in this case, I've just written a small function and compiled it with GCC.
26:07
Then I get all the hex values that I need, so that now I can take this, create a string with these hex values, write it on the stack, and jump to that location in memory and start executing code.
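Those hex values can also be harvested from inside a program rather than from a disassembly. Reading code bytes through a converted function pointer is not portable C, but it works on typical flat-memory platforms like x86 Linux; the function `answer` here is just a placeholder.

```c
#include <stdio.h>
#include <stdint.h>

/* A stand-in function whose machine code we want to harvest. */
static int answer(void) { return 42; }

/* Print the first n bytes of that function as hex, the way the
   speaker harvested compiler-generated code to inject elsewhere.
   The pointer conversion is implementation-defined. */
void dump_code_bytes(unsigned n)
{
    const unsigned char *p = (const unsigned char *)(uintptr_t)&answer;
    for (unsigned i = 0; i < n; i++)
        printf("%02x ", p[i]);
    putchar('\n');
}
```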
26:27
So this is code injection: you inject some code first, and then you jump to that code. Sometimes it's difficult to calculate exactly where you should start, so it's very common to use a nop slide, where you have thousands and thousands of no-operation
26:46
nops first, so that it doesn't really matter where you are hitting, because you will hit somewhere, and then the execution will just do no, no, no, no, no, and then suddenly execute your code.
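The layout of such a buffer can be sketched as follows; 0x90 is the x86 NOP opcode, and the shellcode bytes would come from compiler-generated machine code. This only shows the buffer layout, not working shellcode.

```c
#include <stddef.h>
#include <string.h>

/* Fill a buffer with NOPs followed by the shellcode, so that landing
   anywhere in the padding slides execution into the code. */
size_t build_nop_slide(unsigned char *buf, size_t buflen,
                       const unsigned char *shellcode, size_t sclen)
{
    if (sclen > buflen)
        return 0;                        /* shellcode doesn't fit */
    size_t pad = buflen - sclen;
    memset(buf, 0x90, pad);              /* the slide: 0x90 = x86 NOP */
    memcpy(buf + pad, shellcode, sclen); /* the payload at the end    */
    return pad;                          /* number of NOPs written    */
}
```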
27:00
And that's what is called a nop slide. So, can I demonstrate code injection? Well, it used to be very easy to do so back in the old days, so that's the reason why I said in the last ten years, because while it was easy to do it on the Commodore 64
27:21
and the early 8088 machines, et cetera, early PCs, for the last ten years there has been this annoying data execution prevention mechanism, which is using a write-xor-execute strategy, sometimes called an NX bit, where it basically says: when a program or operating system
27:43
is asking for a page of memory, it has to say, is this going to be data, or is it going to be executable code? And it cannot be both. And this mechanism is very effective to prevent the possibility of executing code that is
28:04
stored in the data segment and not in the code segment or code page. But it's not the only protection mechanism out there that is useful to know about.
28:21
There are plenty of them, but they are often easy to turn off. And one of them is called address space layout randomization, and it also became common on major operating systems like, I'm not sure, five, six, seven years ago.
28:44
So now all the regular operating systems tend to implement it. And before, we had the situation where, when you executed the program, it tended to always go into the same place in memory.
29:04
All the addresses were the same. So the stack was at the same place, the globals and the functions were approximately at the same place, and also the library routines were at the same place. But with ASLR enabled, every time the operating system is loading a program into memory,
29:28
it tends to put them in different places. So, it becomes slightly more difficult to guess where things are in memory. So, it's more difficult to kind of jump into a particular function, etc.
29:41
Now, this is considered to be a very kind of minor obstacle because there are so many ways you can do information leakage. You can first get the information on where things are stored in memory, and then you can calculate exactly where it needs to be. So, while it's slightly annoying, it's not considered to be a very, very strong mechanism.
30:03
And also, in order for ASLR to work, you need to have compiled with position-independent code, as a position-independent executable. And this is not the default in typical compilers. So you have to explicitly say: I want position-independent code, for ASLR to work.
30:26
And often people are not willing to pay that 10-ish percent extra price of having position-independent code. But there are many ways of disabling it.
30:41
You can even, you can poke stuff into your kernel where you basically say, no, I don't want ASLR anymore. You can boot up your machine without ASLR. And of course, you can make sure that your code is not position independent.
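On Linux, the system-wide ASLR setting the speaker is poking at lives in /proc. A small helper to read it (the path is Linux-specific, so this is a sketch for that platform only):

```c
#include <stdio.h>

/* Read the system-wide ASLR setting on Linux: 0 = disabled,
   1 = conservative, 2 = full randomization; -1 if unavailable. */
int read_aslr_setting(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (!f)
        return -1;
    int v;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

Writing 0 to the same file (as root) is one of the "poke stuff into your kernel" ways of turning ASLR off.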
31:03
Here is another mechanism, often called a stack protector. That one is injecting a so-called stack canary. Here is the stack canary.
31:21
So the body of the function is the same, but in the preamble, just before executing the body, it's poking a supposedly random value into a memory location, and XORing it. And then when it's exiting the function, it's checking: has this canary died?
31:47
Has it changed? And if it has changed, then the program is just going to terminate. So at least it doesn't continue in an invalid state. And this is enabled with -fstack-protector, which is often the default on modern compilers,
32:03
because this is not a very costly operation, since the stack canary only goes into functions that actually have a potential for buffer overflows. So it's not a cost you have to pay for every function.
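Conceptually, what -fstack-protector injects looks something like this hand-written sketch. The real mechanism uses a per-process random guard value set up at program start, and the compiler emits the prologue and epilogue for you; this is just the idea, not the actual implementation.

```c
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the random guard value the runtime would provide. */
static uintptr_t stack_chk_guard = 0xdeadc0de;

int copy_with_canary(char *dst, size_t dstlen, const char *src)
{
    uintptr_t canary = stack_chk_guard;   /* prologue: place the canary */

    strncpy(dst, src, dstlen - 1);        /* body of the function */
    dst[dstlen - 1] = '\0';

    if (canary != stack_chk_guard)        /* epilogue: check the canary */
        abort();                          /* it died: terminate at once */
    return 0;
}
```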
32:20
But not only does it put in this magic value, it also often moves the variables around. So in this case, the response buffer is at a higher address than it was before. And that is nice, because it means that if you are overwriting it, it's difficult to change allow access and n missiles.
32:49
And this is a deliberate strategy that they are using, putting that kind of vulnerable buffer on a high address. And some people have actually suggested that we can solve all the security issues if we start making our execution stack grow up instead of down.
33:09
I never understood that proposal, and it seems it's not a very popular suggestion out there. But it's an interesting thought, though, that how easy it is to exploit the program depends on the position of your stack variables.
33:34
But finally, about the protection mechanisms: we have this data execution prevention, stack protectors, ASLR and similar techniques,
33:47
they certainly make it difficult to hack into a system. But there is a very powerful exploit technique called return-oriented programming that has become very popular in the last few years. I think the first paper around ROP came out in 2006, 2007, but it became common to use it only a few years ago.
34:11
And according to someone that knows this much better than myself, ROP is used in nearly every single exploit that you hear about these days.
34:22
And it's a very powerful technique, and it's apparently impossible to stop. And the reason lies in how the machine works and how we execute code. I take my binary now, which is called launch, and I just dump the data in that binary file.
34:49
And by studying that data, I start recognizing a few things here that are very useful for ROP. Anyone know what these C3 bytes are?
35:02
A loop? No. A no-operation? No, it's not a NOP either, but it can be used for a jump, that's the point. It's actually the RET instruction. And there are other useful instructions out there that you can use,
35:22
but the nice thing about RET is that it reads an address from the stack and moves it into EIP, the instruction pointer. And in that way, you can build up a lot of addresses on the stack, and when you start unwinding the stack,
35:50
doing returns, it will just pick another return address and jump around. And I'll show you these things. You usually use programs for this; this one is called ROPgadget, and there are plenty of those programs out there,
36:02
which can look at the binary and then figure out: here are all the small bits and pieces of code that are unintentionally in your binary. For example, just to show you a very simple example: just before I return back to main,
36:24
for some reason, I want to increment ECX. I just want to run this small assembly instruction, inc ECX, and then return. Then I can put this address onto the stack first, and then have the real return address afterwards.
36:46
And then it will first jump there, run inc ECX, then run the return and jump back to main. And it has increased ECX by one, as if nothing had happened. But you're not limited to doing just one.
37:01
You can actually build up a chain of these gadgets that each do one small assembly instruction. And it has been shown, on Intel, ARM and SPARC CPUs I think, that there is often a Turing-complete set of these gadgets
37:23
on the programs that are running on them. So it is possible to imagine that by finding these gadgets and creating a long chain of small bits and pieces that it's going to execute by just reading the stack
37:41
and doing return, return, return, return, but just before the return doing a small thing, you can build up a whole program in there. And this is the technique that is often used. It's very difficult to do by hand, so you typically always use programs. Like in this case, ROP gadgets, which can also, it can find the gadgets
38:05
and chain them together for you if you want to run a particular command. And then you have the payload that you can just dump into your target program. Yeah? Yes. You want to get around the no-execute protection.
38:27
So you basically create your program as just a long string of data. And then you let the original program jump around in its own executable,
38:42
doing all these small pieces of instructions. So that's what return-oriented programming does. And just one of the more famous stories around return-oriented programming and probably one of the stories that made it kind of famous
39:01
and magazines and newspapers were writing about it. Some researchers into return-oriented programming, they asked for access to the new vaulting machine that they were using in the States, which had, I don't know how many hundred security researchers employed
39:22
to create a really secure machine. And they used this technique and showed that they could actually exploit the voting machine that was designed to be unbreakable. And so far there are very few suggestions on how to protect programs
39:42
from exploits by return-oriented programming. So, what I've shown now is very realistic, but now to something slightly different.
40:03
Here is a case which I find very, very interesting. And now that I know more about this one and similar cases, I see this all over the place in code bases. Now, this function, which is of course contrived because it's small
40:23
and I have to put it on the page. But there is a fairly common idiom here. And that is, in this function we are given a pointer, an offset, and the end of the buffer that the pointer is pointing at.
40:46
And then a value to poke into it. So, just to be secure, first we check if pointer plus offset is bigger than end, because then we have an out-of-bounds access.
41:01
That's not good. The next thing we do, now the offset is an unsigned value. But still we can have this kind of wrap going on. So, if it's a very large value, you can imagine that pointer plus offset becomes less than pointer. And that gives, oh, it's a wrap.
41:22
And that's not good. We don't want that. So, we return. And if everything is okay, and we are confident that we are within the buffer, then we poke the value in. And according to the researchers behind the paper that I read about this,
41:41
the one I read was published in November last year. This is a very common idiom, also in what you would consider serious software. So, they found a lot of, for example, Linux programs, even kernel modules, that use this as an idiom for checking for wraps.
42:05
But the problem is, now this is the out of bound guard and this is the wrap guard. So, if we compile this without optimization, it works exactly as it is supposed to work.
42:23
So, we give it some large numbers. It's poking into the buffer. Then it detects, oh, we are out of buffer. And since we use very large initial values, we get this wrap now. And, yeah, so this seems to work.
42:43
But with optimization, this happens. Out of bounds, out of bounds. And now it's just poking into memory. So, why is that?
43:02
And this is something that you will see more and more of when you have code that has been working for a decade or two. And then suddenly a new optimization technique comes in. And it starts reasoning better and better around your code. And in this case, there was a new version of a few compilers that came out.
43:25
Where they figured out that this can never happen. Because, first of all, pointer overflow is undefined behavior in C. So, this is not going to happen anyway.
43:42
So, why should they generate code for something that will never happen? So, if you look at the assembler here, it looks like this. And without going into details on that one. What has basically happened is that it has, well, this can never happen. So, it just deleted it.
44:01
This code is not present in the assembler. And this is something that you see over and over again, that the optimizers, they become smarter and smarter. And suddenly they remove code that you thought, oh, that's useful code. The compiler is smarter than you and says, if that is going to be true or that is going to be false,
44:22
then there must be undefined behavior. Of course, you're not doing undefined behavior, are you? So, I just removed it. You want fast code, that's the reason why you write in C and C++. And in the research paper, they claimed that there were like 200 modules or something like that in the Linux code that used this idiom.
44:45
Or that idiom is used in 200 places. I can't remember exactly the numbers, but it's a common thing. And I see it in our code bases as well. It's something that people do. And you might say, oh, that's inconceivable.
45:01
It should never happen. But as long as you're stepping outside of the rules for the language, then anything can happen. So, I have some more stuff about security. Because security comes typically in layers.
45:22
Very much in layers. You put several layers of security on protection mechanisms on top. And in this case, we might go into the source code and we might start trying to fix and use fgets instead of gets, et cetera. But it's easy to forget that the easiest way to find the information you need
45:43
is if you have the binaries: you just search for the strings inside, of course. I mean, it's silly, but I still have to mention it here. Because you can have all these protection mechanisms in place and suddenly you realize you have actually leaked the password or you have leaked the information.
46:01
Similarly, if you have access to the binary, you can just disassemble the binary. And you can read the authenticate and launch function and you can read how it works. And then we recognize, oh, that's two missiles, isn't it?
46:21
And that is not allowing access. I can just use sed and I can patch that binary file on those two places and then run it. And suddenly we have access granted, whatever we do, and we always launch 42 missiles. So, that's also one way of doing it.
46:42
If you're doing embedded applications, for example, sending out firmware to somewhere and you allow someone to change the firmware before they upload it into your router or switch or whatever, well, that can happen. So, yeah, so we recognize this thing.
47:00
So, that is a nice way of patching it. And you can also patch complete functions, of course. So, here I write my own, my authenticate and launch, David Rock's launch, 1983 missiles. Here is the assembler code, machine code. We just create a printf with that particular thing.
47:23
And we poke it directly into the memory. And as long as the function is smaller than the previous function, it's typically a very easy thing to do. You don't have to move things around or anything. You just replace the function completely. Now, we're doing our stuff here without any fancy way of hacking into the system.
47:45
So, I just wanted to mention those before we go into the summary finale. I'm now going to give 13, 14 tricks about how to write insecure code in C and C++.
48:01
First of all, this one. It's very common for programmers to think that they know which one will be called first. Oh, it's going to call A and then B. And that is true for all languages except for C and C++.
48:20
In C and C++, it can either call A first or B first. The only guarantee you have is that they are not going to execute at the same time. And this is called unspecified behavior. Actually, COBOL and, I think, Fortran also have this feature. But all the other languages that you know about, probably, will have a left-to-right evaluation.
48:48
So, trick number one: make sure that you write code that depends on a particular evaluation order. This is a trick for writing insecure code, of course. Now, unspecified evaluation order has some serious consequences.
49:03
Very serious consequences. And that is, what is the value of n? Any suggestions? Yeah, it's undefined behavior. It can be 42, it can be 0, it can be 7, it can be whatever.
49:22
And even if you have seen your own compiler generating exactly the same value over and over again, it doesn't mean that it will do so in the future. And it certainly doesn't mean that it will do so if you change the compiler. So this is an example of a sequence point violation. And you get undefined behavior.
49:42
So, trick number two: try to break the sequencing rules. That will give you some cool effects. And I'm showing you that in the next one. This one, for those who have been to a previous talk, you have probably seen this one before, so you don't need to answer. But I think we have time to very briefly look at this.
50:03
What do you think will actually happen if you compile and run this code in your development environment without picking up your computer? Just try to guess. You have some experience with your own development environment. What do you think your compiler will do?
50:24
I'm quite sure this will compile and run cleanly, so you don't need to look for missing semicolons or whatever. It's going to print a value. What do you think it will print? First one?
50:46
Seven? I like that. Anything else? Eleven? I like that as well. Anyone higher? Fifteen?
51:00
Oh, twenty-one. That's even better. Anyone higher? Well, this is what happened on my computer. I have many compilers on my computer. I just happened to pick three interesting ones. If I use the GCC that came with the operating system, not the one that I compile myself, which I do all the time.
51:27
The one that came with the operating system. GCC gives me twelve, Clang gives me eleven, and Intel compiler gives me thirteen. And this is a consequence of the unspecified evaluation order.
51:43
And therefore, the expression that you see here doesn't make sense. Because we don't know whether i has been updated here, or there, or there, or when the side effect of ++i actually happens. Yes, you are ahead of me. Good.
52:07
So, trick number three, write insecure code where the result depends on the compiler. And if you want a detailed explanation with all the assembler involved, here's the link.
52:23
So, do you enable warnings for code like this? I don't know how many flags you would need. So, to write insecure code, it's important to know a lot about the weak spots,
52:44
The blind spots of your compiler. Because there is always one annoying colleague that wants to add another flag. Oh, I want minus F, whatever. So, know your blind spots. And the compilers will typically not, well, they will try to warn you about things, but very often they can't see it.
53:04
And sometimes they can see it, but they have decided not to diagnose it anyway. And remember, when you have undefined behavior, and that's what we saw, we saw undefined behavior,
53:21
that's the reason why the compilers can just do whatever they want. Anything can happen. And there is this saying, from comp.std.c, that when the compiler encounters an illegal construct, it can try to make demons fly out of your nose.
53:41
That's okay behavior. And this is what we see in this particular example. I'm not going to spend time on that one, because we already discussed slightly this Bool problem earlier. But the point is that if you have code like this, and the problem is we don't initialize B, but we are still reading it.
54:01
So that means B in theory can be whatever value. Or in practice can be whatever value. So if you just put on a main and a bar here, just to poke, for example, the value 2 into this memory location, remember how the stack was working on my machine?
54:23
Then we get the situation where it's trying to evaluate B, which has an internal representation of something that is not 0 or 1. And the compilers behave differently. The Intel compiler, in this case, will give true. Clang will give false.
54:42
And GCC has this quantum behavior and gives me both true and false at the same time. So maybe this is the first preparation for quantum computing. With optimization, all of them just happened to give false.
55:01
So this is also, you see, an example where the value changes completely with optimization as well. And it's different between the compilers. And once again, I described these two concepts in a 90-minute talk that I did at ACCU last year, with all the assembler involved. And it was the previous one, and it was this one.
55:21
So if you want to see all the assembler code and an explanation for why this is happening, then you can go there and have a look. So write insecure code by messing up the internal state of the program. Now this one. Anyone see a problem with this function?
55:42
Apart from being stupid and doing nothing serious, I mean, we're quite used to write the function that takes an integer and do something with that integer and then return something. So this is innocent-looking code. But there is a problem.
56:03
We can get integer overflow. Because if we send in a large value, in this case INT_MAX, then we get signed integer overflow, and that is undefined behavior in C and C++.
56:22
So, and you remember, when you have undefined behavior, anything can happen. So this might happen. Of course, you might say, well, unlikely. But remember, anything can happen, even that thing, even though I would be surprised if it actually happened.
56:44
It's a real phenomenon. But in theory, anything can happen. So make sure you write insecure code by only assuming valid input values. Don't check for large integers before you're doing calculations with it, because that will make your code more secure and less buggy.
57:06
Number seven, trick number seven. Now trick number eight is this one. Now, I have a small loop that is basically looping through I. And I've already given you a small hint about what might happen here.
57:26
But this one terminates after doing four calculations and printing out four values, which is fine. That is what you would expect maybe. But if you combine with optimization, this can happen.
57:41
And this actually happens on my machine. So what looked like a valid terminating condition is not a valid terminating condition in this case. Because this thing can never be true. No, it can never be anything but true.
58:02
We are doing additions. And signed integer overflow is not allowed. So either it's crappy code or this can never be anything but true. So the compiler will change it to true and therefore eliminate the code.
58:23
If you have a good compiler, that is. Lousy compilers don't see these kinds of things. So write insecure code by letting the optimizer remove apparently critical code for you. And here is a big topic. And there are plenty of C++ examples as well I could have used.
58:42
What is the big problem here? It's of course the use of this one. It looks like kind of a secure way of copying things. And it has a very bad name, but it was invented like 220 years ago, something like that. So they didn't think that far.
59:02
Just kidding about 220 years ago. But the point is that strncpy is not doing what you think it does. Because it's not null-terminating the string. So if you write a large string, for example if you write GLOBAL THERMONUCLEAR WAR in this case,
59:31
You are filling up the buffer and it's going to stop there. It's not going to do the buffer overwrite as for example strcpy would have done or whatever.
59:41
But it's going to truncate it without null-terminating. So the effect is that when you print it out, it will continue after the string and into the next buffer, and you get this information leakage. Whenever you see strncpy,
01:00:00
you should expect to find this null-terminating line right afterwards. And if you don't find it, you are probably looking at a bug. So you can just scan through your source code, look for strncpy, and if you don't see that terminating line afterwards, you're probably looking at a bug. So write insecure code by using library functions incorrectly.
01:00:23
And of course, you should never allow the stack protector to kick in. So you can disable it with -fno-stack-protector. Disable it. You should also make sure that ASLR is not working.
01:00:40
You can either turn it off or you can compile your code that is not position independent. That's useful. And make sure you use some old hardware and old operating systems, like five, six, seven years old machines and operating systems, because they, if you're lucky,
01:01:00
they don't have this data execution prevention, this NX bit, in place. So go back a few years, use that kind of old stuff. And here is another one. The difference, I've just shown an example here, where I compile my program
01:01:24
with using shared libraries and one with static libraries. And that makes a huge difference in the size of the program. So you see, in this case, it's only 7K and this one it's 780K. The cool thing about that is that the larger the file is,
01:01:42
the more gadgets you can find. So if you want to do ROP, you want to have a huge program that has a lot of kind of random data in there. So you can find a lot of gadgets that you can chain together and start exploiting the program.
01:02:00
Make it easy to find ROP gadgets in your program. And you should not check the integrity of the code. So you should not do a checksum when you distribute the code, and you should certainly not do a checksum when you load the program and install it on your machine and start running it.
01:02:25
Because that would make it difficult to do patching of the binaries. So skip the integrity checks. And finally, perhaps most important, you must never ever let other programmers
01:02:41
review your code. So pair programming, forget about that. Never review code, because your colleagues might see all the crap that you are creating. So that was the advice for writing insecure code. And here is the summary. And thank you very much.
01:03:03
So we are out of time, but I will hang around here a bit. And if you want to ask about something particular here.