Adding AES-ICM and AES-GCM to OpenCrypto
Formal Metadata
Title | Adding AES-ICM and AES-GCM to OpenCrypto
Number of Parts | 41
License | CC Attribution - Share Alike 3.0 Unported: You may use, adapt, and copy, distribute and transmit the work or content in unchanged or adapted form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers | 10.5446/18684 (DOI)
Transcript: English (automatically generated)
00:03
Should I get started, or? OK. All right. So hello, everybody. My name's John-Mark Gurney. I will be talking about my work on adding AES-ICM and AES-GCM to the FreeBSD kernel. First of all, I'll start off with a little bit about
00:22
myself. I've been using FreeBSD since the original 1.1.5.1 and have been a committer since 1997. So I've been around and seen lots of different things. But about 12 years ago, I actually started working in the security industry.
00:41
I had been interested in security for a very long time. I started using SSH almost immediately once it got released, once I realized: your passwords are in the clear on the network whenever you telnet into a box. That's probably not a good thing. But I never did much with security
01:01
until I started working at nCircle. And then a few years after working at nCircle, I worked for a company called Cryptography Research. I was never much of a cryptography person, algorithms and such, but after spending six and a half years working with people who do that for a living,
01:21
you're bound to learn a thing or two about cryptography. So it was during that period that I started doing some work on AES-NI. Originally, if you were here last year, you may actually have attended my talk on adding AES-XTS to the FreeBSD kernel.
01:41
And that was purely in order to make my encrypted hard drives perform faster. Encrypted hard drives mean RMAs are much easier and quicker. After doing that, Jim Thompson from Netgate approached me and was like, we would
02:02
like to add additional modes to FreeBSD's IPsec. And so he contacted me, and I decided to do some work. So why did I do AES-GCM? One of the things is that AES-GCM actually
02:25
performs significantly faster than the most common existing AES mode, and that is AES-CBC. I'll get into that in a little bit more. Security is another good point. AES-GCM is an AEAD mode.
02:43
That means authenticated encryption with associated data. The importance behind this is that normal encryption does not actually say anything about the trustworthiness of the data. This means that attackers can do whatever
03:04
they want with the encrypted data. They can manipulate it; they can munge the data. So encryption is not the end-all, be-all. You need to actually have authentication. Encryption only
03:21
provides confidentiality; authentication lets you actually trust where the data came from. And as I mentioned, the reason Jim Thompson wanted me to do this was that out in the real world, AES-GCM was commonly used for IPsec. Today, I'll be mostly talking about GCM
03:41
because that is actually the interesting problem. And ICM is, as you'll find out shortly, really a sub-problem of GCM. So, as I mentioned, why not CBC? Now, I've used this slide before, but this may not be apparent. It turns out that the block cipher encryption
04:03
step is the really expensive step in the crypto. The XORs are very cheap; you already have your plaintext and IV, so there's not much else to do. And because of that expense, if you notice, I cannot start encrypting the next block
04:24
until I have fully encrypted this block. That means you are limited by how fast your AES cipher can run. You cannot do any pipelining or parallelization in order to improve performance. It turns out that when
04:44
you do decryption with CBC, you don't actually have that dependency, because you always already have the next ciphertext block, and you can do it all in parallel. If you run performance benchmarks using OpenSSL or whatever, you'll actually find out
05:00
that CBC encryption only runs at about 600 megabytes a second, whereas decryption can run at like three to four gigabytes per second. I don't know about you, but I'd much rather have three to four gigabytes per second. So what specifically is AES-GCM? As I mentioned earlier, AES-GCM is
05:23
a combined authenticated-encryption-with-associated-data mode. The encryption part is this part here. As you can see, we're using a counter, so the EK steps, which are the AES encryptions, can be done completely
05:41
independently of each other, and we feed the counter blocks in. Now, the other part is the authentication part. The authentication part, together with the associated data, is what makes sure the message hasn't been tampered with. The same key is used, and it outputs an auth tag.
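The counter-mode half can be sketched in a few lines of Python. This is an illustrative toy, not the kernel code: `toy_block_cipher` (a truncated SHA-256) stands in for the real AES E_K, purely to show that each keystream block depends only on the key, nonce, and counter, so the expensive block-cipher calls are independent of each other:

```python
import hashlib

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES E_K (an assumption for illustration only):
    # any fixed keyed function works to show the *structure* of CTR mode.
    return hashlib.sha256(key + block).digest()[:16]

def ctr_keystream(key: bytes, nonce: bytes, nblocks: int):
    # Each keystream block depends only on (key, nonce, i), so the
    # block-cipher calls can run in parallel or out of order.
    return [toy_block_cipher(key, nonce + i.to_bytes(4, "big"))
            for i in range(nblocks)]

key, nonce = b"k" * 16, b"n" * 12
ks = ctr_keystream(key, nonce, 4)
# Out-of-order computation gives the same block: no chaining dependency.
assert ks[3] == toy_block_cipher(key, nonce + (3).to_bytes(4, "big"))

def ctr_xcrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    stream = b"".join(ctr_keystream(key, nonce, (len(data) + 15) // 16))
    return bytes(d ^ k for d, k in zip(data, stream))

msg = b"attack at dawn!!"
ct = ctr_xcrypt(key, nonce, msg)
assert ctr_xcrypt(key, nonce, ct) == msg  # XOR keystream: decrypt == encrypt
```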
06:03
The length of the auth tag can be changed, but it is recommended to be at least 64 bits. If you go much shorter, it is likely that an attacker would be able to forge it. You can use as much as 128 bits, but that all depends on exactly what you're doing.
06:21
One of the other key parts is that the very last step hashes in the length of the authenticated data and the length of the ciphertext. If you don't have those embedded, the attacker could, say,
06:42
shift the boundary between the authenticated data and the ciphertext, or shave a few bits off. And that would not be good. Oh, and I forgot about this: the EK is the AES step, and Mult H is a special Galois field multiply
07:05
over GF(2^128). And I will now be getting into that. So, Galois fields. Sorry about the math, but it turns out that Galois field math is not that complicated.
07:20
It takes a lot of convincing yourself that it isn't that complicated, but it actually isn't. In Galois field math, addition is carry-less. Unlike traditional addition, where five plus five comes up as ten, if you were doing this in a field of ten,
07:41
five plus five would actually end up being zero. In our case we're doing binary, so instead of one plus one being one-zero, you just end up with zero. That is the equivalent of XOR, or the mathematical symbol circle-plus, which is used quite often.
08:01
And multiplication works, believe it or not, exactly the same way you do multi-digit multiplication. If you do 15 times 15, or 15 times 23, you do the whole schoolbook long multiplication and then the additions, and Galois field math works exactly the same way,
08:21
except, as I mentioned earlier, the additions do not carry. So in my little quick example of one-one binary multiplied by one-one binary, you shift in the zero because you're multiplying in the twos position, you add that to the one-one,
08:40
and you end up with one-zero-one. If you had done this with normal carries, it would have been nine, one-zero-zero-one. One minor difference is that,
09:00
since we're working in GF(2^128), we need to apply a reducing factor whenever our value grows to 128 bits or more. Each different field of that size has a different reducing factor; this one is the commonly used irreducible polynomial for GCM.
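The carry-less multiply plus the reduction step can be sketched in Python. The reduction polynomial x^128 + x^7 + x^2 + x + 1 is GCM's; note that the real GCM spec stores field elements in a bit-reflected order, which this sketch ignores, so it shows the structure of the math rather than the exact wire format:

```python
# GCM's field polynomial: x^128 + x^7 + x^2 + x + 1
R = (1 << 128) | 0x87

def clmul(a: int, b: int) -> int:
    """Carry-less schoolbook multiply of two bit-polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a       # "addition" is XOR: no carries
        a <<= 1
        b >>= 1
    return r

def gf128_mul(a: int, b: int) -> int:
    p = clmul(a, b)                  # product has up to 255 bits
    for i in range(254, 127, -1):    # fold high bits back down with R
        if (p >> i) & 1:
            p ^= R << (i - 128)
    return p

# The talk's example: carry-less 0b11 * 0b11 = 0b101 (with carries, 3*3 = 9).
assert clmul(0b11, 0b11) == 0b101
# Reduction example: x^127 * x = x^128, which reduces to x^7+x^2+x+1 = 0x87.
assert gf128_mul(1 << 127, 2) == 0x87
```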
09:23
I don't like that term, because polynomials get all confusing. There's a whole different notation that you very often see, where it's written as x to the 128 plus x to the seventh plus x squared plus x plus one and so on, and I've always been confused by that,
09:41
so I'm not going to use it. It also turns out, and this will be very important later on, that in GF(2^128), distribution works. Meaning, if you do (A plus B) times C, you can distribute the C over the A plus B, and it is equivalent
10:01
to A times C plus B times C. There are interesting tricks that we'll be able to play with that. So, there are always dangers in implementing different parts of crypto. When I was working at CRI,
10:21
they developed numerous different techniques for attacking systems through side channels such as EMI. They were not so much on the computer side, but they still needed to keep abreast of the various attacks. So, back in 2010, Bonin Hung wrote his master's thesis
10:42
on attacking AES-GCM's secret authentication value H, which I talked about a little bit earlier. In this paper, he basically goes through and shows that if your Galois field math is not exactly constant time,
11:00
and you do various speedups where you precompute tables eight bits at a time in order to improve performance, you can actually snoop on the cache lines, figure out which values are being looked up for the secret value H, and figure out what H is. And once you figure out H,
11:22
as we saw here, you can actually forge any message that you want. Yes? Is it, say, an S-box implementation of AES, such that an ARX cipher in that mode would have the same side channel,
11:41
or is it because it's AES that's being used? So, in his paper he was attacking purely the Galois field math, not an AES side channel. But as the second bullet points out, there are well-known side channel attacks on the standard five-table version of AES,
12:01
because when you're doing the S-box lookup, the lookup leaks information; the S-box is most often implemented as a 256-byte array, and then that becomes complicated. Luckily, if you have access to SSE registers,
12:21
there are other things you can do to mitigate that, assuming you don't have AES-NI. Yes? So, I'll have a reference to this paper.
12:40
This is a DJB paper, yeah. As you said, you're familiar with this paper; in it, he actually demonstrates
13:00
that the cache timing attack is exploitable over the network. It's more complicated, obviously. Part of it depends upon how long keys are used and how much data is processed. With techniques like DPA and such,
13:22
as long as there is leakage, with enough traces you will be able to extract what you want. At CRI, we did a number of different things. A friend of mine who was working on a project,
13:41
and I worked with him on some of it, basically said that on hardware, using DPA, you could actually target individual transistors on a chip. When you're attacking over the network, the network just adds a little bit of noise, but the data is still there;
14:02
you just need to average out the noise. It depends upon how many traces you can collect to average out the noise. Does that make sense? Yes, but the noise from the processor itself is uncorrelated with what you're attacking.
14:24
And because of that, as you collect additional traces, the uncorrelated noise will drop out, and only the correlation that you're targeting remains.
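That averaging argument can be illustrated with a toy statistics sketch. The numbers here are made up for illustration (a real DPA setup measures power traces, not a Python random variable); the point is only that a tiny correlated signal survives averaging while uncorrelated noise shrinks:

```python
import random

random.seed(42)   # deterministic for the example
signal = 0.01     # tiny data-dependent "leakage" we want to recover
noise_sd = 1.0    # per-trace uncorrelated noise swamps the signal

def trace() -> float:
    return signal + random.gauss(0.0, noise_sd)

# Averaging N traces shrinks the noise roughly like 1/sqrt(N),
# while the correlated signal component stays put.
N = 200_000
avg = sum(trace() for _ in range(N)) / N
assert abs(avg - signal) < 0.02   # noise averaged down to the signal's scale
```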
14:40
So basically, and I'm getting more into DPA here, you collect all these traces, and then you say: I'm going to guess at this S-box value, I'm going to guess that it is each of 00 through 255, and you sort all of your traces
15:01
based upon each guess and look at the averages; the guesses that don't correlate drop out. Chosen ciphertext? Yeah, passive snooping is more difficult to do, because you're obviously not injecting anything. It also turns out
15:24
that another thing that needs to be constant time is comparing the tag itself. There have been a number of vulnerabilities there, and that is part of the reason why we now actually have a timing-safe
15:42
memcmp variant in the kernel. One of the other interesting things, though, is that, believe it or not, counteracting side channels can improve performance. While I was doing this work, I imported OpenBSD's version of GCM, and when I was doing benchmarking,
16:03
seeing how to improve performance, I mistakenly ended up swapping the two arguments to the Galois field multiply. Now, normally this should not be a problem, because, as we know,
16:22
when you're doing multiplication, 15 times 20 is the same as 20 times 15. But the problem was that their implementation treated one side of the multiplication as constant time, while doing an if statement based upon the bits of the other. In the case of GCM, when we go back
16:44
to here: this ciphertext is public, but this H value is not. So if you swap the two values around, suddenly you're branching on H
17:02
and leaking information about the private value. When I discovered this, I was quite distressed, because it was such an easy mistake to make. I ended up doing a little tweak to make it constant time, and it turns out that performance improved. Because now, instead of an if statement
17:22
in a loop iterating 128 times, I was doing a little bit of extra math, but I wasn't mispredicting the branch. So performance actually improved, the code was effectively constant time, and it closed the side channel. OpenBSD has actually adopted that patch in their tree.
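The tweak can be sketched as replacing the secret-dependent branch with a mask. This is an illustrative Python version of the idiom, not the actual kernel patch (Python integers aren't truly constant-time; in C the same trick also removes the branch misprediction mentioned above):

```python
def branchy_clmul(a: int, h: int, width: int = 128) -> int:
    """Carry-less multiply that BRANCHES on the bits of h: leaky if h is secret."""
    r = 0
    for i in range(width):
        if (h >> i) & 1:        # secret-dependent branch
            r ^= a << i
    return r

def ct_clmul(a: int, h: int, width: int = 128) -> int:
    """Same multiply with the branch replaced by an arithmetic mask."""
    r = 0
    for i in range(width):
        mask = -((h >> i) & 1)  # all-ones if the bit is set, else zero
        r ^= (a << i) & mask    # unconditional: no branch on secret h
    return r

# Same math either way, and no branch depends on which argument is secret.
assert ct_clmul(0b1011, 0b110, 8) == branchy_clmul(0b1011, 0b110, 8)
assert ct_clmul(0b11, 0b11, 4) == 0b101
```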
17:42
As I mentioned earlier, one of the ways you improve the performance of Galois field math is lookup tables: people would traditionally precompute, say eight bits at a time, what the Galois field multiply against H was,
18:02
but, as I discussed earlier, because of cache effects you can actually figure out exactly which values are being multiplied. So some people working on NSS, I forget which other library,
18:21
were doing some investigation, and it turns out that DJB, and I forget who else, found out that even intra-cache-line accesses have timing variation. The first 64 bits of a cache line are actually accessed more quickly than the remaining bits of the line.
18:43
So even though you would think it was all the same, it is not. One of the proposals they put forward to help mitigate this, and because of the intra-cache-line access it doesn't completely solve the side channel,
19:01
but it helps, is this: when you have a 16-row lookup table, with 16 bytes for each of the GF(2^128) entries, you split each entry into four 4-byte pieces and spread them across four cache lines. Now, when you look up the table,
19:21
since you access all four cache lines on every lookup, you no longer have as much of a footprint. Obviously, in this case the first entries of each line access faster, so you'll be able to tell those, but the rest won't. It really comes down to
19:43
compromises that you have to make. I don't have the numbers in front of me, but adding these tables improves the software implementation significantly, because instead of going through the bit-by-bit loop, I'm just accessing memory.
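The split-table layout can be sketched like this, with the sizes assumed from the talk: 16 entries of 16 bytes each and 64-byte cache lines. Python obviously can't show the timing behavior, only the data layout that forces every lookup to touch all four lines:

```python
import os

LINE = 64  # assumed cache-line size in bytes

# A 16-entry, 16-byte-per-entry lookup table: 256 bytes = 4 cache lines.
table = [os.urandom(16) for _ in range(16)]

# Split every entry into four 4-byte pieces, and group piece j of all
# 16 entries into "plane" j.  Each plane is 16*4 = 64 bytes, i.e. exactly
# one cache line, so ANY lookup now touches all four lines.
planes = [b"".join(entry[4*j:4*j + 4] for entry in table) for j in range(4)]
assert all(len(p) == LINE for p in planes)

def lookup(i: int) -> bytes:
    # Reassemble entry i by reading one 4-byte piece from each plane.
    return b"".join(planes[j][4*i:4*i + 4] for j in range(4))

assert all(lookup(i) == table[i] for i in range(16))
```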
20:03
So there are improvements, and you really have to decide how much leakage you can accept versus how fast you want it to be. You can make things extremely secure, but if it runs at 10 megabytes per second, nobody's going to use it. If it runs at a few hundred megs,
20:20
then they might actually use it. And, you know, there's a lot of noise in these systems; it turns out that slower systems are a little more easily attacked than others. So, as part of my research, I alluded to some of these interesting opportunities.
20:42
If you remember the GCM diagram, we basically always did a multiply, added in the next value, and then multiplied the resulting value. That's the equivalent of (((A times H plus B) times H plus C) times H plus D) times H. Those are a lot of parentheses.
21:01
But because of the distributive property, we can actually pull out all of those Hs and distribute them. Now we have a much easier and less dependent version: A times H to the fourth, plus B times H cubed, plus C times H squared, plus D times H.
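The rewrite can be checked numerically. This sketch uses GF(2^8) with the AES polynomial 0x11b just to keep the field small; that field choice is an assumption for illustration, and the algebra is identical in GHASH's GF(2^128):

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8), reduced by x^8 + x^4 + x^3 + x + 1 (0x11b)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> 8) & 1:   # reduce as soon as degree reaches 8
            a ^= 0x11b
    return r

H, A, B, C, D = 0x53, 0x11, 0x22, 0x33, 0x44

# Serial (Horner) form: each multiply must wait on the previous one.
horner = gf_mul(gf_mul(gf_mul(gf_mul(A, H) ^ B, H) ^ C, H) ^ D, H)

# Distributed form: independent multiplies against precomputed powers of H.
H2 = gf_mul(H, H)
H3 = gf_mul(H2, H)
H4 = gf_mul(H3, H)
dist = gf_mul(A, H4) ^ gf_mul(B, H3) ^ gf_mul(C, H2) ^ gf_mul(D, H)

assert horner == dist  # distributivity lets the multiplies run in parallel
```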
21:21
So now we can precompute H to the fourth, H cubed, H squared, and H, have tables for those, do lookups, and decouple the multiplies. This was the technique used by Intel's PCLMULQDQ carry-less multiply paper
22:02
to improve their performance, because the PCLMULQDQ instruction has significant latency, and they wanted to hide that by being able to distribute the work. It turns out that when I was working on the software implementation,
22:22
I noticed that the software Galois field math was not actually using very many registers. It was using like two or three registers, where even on i386 we have at least eight. So I ended up applying this exact same technique, and got a significant performance boost.
22:22
I think it was like over 50%. Obviously, a large part of this is that AES-NI helps completely avoid cache timing attacks, and that is because it implements the cipher in hardware,
22:42
so we don't have to do lookup tables. One of the interesting parts of the PCLMULQDQ instruction is that Intel did not actually specify it as a full 128-bit multiply. When you do the multiply, you multiply two 64-bit numbers
23:02
and end up with a 128-bit number. That means you need to do extra work in order to do the 128-bit multiplies that we need for AES-GCM. It turns out that, if you just think about it a little differently,
23:22
the traditional way you do this is the same way you multiply two two-digit numbers, except in this case each digit happens to be 64 bits. And this actually works for any word size you're targeting. When I was at CRI, I actually wrote some
23:42
multiplication and division routines that were targeting a 600-bit word size. It's kind of odd to think about a single digit containing 600 bits, but that works just fine. Now, as this work was actually funded
24:00
by the FreeBSD Foundation, one of the requirements they have is that the code be reviewed. When I had the code reviewed, I believe by Watson Ladd, he pointed out one of the things that is very dangerous about AES-GCM:
24:21
GCM is built on counter mode, and counter mode does not use what is normally called an IV; it uses what is called a nonce. AES counter mode is effectively a cipher stream that you XOR onto the data. It's a simple way to generate unpredictable noise,
24:42
kind of like generating a one-time pad that you XOR on, but from very small amounts of input: the key plus the nonce. The problem, though, is that since the counter just increments, if you were able to replay a counter, you would end up with the exact same keystream, and you'd be able to take two ciphertexts,
25:02
XOR them together at the appropriate offsets, and get the XOR difference between the two plaintexts. And it turns out that if you XOR two English texts together, you're probably able to figure out exactly what the two texts are. That's not a very good thing. So one of the things we ended up doing was that, for both AES-GCM and counter mode,
25:23
instead of generating random data when you don't specify an IV, like the CBC code does, the code actually requires you to specify the nonce. So if you know what you're doing,
25:42
you know how to generate a unique nonce, and you're safe. If you don't know what you're doing, well, you shouldn't be writing crypto code. This ensures that the user of the AES-GCM code can be sure they get the cryptographic security properties they want.
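The keystream-reuse failure can be demonstrated directly. Here `os.urandom` stands in for the AES-CTR keystream (an assumption for illustration); the point is only that XORing two ciphertexts produced with the same keystream cancels the keystream out entirely:

```python
import os

# Reusing a (key, nonce) pair in a counter/stream mode reuses the keystream.
keystream = os.urandom(32)   # stand-in for AES-CTR output

def xcrypt(msg: bytes) -> bytes:
    return bytes(m ^ k for m, k in zip(msg, keystream))

p1 = b"transfer $100 to account 12345"
p2 = b"transfer $999 to account 99999"
ct1, ct2 = xcrypt(p1), xcrypt(p2)

# The attacker never learns the keystream, but XORing the two ciphertexts
# cancels it, leaving plaintext1 XOR plaintext2 for cryptanalysis:
diff = bytes(a ^ b for a, b in zip(ct1, ct2))
assert diff == bytes(a ^ b for a, b in zip(p1, p2))
```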
26:01
The other interesting thing is that when I was implementing the AES-GCM code, I actually found a bug in the reference implementation accompanying the GCM paper. When doing the decrypt, as part of the decrypt phase you want to do all of the authentication first,
26:22
verify that all the ciphertext is intact, and then you can do the decryption. It turns out that the comparison against the calculated tag was not done correctly. It was effectively only checking that the bits set in the provided tag lined up with bits in the generated tag,
26:44
so you could specify a tag of all zeros, or all ones, I forget which, and it would always verify. Obviously not a very good thing.
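The talk doesn't show the exact broken expression, so `broken_check` below is a hypothetical reconstruction of that class of bug, one form in which an all-ones tag always verifies (the dual form would accept all zeros). The fix, a full-equality timing-safe comparison in the spirit of the kernel's timing-safe memcmp mentioned earlier, is sketched next to it:

```python
import hmac

computed = bytes(range(16))   # tag the verifier derives from the ciphertext

def broken_check(provided: bytes, computed: bytes) -> bool:
    # HYPOTHETICAL reconstruction of the bug: AND-ing instead of comparing,
    # i.e. "every bit set in the computed tag is also set in the provided
    # tag", so a provided tag of all ones always passes.
    return all(p & c == c for p, c in zip(provided, computed))

def good_check(provided: bytes, computed: bytes) -> bool:
    # Correct, and timing-safe: compare the full tags for equality
    # without an early exit that leaks how many bytes matched.
    return hmac.compare_digest(provided, computed)

forged = b"\xff" * 16
assert broken_check(forged, computed)       # all-ones forgery always passes
assert not good_check(forged, computed)     # real comparison rejects it
assert good_check(computed, computed)       # the genuine tag still verifies
```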
27:01
Luckily, I sent the author email, and that was fixed in the next release of the paper. But as you can see, reviews are absolutely necessary when you're dealing with crypto code. And that code from his paper was actually on Intel's website for over two years before it was fixed.
27:23
And obviously testing and verification are, too. Part of the discovery of that bug was the fact that I was generating knowingly incorrect data and testing it, and the decryption was passing; it should not pass data that I know is not correct.
27:44
So as part of this work, I set up an additional set of tests that are now distributed as part of FreeBSD. If you have a head system from within
28:01
the last six months or so and you go into /usr/tests/sys/opencrypto, you'll see a script there, runtests, that will actually run tests against this code. The tests require that you have Python and the NIST KAT port installed.
28:21
The NIST KAT port is basically a set of known answers to various crypto algorithms that are distributed by NIST, the US standards agency. So those are the standard checks: CBC, SHA-256, and some other stuff.
28:40
So I would like to thank both Netgate and the FreeBSD Foundation for encouraging me and providing funding for this work. And I would also like to thank my reviewers, Mike Hamburg and Watson Ladd, for their review work. The papers that I mentioned are here.
29:03
So, does anybody have any questions? Yes. Just going back to AES in counter mode: I've seen implementations where, because they're generating a new key for each instance, they don't bother to set the nonce.
29:23
So is it because most contexts in GCM are not rekeying that the nonce is important? Yeah, the whole point, the thing that needs to be protected
29:40
in both GCM and in AES counter mode, as you said, is ensuring that the key and nonce pair is never reused. So if you always have a new key and the nonce is all zeros, that is perfectly safe, since you never use the same pair twice. The reason why most people don't do that
30:02
is because key distribution is very, very difficult. And, I forget what the exact sizes are, but you can encrypt multiple terabytes of data with this. I can go into a little more detail: in GCM, the full counter block is 128 bits.
30:25
The nonce is 96 bits, and at 96 bits, that's too small to just use random values. If they were using a full 128 bits, you could use random, but then you have the problem that once you encrypt four gigs of data, you're effectively back down to 96 bits
30:41
and there's likely an overlap. So in GCM, they have a 32-bit block counter, which means you can encrypt, what is it, two to the 36 bytes worth of data per nonce without rolling over.
31:05
So, yeah, does that answer your question? Yeah, as long as the key-and-nonce pair is unique. And ICM is literally just this part of GCM.
31:23
Any other questions? Yes. Unrelated to this work, I saw some stuff floating around for FreeBSD 10.2 and Intel QuickAssist. Are you able to speak to that? Well, Jim can probably talk a little bit more
31:40
about QuickAssist. I know that they're working on it; I'm not sure how ready it is. Do you want to say anything, Jim? Okay. So, I know there are people interested in it, but yeah, it's not there yet.
32:03
Any other questions? All right. Well, thank you for attending.