Adding AESICM and AESGCM to OpenCrypto
Formal Metadata
Title 
Adding AESICM and AESGCM to OpenCrypto

Title of Series  
Author 

License 
CC Attribution  ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Identifiers 

Publisher 

Release Date 
2015

Language 
English

Content Metadata
Subject Area  
Abstract 
Adding additional cipher modes may seem simple, but there are many things to consider. Implementing the modes and ensuring security requires more than a simply coding it up. It requires understanding of different standards and computer architecture to make sure things like side channel/timing attacks are addressed or properly understood. Some design decisions can be made to help ensure that consumers of the interface are able to properly use it.

00:00
Aliasing
Metropolitan area network
Algorithm
Multiplication sign
Bit
Cryptography
Frequency
Kernel (computing)
Software
Different (Kate Ryan album)
Order (biology)
Hard disk drive
Cuboid
Circle
Freeware
Information security
02:06
Authentication
Point (geometry)
Area
Asynchronous Transfer Mode
State of matter
Block (periodic table)
Real number
Bit
Thermal expansion
Black box
Parallel computing
Open set
Cryptography
Benchmark
Order (biology)
Encryption
Key (cryptography)
Block (periodic table)
Information security
Asynchronous Transfer Mode
05:14
Authentication
Ciphertext
Addition
Key (cryptography)
Distribution (mathematics)
Length
Web page
Bit
Grand Unified Theory
Mereology
Type theory
Mathematics
Encryption
06:51
Polynomial
Addition
Divisor
Distribution (mathematics)
Multiplication sign
Mass
Field (computer science)
Mathematics
Positional notation
Term (mathematics)
Different (Kate Ryan album)
Circle
Binary multiplier
Position operator
Exception handling
Metropolitan area network
Addition
Multiplication
Distribution (mathematics)
Binary code
Symbol table
Equivalence relation
Irreduzibles Polynom
Type theory
Personal digital assistant
10:08
Authentication
Multiplication sign
Web page
Bit
Cryptography
Mereology
Computer
Side channel attack
Field (computer science)
Hypothesis
Valueadded network
Hypothesis
Mathematics
Message passing
Different (Kate Ryan album)
Hash function
Order (biology)
Figurate number
Table (information)
Physical system
11:31
Ciphertext
Implementation
Multiplication sign
1 (number)
Mass
Mereology
Side channel attack
Tracing (software)
Coprocessor
Field (computer science)
Number
Revision control
Crosscorrelation
Average
Different (Kate Ryan album)
Hash function
Computer hardware
Cuboid
Addition
Noise (electronics)
Standard deviation
Multiplication
Information
Key (cryptography)
Projective plane
Bit
Hypothesis
Data mining
Kernel (computing)
Software
Personal digital assistant
Statement (computer science)
Weißes Rauschen
Table (information)
16:44
Ciphertext
Logical constant
Metropolitan area network
Group action
Addition
Mapping
Information
Distribution (mathematics)
Code
Closed set
Multiplication sign
Branch (computer science)
Side channel attack
Hypothesis
Loop (music)
Network topology
Hash function
Statement (computer science)
17:41
Noise (electronics)
Multiplication sign
Source code
Sound effect
Bit
Line (geometry)
Mass
Cartesian coordinate system
Side channel attack
Field (computer science)
Number
Cache (computing)
Word
Cache (computing)
Spherical cap
Different (Kate Ryan album)
Semiconductor memory
Order (biology)
Bounded variation
Table (information)
Bounded variation
Library (computing)
Physical system
20:31
Distributive property
Implementation
Multiplication sign
Ultraviolet photoelectron spectroscopy
Mereology
Field (computer science)
Power (physics)
Number
Revision control
Mathematics
Cache (computing)
Software
Hash function
Binary multiplier
Exception handling
Metropolitan area network
Multiplication
Digitizing
Bit
Division (mathematics)
Equivalence relation
Word
Personal digital assistant
Cube
Order (biology)
Table (information)
23:54
Ciphertext
Code
1 (number)
Set (mathematics)
Primitive (album)
Streaming media
Mereology
Disk readandwrite head
Software bug
Spherical cap
Different (Kate Ryan album)
Formal verification
Encryption
Software testing
Gamma function
Computerassisted translation
Information security
Physical system
Authentication
Scripting language
Metropolitan area network
Noise (electronics)
Pairwise comparison
Algorithm
Standard deviation
Information systems
PC Card
Code
Bit
Knot
Cryptography
Message passing
Personal digital assistant
Phase transition
National Institute of Standards and Technology
Software testing
Figurate number
Sinc function
Asynchronous Transfer Mode
28:40
Metropolitan area network
Word
Cache (computing)
Bernstein polynomial
Negative number
Software testing
Figurate number
29:12
Area
Scripting language
Point (geometry)
Authentication
Metropolitan area network
Context awareness
Multiplication
Distribution (mathematics)
Web page
Code
Bit
Knot
Mereology
Latent heat
Quicksort
Bounded variation
00:04
get started again right so hello everybody my name strong marketing I will be talking about my work on adding up alias ICM NES cheek AS GCM to the Free BSD kernel 1st of all start off with a little bit about myself but I've have been using Free BSD since the original 1 1 5 1 and have been committed since 1997 so I've been around and see lots of different things so but about 12 years ago actually started were about working the security industry had been interested in security for a very long time I started using SSH almost immediately once it got released once I realized like the passage served in the clear on the network whenever you tell into a box that's probably not a good thing so but I'd never did much with security until I started working out and circle and then there's a few years after working encircle I worked for a company called Cryptography Research and I was never very much of the cryptography person algorithms are anything but after work spending 6 and a half years working with people who do that for a living you're bound to learn a thing or 2 about cryptography so it was during that period that I started doing some work on a yes and I I'm originally I was and if you were here last year you may actually have attended my talk on adding ASX to the previous the kernel and that was purely in order to make my heart encrypted hard drives performed faster be hard rise means army is much easier and quicker so after doing that but it became a Jim Thompson from that date approach being was like we would like to have added additional roads to of the
02:07
previous user upset by and so he contacted me and I decided I decided to do some work so what why did I do that ASG so 1 of the things is is that AS GCM actually performs significantly faster than the most common existing AS mode and that is a a CBC I'll get into that a little bit more on security is another good point 1 ASG C is of AED mode that means authenticated encryption with associated data the importance behind this is that the normal encryption does not actually say anything about the trustworthiness of the state so this but means that the attackers can do whatever they want with encrypted data they can manipulate it they can't find the data and so encryption is not the end all be all you need actually have authentication without authentication you don't have encryption provides confidentiality authentication what's actually trust to its and as I mentioned the reason Jim Thompson wanted me to do this was that out in the real world GCM was a common commonly used for upset I will look into they'll be mostly talking about GCM because that is actually the interesting problem and ICM is actually and as you'll find out shortly really a subproblem of GC
03:52
so as I mentioned why not CBC now all of a user's talk before but this may not be apparent that it turns out that the black box i for encryption step is the really expensive step in and crypto X or a cheap URI have your plaintext ID nothing else much to do it turns out that because of the expansiveness of it if you notice I cannot start encrypting this next block until I have fully encrypted this block and that means that you are limited by how fast you're ASI for things you cannot do any pipeline a parallelisation in order to improve performance and so and that means that but it turns out that when you do decryption on C you don't actually have that because you always have the next so for text and you can do all this and think if you actually run performance on the bench marks on the wood using like openness sell or whatever you'll actually find out that the CBC mode runs about 600 megabytes a 2nd where deep and for encryption where well decrepit can run at like 3 to 4 gigabytes per 2nd so I I don't know about you but I would much rather have 3 or 4 gigabytes per 2nd so
05:16
what specifically is a SGC so as I mentioned earlier AS GCM is combined authenticated encryption with associated data about so the encryption part is this part and that and as you can see where using a camera and so the E K which is the encryption which is the best that can be done for a completely independent of each other and we fit in the things now all the other part is is authentication part this authentication part and with that of an associated data is what make sure that everything happens the same key is used and what it outputs type the length of the off tag can be changed but it is recommended to be at least 64 bits if you go much shorter it is likely that you have an attack would be able to forge at you could do it as much as 128 bits but that all depends on exactly what you're doing so that this is a and 1 of the other key parts about that is is that if you know the very last step is is that you have the length of the authenticated data and the length of the cyphertext if you don't have that embedded that means attacker could actually maybe say change where the authenticated dated changed and other stuff for shaded a few bits off and that would not be so 1 of the well I
06:53
forgot about this so what the as the is the Iast up to a multi H is a special i Japan Galois field multiply over GF to 128 but and I will now be getting into that so Galois film of sorry about the math if you're but it turns out that don't feel masses not as complicated I spent a while it takes a lot of convincing yourself that it isn't that complicated but is actually so in Galois field now the addition is carried us that and like the traditional where you add 5 plus 5 and you come up with 10 if Calif yield was if you're doing our field of 10 5 plus 5 would actually end up being 0 so in this case in were doing binary and instead of 1 plus 1 being 1 0 actually just enough was 0 and I think it ends up with the equivalent of X or or or the mathematical symbol circle plus which is
07:58
used quite often and as and so multiplication works leader not exactly the same way that you do multidigit multiplication you know if you knew 15 times 15 in the 15 times 23 you know we do all the long school would do the additions and our film math works exactly the same way but except for as I mentioned earlier your additions does not carry so my little quick example of L 1 1 binary and multiplied by 1 1 binary you do that you have the 0 because you're multiplying that in the team's position and yet it with the 1 1 and you end up with 1 0 1 and so the tradition uh if you had done this with raw material as it would be well is what would be the like for you to use all of of 0 yeah that's what I was thinking was and so but so 1 minor differences is that since we're doing GF to the 128 we need to add a reducing factor whenever a value is larger to and then 128 and this is because each different fields in GF 128 has a different reducing factor and this is a commonly used as an irreducible polynomial I don't like that term because polynomials to all confusing and it makes there's a whole different notation that use very often see where it's X times to the 128 plus x plus 1 and other stuff I've always been confused by that I mean not that that and also turns out and this will be very important later on is that in G of 1 of G of to do 128 distribution works meaning if you do a plus B. times c you can distribute the C over to the a plus b and therefore it is equivalent to a times C plus B type C and so there's interesting tricks that will be able to play with
10:08
that so there's always dangers in implementing different parts of crypto and when I was working at sea arrived they develop that numerous different techniques on attacking systems through side channels such as my and but they were not so much onto the computer side but they were still needed to keep abreast of the various attacks so back in 2010 and bone in Hong wrote his master's thesis on attacking GCM secretive authentication value h which I talked a little bit earlier and in this paper he iterate the he basically goes through and shows that i if you're field math is not exactly constant time and you do use of various speedups where URI do precompute tables in order to improve performance and doing a that stuff you can actually snoop on the cashline figure out what values are being used on the secret values H and figure out actually when h is and you want to figure out what H as we saw here
11:25
if you know h you can actually and forge any message that you want a list of
11:34
all the way out of all so in his paper he was attacking purely the Galois field mass and not the ASI channel but as of 8 as the 2nd full points out that there are well known attack on side channel attacks further standard 5 Table version of a because when you're doing the best box look up and you have information and what the best box look up is often only is most often implemented as a to 256 byte array to do look up and then that becomes complicated there's likely if you have access to SSE registers there's other things that you can do to mitigate that assume you don't have a surprise but yes so so so I'll have a reference to this paper this is a the paper that it has this so I was so in this picture as he used as he said you're familiar with this paper this paper they actually demonstrates that the catch timing attack is of all of all over the network so and it so can comes down to more complicated obviously they are part of it is depends upon how long keys are used how much data is used but with and that technology like the EPA and other stuff it is as long as there is leakage he with enough traces you will be able to extract what you what I mean when I was that that's your I we've done a number of different things where where from mine who was working on the project I worked with him on some stuff you he basically said that on hardware using DPA you could actually targets individual transistors on a chip so I and when you're going attacking over the network the network just adds a little bit of noise but it's still but you know that the data is still there you just mean average out the noise is and depends upon how long how long how many traces you can provide to lower your noise for because an average out the noise is sensor yes but that noise from the processor itself is uncorrelated to what you're attack and because of that as you can cut and edge to edge to collect additional information that uncorrelated noise will drop out and only the correlation that you're targeting of only so you actually basically and when you collect all these and warranted EPA but you collect traces and then you say I'm going to guess that this as a and a guest at this s box I'm going to guess that this value is there is an 0 0 4 0 through 255 and you saw all of your traces based upon this and you look at the average is and the ones that don't work well those people who had chosen cyphertext that I passes it's more difficult to do because of their obviously not injected i and also the other but it also turns out like a couple of other things that need to be constant time is by comparing the tag itself because as you may know that and so and there's been a number 1 abilities where and that is part of the reason why we now actually have a and timing of men comp time timing safe version of the kernel of 1 another interesting thing tho is is a believer or not correcting sigh channels can improve performance while I was doing this work are imported open BST
15:57
version of GCI and when I 1st and what I was doing benchmarking seeing how to improve performance and I mistakenly ended up swapping the 2 values a and to the Galois field now now normally this should not be a problem because while it as we know you can sort of when you do not multiplication 15 times 20 is the same as 20 times 15 but the problem was that the implementation assumed that 1 side and 1 side the multiplication was and they were doing constant time but then they were doing in this statement based upon the other so in the case of GCM when we go back to when we go back to
16:47
here we're doing this cyphertext is public text but this age value is not and so
16:55
if you swap the 2 values around suddenly the you're now branching on H and leaking information about the private value and when I discovered this I was quite distressed because it was set to make this II ended up looking around and doing a little tweak to make this constant time and it turns out that the performance improved because now instead of this if statement in a loop that was iterating 128 times I now was doing a fugitive action map but I wasn't this predicting the branch and so performance actually improve the outcome the code was effectively constant time and close side channel tack of OpenView actually adopted that patch the tree so as I
17:43
mentioned earlier some of the ways that you improve the performance of the Galois field mass is by doing lookup tables because you can keep people were doing traditional like 8 and 8 bits at a time calculate what the Galois field now against stage was calculating it but as we as I discussed earlier I cash hat and cache effects you can actually figure out exactly what values multiplying so I ate that and some people at and assess I forget the other and were working on that library word doing some investigation and it turns out that the DGB and I forget somebody else actually found out that intron cache line axis is actually had a variation of the 1st 64 bits of cashline were actually access much quicker than the remaining bits of cash and so even though you would think it was the same it is not and so 1 of the proposals that they put forward was in order to help mitigate this it's still as because of the cache line intra cashline access it doesn't completely solve the side channel but it helps mitigate is instead when you have a 16 role lookup table you split into for 4 bytes of 16 bytes for each of the the GEF to 128 entries and then you split them against for cap across for cash lights and so now if you look when you look up table since you access all 4 lines you're no longer but this could you do you do not have as much of the footprint obviously the 1st 2 entries in this case because that those axis faster you'll be able tell those but the remaining what and it really comes down to the and compromises that you
19:44
have to make the improving adding this performance I forget that I don't have the numbers right now but adding this improves the software implementation significantly now instead of having you know I'm going through the last 4 times as often and accessing memory and other stuff so there are improvements and you really have to decide how many how much leakage due 1 versus how fast you want you can make things extremely secure but if it runs at 10 megabytes per 2nd nobody's going to use it you know if it runs at a few hundred next then they might actually use and as you know there's a lot of different noises and the system's turns out the source systems are a little bit easier more easily attacked
20:33
so that as part of my research there are intra and I alluded to some of these interesting opportunities when we go if you remember the GCM we've basically always did the multiply added it to the next value and then multiply the resulting value and so that's the equivalent of a 8 times age plus B. times age plus T times H was a lot of parentheses but because of the distribution property we can actually pull out all of those ages and distribute them and now we have a much easier and less dependent version where 8 1 8 times age the 4 plus B. times each the 3rd type of plus C times h squared and plus T times H and so now we can actually precompute H that H the for each cube squared in h have tables on those and you look ups as a and and a couple of this was the technique used on the by the until he still mold qt q power and careless multiply paper in order and in order to improve their performance because the PCL utq instruction actually had significantly to C and they could we wanted to hide that so being able to distribute that turns out that when I was working on the soccer implementation and I noticed that my that suffered our field math was Act was not actually using very many registers it was like using 2 or 3 registers where even on my 386 we have at the state and so I ended up applying this exact same performance benefit and got a significant performance boost I think was like over 50 % to so on obviously
22:28
a large part of this is a yes I helps completely avoid cash timing attacks but then that is because it actually implements the thing and we don't have to do look up to I 1 of the interesting parts of the of the PCL mole qt qt instructions is that until did not actually specified as the full 128 that multiply when you do the multiply you get 6 you multiply 64 to 64 bit numbers and end up with a 128 bit number so that means that you need you actually work in order to do the 128 bit multipliers that we need for a GCI now it turns out that if you just think a little bit differently and that the traditional way that you can do this is just the same way that you multiply the 2 2 digit numbers except for in this case they happen to be up to 64 bit digit numbers and this actually works for any side any word size that you're the target targeting about when I was at 0 I actually wrote some multiplication teams and Division routines that was targeted at 600 bit number size so it's kind of odd to think about having a singledigit contains 600 bits but that entirely happens so
23:57
I as this work was actually if and funded by the previously Foundation 1 of the requirements that they have was to review the code so I when I had the key code reviewed by the I believe this code was reviewed by Watson lab he pointed out 1 of the things that is very dangerous about GCM since GCM is encounter mode camera mode does not use what it what is normally called an idea uses what is called a not so if you end up reusing at and AES counter mode is effectively a cypher stream that you x or it's just a simple way to generate unpredictable noise it's kind of like generating a onetime pad that you x or onto it but with very small amounts of data the key plus the not the problem there was a sense of counter increments if you were able to replay counter you would actually end up with on the exact same key keystream and you'd be able to take 2 cyphertext X them together at the appropriate parts and you now get x 4 difference between text and it turns out if you like X or to English text together you're probably able figure out exactly what the 2 text or it's not a very good thing so 1 of the requirements that we ended up doing was that for both AS GCM and for Teramo mode reaction instead of allowing them like in the case of CBC where if you don't specify an IV will generate random data the code actually requires you to generate a of of actually specify the knots and so it that if you know what you're doing you know how to do it and generate a unique knots in your 6 if you don't know what you're doing well you should be writing cryptic and so on and so but this actually ensures that the user of the AS GCM code can ensure that they have had the courage secure cryptographic per primitives that they want the other interesting thing is is when I was implementing and the AS GCM code actually found a bug in the Reference Paper implementing GCM when they were doing the decrepit as part of the decrepit phase you want to do all of the authentication and authenticate that all the cyphertext is safe and then he attended the decryption it turns out that the comparisons of the tag that was calculated was not done correctly and it actually was verifying that all the bits in the tag provided were set in this and at the end that as the generated tags were but you could write from it if you can either like you can you specify a tag of all zeros are all ones and it would always work so obviously not a very a very good thing likely I send send him email and that was fixed in the next release of the paper but as you can see reviews are absolutely necessary when you're coming across cryptic and that code in his paper was actually but not on Intel's website for over 2 years before it was fixed and off
27:23
obviously testing and verification are to part of the discovery of that but it was the fact that I was generating knowingly incorrect data in comparing it but it was actually doing decryption passing and like that should not pass data that I know is not and so as part of this work by a set up and an additional set of the tests that are actually now distributed entirely part of previous about and they're actually if you have installed and have a head system from within like about 6 months or more if you go into user tests cis of encryptor run prescriptive you'll see a script that the that run tests and will actually run tests against this code the tests required that you have a Python and the NIST cap ports installed and then this cat port is basically a set of known answers to various crypto algorithms that are distributed by the by the way this government agency and so that's a standard checks CBC there's like 256 and some other stuff so I would like to thank
28:42
both negate and the figure the foundation for encouraging me in providing funding for this work and then also I would like to thank my reviewers my camera and what what's in lab for their work during the review stuff so in the
28:59
papers I review mentioned here so does anybody have any questions on tests on the on the on the left
29:14
you want to use where was he she is all of
29:21
the not so this is because most context you you you you work on this was not so well will if he yes as long is it like that that the whole point of the thing that needs to be protected in both GCM and in some areas covered as you said as ensures that they key and the nonce are never reused so if you always have a new key then you only have and the knots of all zeros than that is perfectly safe sensor never using the 2 together but you the reason why most people don't do that is because the distribution is very very difficult and you can think I forget what the sizes is but you can encrypt like multiple terabytes of data with this that I can go a little bit more specifics in the GCM they and use 96 uh there's the counter is 128 bits the nonce is 96 bits and at 96 bits that's too small to use random if they were using the full 128 bit you could use random but then you also have the problem that if you actually script for gigs of data then you're back down to 96 and then there's likely an overlap and so in GCN they have a 32 bit and so that and because I'm so that means that that you can encrypt but what is it to do the like 36 bytes worth of data per authentication without actually going over and then also on so there's an as you hear there so this so as long as you the whole key part and then i'd ICM is literally just this part of GCO so any other questions that's the really think is sort of all around the the world to all users in the of small gym can probably talk a little bit more on adequate quick assess there I know I know that they're working on I'm not sure how much of how ready here other stuff there about doing same anything Juma so so I know there's people interested in and stuff like that there's there's not so any other questions at the time of thank you for attending