.NET Data Security: Hope is not a Strategy
Formal Metadata
Number of Parts | 96
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/51826 (DOI)
Transcript: English (auto-generated)
00:04
OK, thanks for coming along, everyone. My name is Stephen Haunts, and this talk is .NET Data Security. Hope is not a strategy. So a little bit about me first. So I work for a small startup in the UK called Buying Butler, where we aim to make complex purchases like cars very simple for our customers, whilst also providing
00:23
high-quality leads to dealers. And we also do a lot of work with insurance companies in handling their total loss claims. And that's kind of relevant to what I'm talking about today, because we work with a lot of data from insurance companies, and a lot of that is personal information about our different customers. So the security and the custodianship of that data
00:41
is very important. So I'm also a Pluralsight author, and I've done six courses now, working on my seventh. And one of them is about cryptography and .NET, so it kind of goes along with this talk. And I've also written several books for the Syncfusion eLearning library.
01:01
So we're going to cover quite a lot of material quite quickly today. So what I want to do is just make you aware of some resources you can use after this talk, should you want to follow up on any of this. So firstly is my book, Cryptography in .NET Succinctly. The book's completely free. I'm not trying to sell you anything. You just go to the Syncfusion website, sign up, and download it. And that mirrors what we're going to be talking about today.
01:22
And I also have the course, Practical Cryptography in .NET, on Pluralsight. And that covers what we're going over today, but it goes into a lot more detail, and it talks about the why as well as the how of a lot of this stuff. If you don't have a Pluralsight subscription, I have some cards which give you one month's free access. So feel free to come up and grab one after the talk.
01:45
OK, so I strongly believe that as developers it's your responsibility to help secure and protect your company's data. And you just have to look at the news each day or each week, and you hear about new data breaches that are happening.
02:01
And it's getting quite serious. So taking responsibility for your company's data is more important than ever. And typically, companies tend to make lots of excuses. And I've heard a lot of these excuses in companies I've worked at. So we're too small to be hacked. No one's going to bother with us. That's not necessarily true. It just means you're potentially an easier target.
02:21
And we have a firewall. No one's going to get through that and get to our data. Well, not necessarily. It's great that you've got firewalls in place. But what about your internal operations staff? If you have any disgruntled employees, they can get access to your data and try and steal it. And the other one I've heard before as well is, we've never been hacked before. So why should we take this seriously?
02:42
Well, just because you've not been hacked before doesn't mean you won't be in the future. So I truly believe that hope is not a strategy when it comes to security. Hoping that you won't get attacked or have any of your data stolen is not a good strategy for success. So how many people have worked in a company where you've
03:02
got a deadline looming, and you've got lots of features you need to get in, and then security just gets pushed further and further down the list of priorities to get products shipped? Has anyone had that scenario before? I've had it in just about every company I've worked at. And as a developer, it's kind of your responsibility
03:23
to try and push on these things and to try and stress how important it is to make sure security is not pushed to the bottom of the pile. So what this talk isn't about, it's not about deep mathematics or how a lot of these algorithms work. We would not be able to cover that in an hour. And it's 9 o'clock in the morning.
03:40
I'm sure nobody wants to do lots of complex maths. This talk also isn't about cryptanalysis. So cryptanalysis is the art and science of breaking codes. So the sort of things that our governments probably spend a lot of their time doing every day, that's not what this talk's about. So what this talk is about is people like you
04:00
and me who are regular developers. We work for companies, and we produce code day in, day out to provide value to our customers. So a lot of the code we're going to look at today and talk about is based around the .NET Framework. I suppose you can now call it the traditional .NET Framework. So server-side, web APIs, WCF services, or client code,
04:27
so WinForms, WPF applications, all that sort of thing. But just because we're talking about Microsoft APIs specifically in this talk, a lot of the principles we're going to discuss are kind of relevant across any platform. So the principles are all the same.
04:41
The APIs might be slightly different, but what you're trying to achieve is effectively the same across different languages. So Java, Ruby, PHP, Python, Node, the principles are exactly the same. OK, so what we're going to cover. So we're first going to talk about random numbers and why they're so important. We're then going to take a look at hashing and hash
05:01
message authentication codes. We'll then take a deeper look at secure password storage and password management. We'll then take a look at symmetric encryption, and then we'll look at asymmetric encryption with things like RSA. And then we'll look at digital signatures. And that will give us a lot of the building
05:20
blocks that we need to go ahead and build what is called a hybrid encryption scheme. So it's using a lot of these building blocks together to create something more powerful. OK, so what is cryptography? So I think it'd be a good idea to cover this, especially if any people have accidentally walked into the wrong room and are too embarrassed to walk out.
05:41
So cryptography is basically about protecting information, and generally that's done via encryption. And when you encrypt data, you have encryption keys. And that encrypted data then becomes what's called ciphertext. So this is generally what we're talking about with cryptography. And the art of trying to break codes and work out keys to decrypt data is called cryptanalysis.
06:03
But there's more to cryptography than just encryption. So there's four distinct pillars that we look at with cryptography. So there's confidentiality. This is what we all typically think of with cryptography. I have some data. I encrypt it with a key, and that data's completely scrambled. No one can read it. But also, we have the concept of integrity.
06:20
So if I have some data and I send it to my recipient, has that data been tampered with or corrupted in transit? So we can use cryptography to help us with data integrity. Also, we have authentication. So I have some encrypted data. Am I allowed to view this data? Am I authenticated to see it?
06:41
And also, we have non-repudiation. And this is all about proving that you have sent an encrypted message. So it's kind of a similar analogy to a contract. So if I send a contract to someone, and they then try and dispute that I sent them that contract, by using non-repudiation, we can actually prove that it was us that sent that contract to them.
07:03
So cryptography is pretty much everywhere. You can't switch on a device or do anything without some kind of cryptography being in place. So online shopping, if you're buying stuff off Amazon or any of your other favorite websites, you have the little padlock in the browser. Hopefully, you have the little padlock in the browser. ATM machines, when you're drawing out cash from the wall,
07:22
there's a cryptographic handshake that goes on between you and the bank when you put your PIN number in. Mobile phones these days, I mean, that's obviously a very hot topic at the moment, especially with what's been going on with the FBI. They were trying to break into the San Bernardino killer's iPhone. So a lot of these phones these days are heavily encrypted.
07:40
But also, there's uses like Bitcoin. So Bitcoin, as a currency, is a cryptographic protocol. So that's kind of another use for it. And another example of cryptography is in voting and vote machines. So proving that you've only voted once and using cryptography so that you
08:01
can't cheat the voting system. OK, so let's start off by looking at random numbers. So random numbers are effectively one of the most important primitives that we need when we're dealing with cryptography. And we use random numbers generally for creating encryption keys.
08:21
And a good random number needs to be truly random and non-predictable. Now traditionally in .NET, when you're doing random number generation, you might use something like System.Random. And that's OK for trying to do something simple, like simulating a dice roll or some lottery numbers, for example. But when you're trying to generate random numbers for cryptographic keys, System.Random
08:42
is not good enough. And it's also not thread-safe. Now System.Random, it gives the appearance of randomness. But actually, it's very deterministic. So if you don't change the seed every time you create a new random number, you'll get the same set of numbers out of it. So for cryptography, it's no good. So in .NET, there's a better class called RNGCryptoServiceProvider.
09:02
And this lives along with everything else we're talking about in the System.Security.Cryptography namespace. Now RNGCryptoServiceProvider is a lot slower to run than System.Random. But the numbers you're going to get out of it are going to be non-deterministic, which makes it excellent for generating encryption keys.
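To illustrate the determinism described above, here is a minimal C# sketch. The class and method names (`SeededRandomDemo`, `FirstValues`, `SameSeedGivesSameSequence`) are hypothetical and not from the talk; only `System.Random` is the real framework class.

```csharp
using System;

public static class SeededRandomDemo
{
    // Returns the first `count` values produced by System.Random for a given seed.
    public static int[] FirstValues(int seed, int count)
    {
        var random = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = random.Next();
        }
        return values;
    }

    // True when two independently created generators with the same seed
    // produce exactly the same sequence of numbers.
    public static bool SameSeedGivesSameSequence(int seed, int count)
    {
        int[] first = FirstValues(seed, count);
        int[] second = FirstValues(seed, count);
        for (int i = 0; i < count; i++)
        {
            if (first[i] != second[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling `SeededRandomDemo.SameSeedGivesSameSequence(42, 10)` returns `true`, which is exactly the predictability that makes a seeded generator unsuitable for creating keys.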
09:21
So you'll see examples as we go through the talk where we generate 256-bit, or 32-byte, random numbers, which we use as keys. Now RNGCryptoServiceProvider, it's not all implemented in .NET. It actually uses the underlying cryptographic platform in Windows, so the same sort of things that you'd use in a lot of C++ or the operating system
09:43
libraries. So RNGCryptoServiceProvider is very easy to use. And this will be a common theme. Everything we're talking about today is actually very easy to use. So in the little bit of sample code here, we have a method called GenerateRandomNumber, where we pass in a length. So that length is the number of bytes we want to generate.
10:02
So if you want a 32-byte random number, you pass 32 into that. We create an instance of the RNGCryptoServiceProvider class. And then we initialize a new array to the correct length that we want. And then we just call GetBytes. And then we return that byte array. So actually generating our encryption keys is as simple as those few lines of code.
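The slide itself is not part of the transcript, so the following is a reconstruction of the method as described, not the speaker's exact code. The wrapper class name `RandomNumbers` is an assumption. On newer .NET runtimes `RNGCryptoServiceProvider` is marked obsolete in favor of `RandomNumberGenerator`, so treat this as a sketch for the traditional .NET Framework the talk targets.

```csharp
using System.Security.Cryptography;

public static class RandomNumbers
{
    // Returns `length` cryptographically strong random bytes,
    // for example length 32 for a 256-bit key.
    public static byte[] GenerateRandomNumber(int length)
    {
        using (var randomNumberGenerator = new RNGCryptoServiceProvider())
        {
            var randomNumber = new byte[length];
            randomNumberGenerator.GetBytes(randomNumber); // fills the array in place
            return randomNumber;
        }
    }
}
```

A call such as `RandomNumbers.GenerateRandomNumber(32)` yields a 32-byte array that can be used as a 256-bit key.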
10:25
So moving on to the next part in our sort of stack of primitives that we want to look at, we have hashing. Hashing, you can kind of think of it as a bit like a digital fingerprint of a piece of data. So if you have a piece of data,
10:40
be it a byte array of data or a PDF document, et cetera, if you generate a hash code of that piece of data, you're going to get a code out the end of it, which is effectively the fingerprint for that piece of data. If you then go and change that original document in any way and then recalculate the hash code, that hash will be completely different.
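The fingerprint idea can be sketched with the `SHA256.Create` and `ComputeHash` calls that the talk walks through a little later. The class name `HashDemo` and the `ToHex` helper are assumptions added for illustration.

```csharp
using System;
using System.Security.Cryptography;

public static class HashDemo
{
    // Computes the 32-byte SHA-256 hash (the "fingerprint") of the input.
    public static byte[] ComputeSha256(byte[] toBeHashed)
    {
        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(toBeHashed);
        }
    }

    // Formats a byte array as a lowercase hex string for easy comparison.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

Hashing the UTF-8 bytes of "hello" and then of "hellp" with `HashDemo.ComputeSha256` gives two 32-byte values that differ from each other, even though only one character changed.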
11:02
So with hashing, there's kind of four requirements that we need. So first of all, a hash needs to be easy to compute. So I have a piece of data, run it through a hashing function, I get a hash code out the other end. And it should also be infeasible to generate a specific hash. So you shouldn't be able to say, well,
11:20
if I have this hash code here, what's the data I need to create that hash? You shouldn't be able to do that. It's the other way around. You have a piece of data, you generate a hash code. Well, you run it through a hash function, and it generates a hash code. Another requirement is it should be infeasible to modify the original message without changing the hash.
11:40
So as I said before, if you have a piece of data, generate a hash code, you then change just one bit of that data, that hash code should be completely different. Not slightly different, but completely different. And the final requirement of a good hashing algorithm is it should be infeasible to find two different pieces of data with identical hashes. So you shouldn't be able to get one piece of data,
12:01
generate a hash code, have a second piece of data, and generate exactly the same hash code. That's called a hash collision. You shouldn't be able to do that. So hashing is what we call a one-way operation. So once you generate a hash code, you can't then or you shouldn't be able to then go back to the original message. Whereas encryption, as you can imagine,
12:20
is more like a two-way operation. So we encrypt a piece of data with a key, then we can use the same key to decrypt that data. So it's two-way, it's reversible. Whereas hashing is only one-way. And the most common hashing algorithm that people have probably heard of is MD5. And it's been around for a long time, well, since 1991.
12:40
And what this does is it produces a 16-byte hash value. And it was designed by a guy called Ron Rivest. But the problem with this is in 1996, a hash collision resistance vulnerability was found. So someone managed to generate the same hash with different values or different pieces of data being passed into it. So MD5 as a hashing algorithm these days
13:01
isn't really good enough to use. But I still mention it here because if you work in a large organization, like a bank, for example, you may still have a lot of older legacy systems you need to integrate with, and they may still use MD5. So an example of that, I used to work for an internet bank in the UK. And our back-end banking platform
13:20
was an old AS/400 mainframe system. And whenever we sent messages to and from that system, we had to generate MD5 hash codes. So it is possible you still need to use them. But for a new system, you wouldn't want to use MD5. So moving on from MD5 then, we have the secure hash family
13:42
or the SHA family of hashes. And there's kind of three versions of this. There's SHA1, which generates a 160-bit hash. And then there's SHA2, which can generate 256-bit or 512-bit hashes. And those two are both implemented in the .NET Framework. But there is also a new one called SHA3,
14:03
which is now available. So SHA1 and SHA2 were both designed by the National Security Agency in the United States. And rightly or wrongly, that makes some people a little bit nervous. So there was a competition a while ago.
14:20
It was to find a new variant of the SHA algorithm, which is non-NSA-based, and the winner was announced in 2012. And the winner of that was an algorithm called Keccak. I always get this wrong; I'm not quite sure how you pronounce it. But currently, this isn't implemented in the .NET Framework. But you can get some open source implementations of it,
14:41
whether you want to trust them or not. It's kind of up to you. I imagine it would be a matter of time before Microsoft implements it in the framework. But for what we're going to talk about today, we're going to look at SHA2 and SHA256. So it's very easy to use. So in our little method here, we pass in a byte array, which is our data that we
15:01
want to generate a hash for. And then we call the static method Create on the SHA256 class. And then you just call ComputeHash whilst passing in the data you want to hash. And then you get a byte array back, which is your hash code. So again, it's very, very easy to use. So moving on from hashing, to the next level,
15:21
we have what are called authenticated hashes, or hash message authentication codes, or HMACs, as they're often called. And conceptually, this is exactly the same as a SHA256 hash. So you pass some data in. You get a hash code out. But what's different is you can also have a key, which you pass in when you create the hash.
15:42
And what this does is it means that if I then send that hash to someone else, any one of you, for you to be able to recalculate that same hash, you need to have that key. This is where the idea of authentication comes in. You can only generate that same hash if you have the key.
16:01
So it's commonly used for verifying both integrity and authentication. And you can use both MD5 and the SHA family of hashes for HMACs. And the strength of this is based on the key. So if you use a good strong key, which is, say, 32 bytes long, 256 bits, this can be quite difficult for someone to then go and brute force that same key.
16:23
And the most common attack against this type of hashing algorithm is a brute force attack. But as I said, if you use a good strong key, it makes this quite hard to do. So again, using an HMAC, very, very easy to use. So we have two pieces of information that we're passing into our method here.
16:41
So we have a byte array of data to be hashed. And we have a byte array, which is our key. So that key was generated using the RNGCryptoServiceProvider. So we create an instance of the HMACSHA256 object whilst passing the key into the constructor. And then you just simply call ComputeHash, passing in the data you want to hash. And you get a hash code back.
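A minimal sketch of the HMAC method as described, reconstructed from the spoken walkthrough. The class name `HmacDemo` and the method name `ComputeHmacSha256` are assumptions; `HMACSHA256` and `ComputeHash` are the framework members the talk names.

```csharp
using System.Security.Cryptography;

public static class HmacDemo
{
    // Computes an HMAC-SHA256 over the data using the supplied secret key.
    // Only someone holding the same key can recompute the same 32-byte value.
    public static byte[] ComputeHmacSha256(byte[] toBeHashed, byte[] key)
    {
        using (var hmac = new HMACSHA256(key))
        {
            return hmac.ComputeHash(toBeHashed);
        }
    }
}
```

With the same data and the same 32-byte random key, two calls return identical bytes; changing the key changes the result, which is what lets the recipient check both integrity and that the sender held the key.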
17:01
So again, very, very easy to use. So next up, we want to talk about passwords. And there's various different ways in which you can manage passwords, ranging from not very good up to excellent. So the first one, we'll just get it out of the way
17:20
and then move on, is storing plain text passwords. I don't need to spend much time on this. I'm sure everyone knows that that is completely wrong. Well, there are still a lot of sites out there that do this. But you never store a plain text password in your database. So the next best thing is to hash a password.
17:41
And the way this works is, say you have a person logging on or signing up to a system, they type their password in. You create a hash, say a SHA256 hash of that password, and then store it in the database. Then the next time they come and log on, they type their password in, the hash is generated on the client and it's then compared against the hash in the database. If they match, you put the correct password in.
18:03
But there's a problem with this. And that problem is that you can either brute force those passwords by trying lots and lots of different combinations, or you can use what's called a dictionary or a rainbow table attack, which is a massive pre-computed database of passwords and different password combinations. Even the clever things where you try and turn the vowels into numbers to outfox people.
18:23
All that sort of stuff will be in there. And the way a lot of these attacks work is using tools like Hashcat. You can use your GPUs in your computer, your graphics processing units, to actually do billions of hash attempts per second. So if you imagine if you've got a big powerful machine
18:41
with two NVIDIA GTX 1080s in there, imagine how many hashes you can do per second. So to give an example of how easy a hashed password is to crack, there's a screenshot of a website here called crackstation.net. So in the gray box on the left, I've pasted a hash code in there,
19:02
which is a SHA256 hash. You click crack hash, and it's worked out that the password is secret69. I mean, it's a very simple example, but conceptually that's how a lot of these sites work. And if you've got an MD5 hash, you can just paste it into Google and it'll reverse it for you.
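Conceptually, a pre-computed lookup of this kind can be sketched in a few lines. This is a toy illustration of why unsalted hashes are weak, not how any particular site is implemented; the class `DictionaryAttackDemo`, its method names, and the tiny word list used in the note below are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class DictionaryAttackDemo
{
    // Unsalted SHA-256 of a password, as a lowercase hex string.
    public static string HashPassword(string password)
    {
        using (var sha256 = SHA256.Create())
        {
            byte[] hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(password));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Pre-computes hash -> password for every candidate in the word list.
    public static Dictionary<string, string> BuildLookup(IEnumerable<string> candidates)
    {
        var lookup = new Dictionary<string, string>();
        foreach (string candidate in candidates)
        {
            lookup[HashPassword(candidate)] = candidate;
        }
        return lookup;
    }

    // Returns the original password if its hash is in the table, otherwise null.
    public static string Crack(string hashHex, Dictionary<string, string> lookup)
    {
        string password;
        return lookup.TryGetValue(hashHex, out password) ? password : null;
    }
}
```

If the table was built from a list containing "secret69", then looking up the hash of "secret69" returns the password immediately, with no brute forcing at lookup time. A password that is not in the list comes back as `null`.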
19:21
Seriously, give it a try. It's quite scary. So has anyone ever worked on a system then where you've used hashing to store passwords in a database? I've worked on systems that do it before. I think pretty much everyone has. So what's the next best thing that you can do that's sort of the next level on from that?
19:41
So you can do what is called a salted hash. And what a salted hash is is the password plus a salt value. And a salt value is just an arbitrary random piece of data. Generally, you know, it's another random number which you generate with RNGCryptoServiceProvider. And then you append that onto your password,
20:00
and then you create a hash of that password and the salt. And this is good. It means it's much, much harder, or probably at the moment impossible, to try and brute force any of these passwords. So that's great. Has anyone done this in any systems? Yeah, you know, again, this is quite a common way of doing it. And, you know, there's nothing wrong with that.
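The salted hash described above can be sketched as follows. The names `SaltedHashDemo`, `GenerateSalt`, and `HashPasswordWithSalt` are assumptions. The salt here comes from `RandomNumberGenerator.Create()`, the base-class factory, rather than constructing `RNGCryptoServiceProvider` directly as in the talk.

```csharp
using System;
using System.Security.Cryptography;

public static class SaltedHashDemo
{
    // Generates a random salt; 32 bytes is an illustrative default.
    public static byte[] GenerateSalt(int length = 32)
    {
        using (var rng = RandomNumberGenerator.Create())
        {
            var salt = new byte[length];
            rng.GetBytes(salt);
            return salt;
        }
    }

    // Appends the salt to the password bytes and hashes the combination.
    public static byte[] HashPasswordWithSalt(byte[] password, byte[] salt)
    {
        var combined = new byte[password.Length + salt.Length];
        Buffer.BlockCopy(password, 0, combined, 0, password.Length);
        Buffer.BlockCopy(salt, 0, combined, password.Length, salt.Length);

        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(combined);
        }
    }
}
```

The salt is stored alongside the hash so the same computation can be repeated at login. Two users with the same password but different salts end up with different stored hashes.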
20:20
But the problem with this is, as GPUs and processors get faster over time, you know, a salted password which is secure now, who's to say it won't be vulnerable in five years' time? You just don't know. And this is a problem with Moore's Law. You know, processor speeds and GPU speeds are increasing like that.
20:40
So it's only a matter of time before someone comes out with a GPU which is capable of cracking a salted hash. So what we want to do is we want to go one step further and we want to try and mitigate this problem of trying billions of hash attempts per second. So the next best thing, and sort of the recommended thing to use,
21:01
is what's called a password-based key derivation function. Or if you want to impress your friends down the pub, it's PBKDF2, if you like acronyms. Again, this is the same as what we've been talking about. So we have a password that we want to hash. We have a salt. But what we also have here is a number of iterations
21:20
that we pass in. And what this is is it tells the algorithm how many times to rehash that password. And the reason this is good is at the moment, if you can, say, test two billion combinations per second, if you have enough iterations on your password-based key derivation function, you might reduce that down to the fact that you can only test, say, 10 per second
21:42
or two per second, depending on what you pass in there. And I'll show you a graph in a moment of what the different speeds and increases look like. So first of all, I'll show you how to use it. We have our method here. We're passing in a byte array of our data to be hashed, just as before.
22:00
A byte array of our salt, so it's a 32-byte random number of just sort of junk that you append onto the password. And we have a number of iterations. Now, the class in the .NET Framework you want to use is called Rfc2898DeriveBytes. So you'll be forgiven for overlooking that one in the framework, because it's not obvious what it does.
22:24
And under the bonnet or under the covers, the way Rfc2898DeriveBytes works, it uses SHA-1 to do its hashing, which means you get a 20-byte hash value out of it. So when I call GetBytes, I only really need to get the first 20 bytes for that hash value.
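A minimal sketch of the method as described, reconstructed from the walkthrough, not copied from the slide. The class name `Pbkdf2Demo` and the parameter names are assumptions. The three-argument constructor used here is the one that defaults to SHA-1, matching the talk; later framework versions also offer an overload that takes a `HashAlgorithmName`.

```csharp
using System.Security.Cryptography;

public static class Pbkdf2Demo
{
    // Derives a 20-byte hash from the password and salt, repeating the
    // underlying hash `numberOfRounds` times to slow down guessing.
    // The salt must be at least 8 bytes long.
    public static byte[] HashPassword(byte[] toBeHashed, byte[] salt, int numberOfRounds)
    {
        using (var rfc2898 = new Rfc2898DeriveBytes(toBeHashed, salt, numberOfRounds))
        {
            return rfc2898.GetBytes(20); // 20 bytes = SHA-1 output size
        }
    }
}
```

The salt and the iteration count need to be stored with the hash, since the same three inputs are required to recompute it when the user logs in.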
22:41
So if you look at the chart here, so when I first created this chart, I was using an older laptop, but I tested some hashes. So 100 iterations, it took two milliseconds to hash a password. 1,000 iterations took 16 milliseconds. 10,000 iterations took 196 milliseconds. And you can see it sort of scales up.
23:01
And then when I did 500,000 iterations, it took seven seconds to hash a password. Now, the value you put in there is a trade-off. You have to look at what you're using the hash for and what the speed implications are gonna be for you. So if you have a good robust website, you may notice sometimes when you put the password in, there might be a bit of a delay as you log in.
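To find a suitable iteration count for your own hardware, the measurement behind a chart like this can be reproduced with a stopwatch. This is a rough sketch; `Pbkdf2Timing` and `MeasureMilliseconds` are hypothetical names, and the figures you get will differ from the speaker's laptop.

```csharp
using System.Diagnostics;
using System.Security.Cryptography;

public static class Pbkdf2Timing
{
    // Returns how many milliseconds one PBKDF2 derivation takes
    // for the given iteration count on the current machine.
    public static long MeasureMilliseconds(string password, byte[] salt, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        using (var rfc2898 = new Rfc2898DeriveBytes(password, salt, iterations))
        {
            rfc2898.GetBytes(20);
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Calling this in a loop over, say, 100, 1,000, 10,000 and 100,000 iterations with a 32-byte salt shows how the cost grows, which helps when weighing login delay against resistance to guessing.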
23:21
That's probably because they're doing a password-based key derivation function call behind the scenes. So systems I've worked on, I've typically used anywhere between 50,000 and 100,000 iterations to hash a password for logging in to the system because that kind of natural delay is kind of okay. Well, I think it's okay. But if you're hashing data
23:40
on something that's sort of high-speed transactional, then 50,000 iterations would be too slow. So you need to think about the trade-offs of how many iterations you want and what the speed penalties are gonna be. So while we're on the subject of passwords, this company's been brought up several times while we're here. It's fun to talk about.
24:03
I take it everyone's heard of this. Everyone saw Troy's keynote yesterday. So one of the things that happened when Ashley Madison were hacked is the password tables were all stolen. But Ashley Madison had actually been quite good.
24:21
So they'd used something called bcrypt to hash their passwords. Now bcrypt is something that's very similar to the password-based key derivation function. It's an iteration-based hash function. It's just a different type of implementation. So they'd used this across their passwords and the attackers tried to recover a lot of the passwords and they couldn't. So that's good. But they then also had access to the source code
24:40
which had been stolen. And what they had found was that some unwitting programmer had probably tried to optimize the logging in system. I'm not quite sure what their motive was. But they'd started storing a local token of the password and the username which they then MD5 hashed.
25:01
So I think what it was is when you come back to relog into the system it'll relog you in quicker. So they probably thought they were doing something good. Making the relogging in process quicker. So when the hackers found this out they're like, well, let's not attack the bcrypt passwords. Let's attack the MD5 hashes. So they did and they managed to recover I think 10 million passwords from the system.
25:24
So the reason why I'm telling this story is security is only as good as your weakest link. So their password management generally was pretty good. They used bcrypt to store their passwords. But they had a weak link in the chain where they were storing this token with MD5-hashed passwords. Which meant all the good stuff they'd done with bcrypt
25:40
was basically void at that point. So there's a really good article on Ars Technica. So I've put a bitly link there which goes into that story in a lot more detail. And it's quite an entertaining read. I'd definitely recommend reading it. Okay, so let's move on to encryption.
26:01
So first of all we're gonna talk about symmetric encryption. And what this is, is you have some plain text data and you encrypt it with a key which gives you your cipher text data. But then to decrypt the message you decrypt it with the same key. That's why it's symmetric. So you use the same key to encrypt and decrypt.
26:21
But there is a drawback to symmetric encryption. And that is that sharing keys is very difficult to do. So if I encrypt some data and I know I want to send that data to say five of you in the audience, how do we share that key? I can't email it to you; that's a bit of a vulnerability. I can't just put it on the network somewhere. Maybe I could meet you all in person and hand it to you.
26:43
So key sharing generally is quite hard to do. And one of the things we're gonna talk about later is how to mitigate the complexities of key sharing. So this is a diagram we looked at earlier. So where we're saying that hashing is a one-way function, this just sort of reiterates the point that encryption is a two-way operation.
27:04
So you have some data, you encrypt it, but you can also reverse that operation and get your data back. Okay, so the way symmetric encryption works is it works by getting the data you want to encrypt and it chops it up into blocks and it encrypts several bytes at a time.
27:20
And these blocks are padded so that they're the same size. So if you have some data, you chunk it up into say 128-bit blocks, if the block at the end isn't the same size, or is too small, then you just pad it out. And there's three symmetric encryption algorithms that are supported in .NET. So there's AES, DES, and triple DES.
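The padding step can be made concrete with a small sketch. The talk does not name a padding scheme, so this assumes PKCS#7, which is the default `PaddingMode` for symmetric algorithms in .NET; the class `PaddingDemo` and the method `PadPkcs7` are hypothetical. In practice the framework does this for you.

```csharp
using System;

public static class PaddingDemo
{
    // Pads data up to a whole number of blocks using PKCS#7:
    // each padding byte holds the number of padding bytes added.
    // Data that is already block-aligned gets one full extra block.
    public static byte[] PadPkcs7(byte[] data, int blockSizeInBytes)
    {
        int padLength = blockSizeInBytes - (data.Length % blockSizeInBytes);
        var padded = new byte[data.Length + padLength];
        Buffer.BlockCopy(data, 0, padded, 0, data.Length);
        for (int i = data.Length; i < padded.Length; i++)
        {
            padded[i] = (byte)padLength;
        }
        return padded;
    }
}
```

With 128-bit (16-byte) blocks, 20 bytes of data become 32 bytes, the last 12 of which each hold the value 12.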
27:43
So we're mostly gonna focus on AES because that's the one that's recommended to use these days. But the reason I've put DES and triple DES on there is again, if you're working with legacy systems that use DES to encrypt data, if you need to interact with those systems, you'll then need to use DES to decrypt that data.
28:03
So AES is what we're gonna look at, and it was invented by two mathematicians, Joan Daemen and Vincent Rijmen, and they created what was called the Rijndael cipher. And then in 2001, the National Institute of Standards and Technology adopted the Rijndael cipher
28:21
as AES, the Advanced Encryption Standard. And the way AES works is quite simple. So you pass into it your plain text, so a byte array of the data you want to encrypt. You also pass in a byte array of something which is called an initialization vector. And what that is, it's a small byte array of data
28:41
which is used to help jumpstart the AES encryption algorithm. The initialization vector doesn't have to be kept secret. You can send it along with your message. The secrecy isn't based on the initialization vector. And then you also pass in a key. So AES supports 128, 192, and 256-bit keys.
29:01
So I always recommend you just go straight for 256-bit keys, which is 32 bytes. So you pass all those into the AES algorithm, and then you get your ciphertext back out the other end. And then to decrypt that data, instead of passing in the plain text, you pass in the encrypted data, the same initialization vector and the same key, and then it decrypts your data.
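As a rough illustration of the inputs just described, the key and the initialization vector can both be filled from a cryptographically secure random number generator. This is a minimal sketch rather than the code from the slides: the class and method names (KeyMaterial, RandomBytes, NewAesKey, NewAesIv) are hypothetical, and it assumes RandomNumberGenerator.Create() as the source of random bytes.

```csharp
using System.Security.Cryptography;

// Hypothetical helper for producing AES key material.
public static class KeyMaterial
{
    // Returns a new array of the requested length filled with
    // cryptographically strong random bytes.
    public static byte[] RandomBytes(int length)
    {
        var bytes = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return bytes;
    }

    // 32 bytes = 256 bits, the key size recommended in the talk.
    public static byte[] NewAesKey()
    {
        return RandomBytes(32);
    }

    // 16 bytes = 128 bits, which matches the AES block size.
    public static byte[] NewAesIv()
    {
        return RandomBytes(16);
    }
}
```

A caller would generate a fresh key and IV with KeyMaterial.NewAesKey() and KeyMaterial.NewAesIv(), keep the key secret, and send the IV alongside the ciphertext.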
29:24
So in .NET, there's two implementations of AES that you can use. There's one called AesManaged, and there's also one called AesCryptoServiceProvider. So AesManaged is natively written in .NET. So it's a CLR-based object.
29:40
And it works fine. I've used it several times. But the main drawback is it's not what's called FIPS 140-2 certified. And if you're only encrypting and decrypting data between .NET systems, that might not necessarily be a problem. But if you're working with a lot of other systems that are written in Java, Node,
30:00
or any sort of mainframe systems, using implementations that are FIPS certified means that you're guaranteed that any data you encrypt in, say, .NET, you can then go and decrypt on a mainframe. So the AesCryptoServiceProvider object in .NET is FIPS 140-2 certified.
30:21
And it's not written in .NET. It uses the underlying Windows crypto platform. So it's quite straightforward to use. So we have a method here called data to encrypt. We pass in a byte array of our key. So that's a 32-byte byte array.
30:42
And we pass in an initialization vector, which is 16 bytes. And then we create an instance of the AesCryptoServiceProvider class. We pass in the key and initialization vector. And we then create a memory stream and a crypto stream because it's all stream-based. And then you just write the data into the stream
31:00
and flush it. And then that gives you your encrypted data back out the other end as a byte array. So decrypting data is very similar. So you pass in the key and the initialization vector, create the Crypto Service Provider object, pass in the key and the IV, and then you create your memory stream and crypto stream.
31:25
And you've got a thing here called the AES Decryptor. And then that decrypts your data back into a byte array. So the next one to look at is asymmetric encryption. So what we've talked about has been symmetric so far.
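Before moving on, the stream-based AES encrypt and decrypt methods described above might look roughly like this. It is a sketch under stated assumptions, not the slide code: the class name AesExample and the method names are hypothetical, the Aes.Create() factory is swapped in for the AesCryptoServiceProvider class named in the talk (newer .NET versions mark that class obsolete), and CBC mode with PKCS7 padding is set explicitly even though those are the defaults.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical wrapper around the AES steps described in the talk.
public static class AesExample
{
    public static byte[] Encrypt(byte[] dataToEncrypt, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;        // default mode, set for clarity
            aes.Padding = PaddingMode.PKCS7;  // pads the final block to 16 bytes
            aes.Key = key;                    // a 32-byte key selects AES-256
            aes.IV = iv;                      // 16 bytes

            using (var memoryStream = new MemoryStream())
            {
                using (var cryptoStream = new CryptoStream(
                    memoryStream, aes.CreateEncryptor(), CryptoStreamMode.Write))
                {
                    // Write the plaintext in, then flush the padded final block.
                    cryptoStream.Write(dataToEncrypt, 0, dataToEncrypt.Length);
                    cryptoStream.FlushFinalBlock();
                }
                return memoryStream.ToArray();
            }
        }
    }

    public static byte[] Decrypt(byte[] dataToDecrypt, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.PKCS7;
            aes.Key = key;                    // must be the same key...
            aes.IV = iv;                      // ...and the same IV used to encrypt

            using (var memoryStream = new MemoryStream())
            {
                using (var cryptoStream = new CryptoStream(
                    memoryStream, aes.CreateDecryptor(), CryptoStreamMode.Write))
                {
                    // Write the ciphertext in; plaintext lands in memoryStream.
                    cryptoStream.Write(dataToDecrypt, 0, dataToDecrypt.Length);
                    cryptoStream.FlushFinalBlock();
                }
                return memoryStream.ToArray();
            }
        }
    }
}
```

Calling Decrypt on the output of Encrypt with the same key and IV should return the original bytes; a wrong key typically surfaces as a CryptographicException from the padding check.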
31:41
You use the same key to encrypt and to decrypt. So the next one is asymmetric encryption. And you've probably heard this commonly referred to as public and private key cryptography. So the idea is you have some data you want to encrypt. You encrypt it with your recipient's public key.
32:01
You then send them that data. And then to decrypt it, they use their private key. They're the only person that will have their private key so they have to look after it. But their public key, anyone can have it. You can post it on your website. You can hand it out. It doesn't matter. So we're going to use an algorithm called RSA.
32:21
And it was developed by a company called RSA Data Security Incorporated by three guys, Rivest, Shamir, and Adleman. And the way RSA works is it's more of a mathematical process, whereas AES is algorithmic. It works on blocks of data. It's very algorithmic in how it works. RSA is more mathematical and it uses modular arithmetic.
32:42
And the way it works is that there should be no efficient way to factor the product of two very large prime numbers. So if we have a key which is 2048 bits, which is the recommended minimum key length, that key is basically built around one massive number, which is the product of two large primes.
33:03
The one drawback of RSA is because it's a mathematical scheme, the larger the key size you use, the slower RSA is, and it is quite slow. So as I was saying, the keys are based on prime number factorization. So if you have two prime numbers, 23 and 17, you know, if I say to you multiply them together,
33:23
it's quite easy to do. You can do it in your head or on a calculator. It's very straightforward. But if I were to ask what two prime numbers do you need to multiply together to make 5,963? Does anyone know the answer to that? Pretty sure someone's gonna be able to say it one day. And make me look really stupid.
33:43
Okay, so it's a lot harder to do. So the answer is 67 times 89 is 5,963. So the public key is 5,963. That's the number that everyone else can know. But the private key is those two prime number factors,
34:00
67 and 89, and that's the bit you want to keep secret. So there's a lot more to how RSA keys work than that, but fundamentally, it's all based around the complexity of factorizing prime numbers. So that all sounds quite complicated, but to use it, it really isn't that hard. So first of all, we want to generate some keys.
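To make the 5,963 example concrete before the key generation code, here is a toy sketch of why the two directions differ: multiplying is a single operation, while recovering the factors by trial division needs a search that grows with the size of the number. The names ToyFactoring, Multiply and Factor are hypothetical, and trial division is only an illustration; it is not how real RSA keys are generated or attacked.

```csharp
// Toy illustration only: real RSA moduli are far too large for this.
public static class ToyFactoring
{
    // The easy direction: one multiplication.
    public static long Multiply(long p, long q)
    {
        return p * q;
    }

    // The hard direction: try every candidate divisor in turn.
    // Returns the smallest factor and its cofactor, or { 1, n } if n is prime.
    public static long[] Factor(long n)
    {
        for (long candidate = 2; candidate * candidate <= n; candidate++)
        {
            if (n % candidate == 0)
            {
                return new long[] { candidate, n / candidate };
            }
        }
        return new long[] { 1, n };
    }
}
```

ToyFactoring.Multiply(23, 17) gives 391 immediately, whereas ToyFactoring.Factor(5963) has to test 66 candidates before it finds 67 and 89. With primes hundreds of digits long, that search becomes impractical.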
34:22
So we have a method here called AssignNewKey. And we create an instance of the RSACryptoServiceProvider class and we pass in the key strength that we want to use. So we're gonna use 2048 bits in this example. And then to export our public key, we just call ExportParameters whilst passing in false.
34:41
And then to export our private key, we just call ExportParameters whilst passing in true. So in the code there, I mean, we're just storing the keys in memory. Unfortunately, we haven't got time to talk about effective key management strategies. But typically, you know, you don't just want to write these out to files and keep them on your server,
35:01
because that's not very safe. You probably want to use certificates or hardware security modules, which are network appliances that go into your data center, which are designed for storing keys. But for the purposes of the example, we're just gonna store the keys in memory. So to encrypt some data, we have our method here and we pass in our byte array
35:21
of our data we want to encrypt. And we create an instance of RSACryptoServiceProvider again, whilst passing in the strength of the key we want. Then we call ImportParameters and we pass in our public key. And then we just call rsa.Encrypt. And then that encrypts the data and gives us a byte array of our encrypted data back.
35:43
To decrypt the data, very similar, create an instance of RSACryptoServiceProvider, import our private key, and then just call rsa.Decrypt. And that gives us our decrypted data back. One particular problem with RSA is
36:01
you can only encrypt data up to the size of the key. So if you've got a 2048-bit key, you can only encrypt a maximum of 2048 bits of data, and slightly less in practice once padding is taken into account. So you could, you know, have your data that you want to encrypt and then split it up into chunks and encrypt each of those different bits of data.
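The key generation, encryption and decryption steps just described could be sketched as follows, with the keys held in memory as in the talk. The class name RsaWithKeysInMemory and its field names are hypothetical, and passing true for OAEP padding is an assumption; the talk does not say which padding the slides used.

```csharp
using System.Security.Cryptography;

// Hypothetical class that keeps an RSA key pair in memory for demonstration.
public class RsaWithKeysInMemory
{
    private RSAParameters _publicKey;
    private RSAParameters _privateKey;

    // Generates a 2048-bit key pair and exports both halves.
    public void AssignNewKey()
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            _publicKey = rsa.ExportParameters(false);  // false = public part only
            _privateKey = rsa.ExportParameters(true);  // true = include private part
        }
    }

    // Encrypts with the public key. The input must be small: with a
    // 2048-bit key and OAEP padding the limit is well under 256 bytes.
    public byte[] EncryptData(byte[] dataToEncrypt)
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(_publicKey);
            return rsa.Encrypt(dataToEncrypt, true);   // true = OAEP (assumption)
        }
    }

    // Decrypts with the private key, using the same padding choice.
    public byte[] DecryptData(byte[] dataToDecrypt)
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(_privateKey);
            return rsa.Decrypt(dataToDecrypt, true);
        }
    }
}
```

Usage would be to call AssignNewKey() once, then EncryptData and DecryptData on short byte arrays such as a 32-byte AES key. In a real system only the public key would be handed to the sender.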
36:21
But generally you're limited on how much you can encrypt at once with RSA. That's not necessarily a problem, as we'll come on to later. So the final primitive we're gonna look at is digital signatures. So a digital signature consists of three different algorithms that we're going to use. So we have a key generator, which we've just seen.
36:42
We have a signing algorithm, so we're gonna sign a piece of data. And then we have a signature verifier. So if we have a digital signature of a piece of data, say a PDF document, and we then want to verify that a signature is valid, we use a signature verifier. And the key generator is gonna be based on RSA
37:02
as we've just seen. And the way the signing algorithm works is that we sign our data using the private key. So if you look back to when we did RSA, we encrypted the data with the recipient's public key. When we create a digital signature of data, we're actually gonna use our own private key to create that digital signature.
37:22
Then when the recipient wants to verify that the signature is valid, they use my public key. So typically the way a digital signature works is you don't create a digital signature of the actual data itself. So if you're trying to create a signature of, say, a large PDF document,
37:41
you'd create a SHA-256 hash of that document first, and then you'd do the digital signature of that hash. Because the digital signatures use RSA under the covers, it has the same limitations in the amount of data that you can create a signature for in one go. So typically you create a hash of your data,
38:02
and then you create a digital signature of that hash. So if we look at my expert piece of artwork to demonstrate this, so we have a guy called Bob, and he wants to create a digital signature. So he does that using his private key. He then sends that digital signature over the internet,
38:20
or the intergalactic spider's web, as my picture shows. He sends that to Alice, and then she wants to verify that his signature is valid. So she uses Bob's public key with the signature verifier. And if it was indeed Bob that sent that signature, then it would be valid.
38:43
So earlier on we talked about the concept of non-repudiation, about being able to prove that someone has sent something. The reason we know it was Bob that sent this digital signature is because it used his private key. So only Bob knows his private key. So if we can verify that the signature's valid when it's sent to us,
39:02
it can only have come from Bob unless his private key has been stolen. So to use digital signatures, again we need to generate a key pair, so same code as what we used before. We export our public and private key. And then to sign some data,
39:20
we pass in a byte array, which is the hash of the data we want to sign. So PDF document, create a hash of that data, pass it into this method. We then import our private key, and we then create an instance of a class called RSAPKCS1SignatureFormatter. I don't know who comes up with these names, but again, it's one that you probably,
39:43
it's very easy to overlook it in the .NET Framework. So we set a hashing algorithm on that, so under the covers we can use SHA256. And then you just call CreateSignature and pass in the hash of the data you want to create the signature on. And then you get a byte array returned,
40:00
which is your digital signature. To verify the digital signature is valid, we have a method here to do that. So we pass in the hash of the data to sign and the actual byte array of the digital signature itself. We import the public key, because we're using the sender's public key to verify the signature.
40:22
We create an instance of RSAPKCS1SignatureDeformatter, which just rolls off the tongue, that one. Again, set the hashing algorithm to be SHA256, and then you call VerifySignature, passing in the hash of the data that was signed and the actual signature itself, and that just returns a boolean, true or false.
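The signing and verifying methods described above might be sketched like this. The class name DigitalSignatureExample, the field names and the Sha256 helper are hypothetical; the formatter and deformatter classes and the "SHA256" algorithm name are the ones mentioned in the talk.

```csharp
using System.Security.Cryptography;

// Hypothetical class holding a signing key pair in memory.
public class DigitalSignatureExample
{
    private RSAParameters _publicKey;
    private RSAParameters _privateKey;

    public void AssignNewKey()
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            _publicKey = rsa.ExportParameters(false);
            _privateKey = rsa.ExportParameters(true);
        }
    }

    // Hash the document first; only the 32-byte hash gets signed.
    public static byte[] Sha256(byte[] data)
    {
        using (var sha = SHA256.Create())
        {
            return sha.ComputeHash(data);
        }
    }

    // Signs the hash with the signer's own private key.
    public byte[] SignData(byte[] hashOfDataToSign)
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(_privateKey);
            var formatter = new RSAPKCS1SignatureFormatter(rsa);
            formatter.SetHashAlgorithm("SHA256");
            return formatter.CreateSignature(hashOfDataToSign);
        }
    }

    // Verifies with the signer's public key. Returns false if either
    // the hash or the signature has been changed.
    public bool VerifySignature(byte[] hashOfSignedData, byte[] signature)
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(_publicKey);
            var deformatter = new RSAPKCS1SignatureDeformatter(rsa);
            deformatter.SetHashAlgorithm("SHA256");
            return deformatter.VerifySignature(hashOfSignedData, signature);
        }
    }
}
```

A typical call sequence would be AssignNewKey(), then SignData(Sha256(document)), and on the receiving side VerifySignature(Sha256(document), signature).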
40:43
True, it's a valid signature, false, it's not. So if, for example, when we go to generate, or when we go to verify the signature, that hash that we're trying to verify, if that's been changed in any way, and you pass it into verify signature with the digital signature itself,
41:01
if that data's been changed, then verify will come back as false, because it's not a valid digital signature for that data. Okay, so if we recap our four main pillars of cryptography, so first of all we had confidentiality. And for confidentiality we've used both AES and RSA.
41:26
For integrity, we've looked at hashing, and we've discussed a lot about SHA256. For authentication, we've used hash-based message authentication codes (HMACs) built around SHA256. And for non-repudiation,
41:40
we've just looked at digital signatures. So now what we want to do is use a lot of these together to create what's called a hybrid encryption scheme. Okay, so as we discussed, RSA has limits on the amount of data you can encrypt in one go,
42:01
and it's quite slow. But AES is very fast, it's algorithmic, it's quite efficient, but exchanging keys is very difficult. So what we want to do is combine RSA and AES to create what's called a hybrid encryption scheme. So we're using the power of both these asymmetric and symmetric encryption algorithms
42:20
to encrypt data and share keys. So if we look at the example here, we create an AES session key, and that's just a 32-byte random number that we can use as our key for AES. We encrypt some data with that key, but then what we do is we use the recipient's public key and RSA
42:43
to then encrypt that session key. We send that across to our recipient, they use their private key to decrypt that session key, then once they've recovered that key, they can then decrypt the message with AES. So when we send our data to the recipient to decrypt,
43:02
we're sending them three pieces of information. So we've got the RSA encrypted session key, we've got the initialization vector, which we use for jump starting AES, and we have the actual AES encrypted data itself. So let's run through that as an example. So we've got Alice,
43:20
she generates a 32 byte AES key, and she generates her 16 bytes initialization vector. She encrypts her data with AES, so using that session key, we've now encrypted our data, so that's good. We then use Bob's public key and RSA to encrypt that session key.
43:44
We can then send all that data, package it all up and then send it across to Bob. So on the other end, so Bob's received this packet of information, so he uses his private key to decrypt the AES session key. So we've recovered the key, we can now use AES with the initialization vector
44:02
to decrypt our data. And then Bob can read the message, so meet me at noon below the clock tower, wear a red rose in your buttonhole. I've been reading far too many spy novels. So to reiterate this, let's get Bob to send a message back to Alice. So Bob generates his own AES session key,
44:21
because once he's used the other one, we're throwing it away, we're not going to reuse it, we can generate a new one. So he generates a new 32-byte key, he generates his own initialization vector, which is 16 bytes, and he uses AES and that session key and initialization vector to encrypt his reply.
44:40
He then uses Alice's public key, so we're sending a message back to Alice, so we can use her public key to encrypt that session key. He packages it up and emails it, or however he's going to send it back to Alice. So she then uses her private key to recover that AES session key.
45:03
And then she uses that recovered key with the initialization vector to decrypt the message reply. And the message is, I will meet you, I'll be wearing a blue hat and red boots. She's very fashionable. So that's pretty good. So we've used the flexibility of RSA
45:23
to be able to securely share keys between our recipient and sender. But we've also used the speed and efficiency of AES to encrypt our actual message. So we kind of fixed two of our problems. But now let's add some integrity to that.
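Before integrity is added, the basic hybrid exchange between Alice and Bob could be sketched as below. The EncryptedPacket and HybridEncryption names are hypothetical, OAEP padding for the RSA step is an assumption, and the AES step uses Aes.Create() with TransformFinalBlock instead of the stream-based code described earlier, purely to keep the sketch short.

```csharp
using System.Security.Cryptography;

// Hypothetical container for the three pieces sent to the recipient.
public class EncryptedPacket
{
    public byte[] EncryptedSessionKey;  // session key, encrypted with RSA
    public byte[] Iv;                   // AES initialization vector (not secret)
    public byte[] EncryptedData;        // message, encrypted with AES
}

public static class HybridEncryption
{
    public static EncryptedPacket Encrypt(byte[] plaintext, RSAParameters recipientPublicKey)
    {
        // A fresh session key and IV for every message.
        byte[] sessionKey = RandomBytes(32);
        var packet = new EncryptedPacket { Iv = RandomBytes(16) };
        packet.EncryptedData = AesTransform(plaintext, sessionKey, packet.Iv, true);

        // Only the small session key goes through RSA.
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(recipientPublicKey);
            packet.EncryptedSessionKey = rsa.Encrypt(sessionKey, true);
        }
        return packet;
    }

    public static byte[] Decrypt(EncryptedPacket packet, RSAParameters recipientPrivateKey)
    {
        byte[] sessionKey;
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(recipientPrivateKey);
            sessionKey = rsa.Decrypt(packet.EncryptedSessionKey, true);
        }
        return AesTransform(packet.EncryptedData, sessionKey, packet.Iv, false);
    }

    // Runs AES (default CBC mode, PKCS7 padding) over the whole input.
    private static byte[] AesTransform(byte[] input, byte[] key, byte[] iv, bool encrypt)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.IV = iv;
            using (var transform = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor())
            {
                return transform.TransformFinalBlock(input, 0, input.Length);
            }
        }
    }

    private static byte[] RandomBytes(int length)
    {
        var bytes = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return bytes;
    }
}
```

Alice would call Encrypt with Bob's public key and send the resulting packet; Bob would call Decrypt with his private key. For the reply, Bob repeats the process with Alice's public key, which produces a new session key and IV.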
45:41
So if Alice sends some data to Bob, Bob wants to make sure that that data hasn't been tampered with or corrupted in transit. So as before, we use a session key, which we generate for AES. We encrypt our data. We then use RSA and a public key to encrypt that session key.
46:03
And then we also generate a hash message authentication code, an HMAC, of the encrypted message. And because we're using an HMAC, we have to pass a key into it. So we use the session key. So that means on the other side, when we've sent the message, the only way that the recipient can check that the HMAC is valid
46:21
is by recovering the key. So they need their private key to do that. Which is where the idea of authentication comes in. They can only verify that that hash is valid if they can recover the key, the AES key. And they need their private key to do that. So that means when we send our data
46:40
across to our recipient, we've got the RSA encrypted session key. We've got the AES initialization vector. We've got the AES encrypted data. And we're also sending the HMAC of our encrypted data. So that's all pretty good. So let's take that one step further. So we've added integrity to our message.
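The integrity step on its own could look roughly like this: an HMAC-SHA256 over the encrypted data, keyed with the AES session key, which the recipient recomputes after recovering that key. The names HmacExample, ComputeHmac and Verify are hypothetical, and the constant-time comparison loop is an addition of this sketch that the talk does not mention.

```csharp
using System.Security.Cryptography;

// Hypothetical helper for the HMAC step of the hybrid scheme.
public static class HmacExample
{
    // Sender side: HMAC of the ciphertext, keyed with the session key.
    public static byte[] ComputeHmac(byte[] encryptedData, byte[] sessionKey)
    {
        using (var hmac = new HMACSHA256(sessionKey))
        {
            return hmac.ComputeHash(encryptedData);
        }
    }

    // Recipient side: recompute the HMAC with the recovered session key
    // and compare it with the one that arrived in the packet.
    public static bool Verify(byte[] encryptedData, byte[] sessionKey, byte[] receivedHmac)
    {
        byte[] expected = ComputeHmac(encryptedData, sessionKey);
        return ConstantTimeEquals(expected, receivedHmac);
    }

    // Compares every byte regardless of where the first difference is,
    // so the time taken does not reveal how much of the value matched.
    private static bool ConstantTimeEquals(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
        {
            return false;
        }
        int difference = 0;
        for (int i = 0; i < a.Length; i++)
        {
            difference |= a[i] ^ b[i];
        }
        return difference == 0;
    }
}
```

If a single byte of the encrypted data changes in transit, Verify returns false and the recipient can reject the message before decrypting it.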
47:02
We've got effective key sharing using RSA. And we're using the flexibility and the power of AES to encrypt our message. But now we want to have the ability for our recipient to be able to prove that it was actually Alice that sent the message to him. And that's where we're going to use digital signatures.
47:20
So as before, we generate our AES session key. We encrypt our data with that session key. We use RSA and the public key to encrypt that session key. We then create an HMAC of the encrypted message, which we've already encrypted with AES, using the session key as the HMAC key. And then we create a digital signature of that HMAC.
47:46
So we've created the HMAC already. We then create a digital signature using the sender's own private key. This means when we send the data across to the recipient, we have the RSA encrypted session key. We have our initialization vector.
48:01
We've got our AES encrypted data. We've got the HMAC of the data. So that's how we check integrity on the other end, by checking the hash. And we've also created a digital signature of that hash. So when Alice sends a message to Bob, Bob can be sure that it was actually Alice that sent the message and not some other third party.
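Putting all five pieces together, a sender-side Seal and recipient-side Open could be sketched as follows. Everything named here (SignedPacket, SignedHybridEncryption, Seal, Open) is hypothetical, and several details are assumptions rather than the slide code: OAEP padding for the session key, Aes.Create() with TransformFinalBlock for the AES step, and passing the 32-byte HMAC to the signature formatter as if it were a SHA256 hash.

```csharp
using System.Security.Cryptography;

// Hypothetical container for the five pieces sent to the recipient.
public class SignedPacket
{
    public byte[] EncryptedSessionKey, Iv, EncryptedData, Hmac, Signature;
}

public static class SignedHybridEncryption
{
    public static SignedPacket Seal(byte[] plaintext,
        RSAParameters recipientPublicKey, RSAParameters senderPrivateKey)
    {
        byte[] sessionKey = RandomBytes(32);
        var packet = new SignedPacket { Iv = RandomBytes(16) };
        packet.EncryptedData = AesTransform(plaintext, sessionKey, packet.Iv, true);
        using (var hmac = new HMACSHA256(sessionKey))
            packet.Hmac = hmac.ComputeHash(packet.EncryptedData);
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            // Recipient's public key protects the session key.
            rsa.ImportParameters(recipientPublicKey);
            packet.EncryptedSessionKey = rsa.Encrypt(sessionKey, true);
        }
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            // Sender's own private key signs the HMAC.
            rsa.ImportParameters(senderPrivateKey);
            var formatter = new RSAPKCS1SignatureFormatter(rsa);
            formatter.SetHashAlgorithm("SHA256");
            packet.Signature = formatter.CreateSignature(packet.Hmac);
        }
        return packet;
    }

    public static byte[] Open(SignedPacket packet,
        RSAParameters recipientPrivateKey, RSAParameters senderPublicKey)
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            // 1. Check the signature with the sender's public key.
            rsa.ImportParameters(senderPublicKey);
            var deformatter = new RSAPKCS1SignatureDeformatter(rsa);
            deformatter.SetHashAlgorithm("SHA256");
            if (!deformatter.VerifySignature(packet.Hmac, packet.Signature))
                throw new CryptographicException("Signature is not valid.");
        }
        byte[] sessionKey;
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            // 2. Recover the session key with the recipient's private key.
            rsa.ImportParameters(recipientPrivateKey);
            sessionKey = rsa.Decrypt(packet.EncryptedSessionKey, true);
        }
        using (var hmac = new HMACSHA256(sessionKey))
        {
            // 3. Recompute the HMAC and compare before decrypting.
            if (!SameBytes(hmac.ComputeHash(packet.EncryptedData), packet.Hmac))
                throw new CryptographicException("HMAC does not match.");
        }
        // 4. Decrypt the message itself.
        return AesTransform(packet.EncryptedData, sessionKey, packet.Iv, false);
    }

    private static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        int difference = 0;
        for (int i = 0; i < a.Length; i++) difference |= a[i] ^ b[i];
        return difference == 0;
    }

    private static byte[] AesTransform(byte[] input, byte[] key, byte[] iv, bool encrypt)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.IV = iv;
            using (var transform = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor())
                return transform.TransformFinalBlock(input, 0, input.Length);
        }
    }

    private static byte[] RandomBytes(int length)
    {
        var bytes = new byte[length];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(bytes);
        return bytes;
    }
}
```

Alice would call Seal with Bob's public key and her own private key; Bob would call Open with his private key and Alice's public key. Open throws if either the signature or the HMAC check fails, so a tampered or forged packet is never decrypted.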
48:23
So we've covered quite a lot in a short space of time there. So we've covered random numbers, hashing and HMACs, secure password storage, AES encryption, RSA encryption, digital signatures, and hybrid cryptography. So I'm sure you're all gonna remember that by five o'clock this afternoon.
48:43
So what next? So what we've talked about today, we've covered a lot in an hour. So really treat this talk as the art of the possible. What can we do with stuff that's in .NET? So if you are interested in using this, I really do encourage you to download the book.
49:01
It kind of mirrors what we've talked about today. That book does come with a lot of sample code. So all the code snippets I've shown you on the screen today, it all comes in a solution file. You can just use the code, steal it, and use it in your own solutions. If you've got access to Pluralsight, my Practical Cryptography in .NET course covers what we've talked about in a lot more detail.
49:22
It talks about the why we do a lot of this as opposed to just the how. Again, there's lots of sample code you can download with that course as well. If you don't have access to Pluralsight, come see me afterwards, and I've got some access cards which I can give you. But cryptography itself is a fascinating subject. I mean, when you start looking at the history
49:42
of how cryptography came about, it's an absolutely fascinating subject. So if you sort of want to read a bit more into it, then there's some books here that I highly recommend looking at. So the first one is called The Code Book by Simon Singh. It's a relatively short book. It's about the size of a standard novel. And that covers the history from back in the days like Mary, Queen of Scots,
50:02
and sort of the Romans, right the way through to RSA and sort of modern digital cryptographic protocols. It's quite an easy read. It's not mathematical or very complicated. It's written more like a novel. So I highly recommend that book. My personal favorite book is a book called Everyday Cryptography by a guy called Keith Martin.
50:23
And this book is split into two. So the first half goes into a lot of detail about the protocols and primitives that we talked about today. But he actually sort of talks about how they work under the covers. Then the second half of the book is how they're actually applied to real life. So how sort of Wi-Fi encryption works, how SSL and TLS actually work.
50:43
So where we discussed hybrid cryptography, the way TLS works is very similar in how it does the key sharing handshake. May not necessarily use RSA, but it's a very similar concept. And probably the most famous cryptography book and the book that the NSA actually tried to ban back in the 90s, unsuccessfully,
51:00
is a book by Bruce Schneier called Applied Cryptography. This book doesn't cover AES. AES kind of came after when this book was written. It's quite a hard book to read, but if you really want to get into the nitty-gritty detail of how a lot of these algorithms work and you're not scared by a bit of maths, then that book's quite good as well.
51:22
So thank you very much. I'm gonna be hanging around for a few minutes, plus I'll be around the conference for the rest of the day. On your way out, I'd be very grateful if you could vote on the session as well.
51:43
If you press the green button, you are awesome. Thank you.