Reaping and breaking keys at scale: when crypto meets big data
Formal Metadata
Title 
Reaping and breaking keys at scale: when crypto meets big data

Title of Series  
Author 

License 
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2018

Language 
English

Content Metadata
Subject Area  
Abstract 
Public keys are everywhere, after all, they are public. These keys are waiting to be reaped by those who know their real value. Hidden behind this public face lurks some potentially dangerous issues which could lead to a compromise of data and privacy. Leveraging hundreds of minion devices, we built a public key reaping machine (which we are open sourcing) and operated it on a global scale. Collected keys are tested for vulnerabilities such as the recent ROCA vulnerability or factorization using batchGCD. We've collected over 300 million keys so far and built a database 4 to 10 times bigger than previous public works. Performing the initial computation on over 300 million keys took about 10 days on a 280 vCPU cluster. Many optimizations allow our tool to incrementally test new RSA keys for common prime factors against the whole dataset in just a few minutes. As a result of our research, we could have impersonated hundreds of people by breaking their PGP keys, mimicked thousands of servers thanks to their factored SSH keys and performed MitM attacks on over 200k websites relying on vulnerable X509 certificates. In the end, we were able to do this in an entirely passive way. Going further is possible, but it would lead us to the dark side. Would big brother hesitate to go there?

00:00
Scale (map)
Cryptography
Pointer (computer programming)
Scaling (geometry)
Video game
Electronic mailing list
Key (cryptography)
Information security
Publickey cryptography
Information security
00:27
Server (computing)
Key (cryptography)
Cryptosystem
Building
Line (geometry)
Menu (computing)
Nonstandard analysis
Bit
Prime number
Bit
Number
Mathematics
Cryptography
Lie group
Program slicing
Right angle
Key (cryptography)
Integer
Laptop
RSA (algorithm)
01:48
NPhard
Scripting language
Level of measurement
Divisor
INTEGRAL
Multiplication sign
Curve
Ellipse
Number
Prime ideal
Cryptography
Information security
Physical system
Vorwärtsfehlerkorrektur
Algorithm
Stapeldatei
Scaling (geometry)
Key (cryptography)
Greatest common divisor
Diskreter Logarithmus
Java applet
Bit
Cryptography
Publickey cryptography
Hand fan
Elliptic curve
Lie group
Greatest common divisor
RSA (algorithm)
03:46
Curve
Stapeldatei
Scaling (geometry)
Key (cryptography)
Divisor
Validity (statistics)
Bit
Parameter (computer programming)
Instance (computer science)
Publickey cryptography
Public key certificate
Prime ideal
Frequency
Type theory
Password
Infinite conjugacy class property
Smartphone
Key (cryptography)
Endliche Modelltheorie
Greatest common divisor
Resultant
06:09
Source code
Multiplication
Server (computing)
Public key certificate
Key (cryptography)
Observational study
Set (mathematics)
Plastikkarte
Database
Public key certificate
Key (cryptography)
Greatest common divisor
Endliche Modelltheorie
Resultant
Library (computing)
Vulnerability (computing)
07:20
Stapeldatei
Domain name
Presentation of a group
Implementation
Server (computing)
Parsing
Link (knot theory)
Open source
Computer file
Code
Length
Virtual machine
Boom (sailing)
Set (mathematics)
Open set
Public key certificate
Type theory
Semiconductor memory
Uniqueness quantification
Software testing
Endliche Modelltheorie
RSA (algorithm)
Scripting language
Cone penetration test
Scaling (geometry)
Public key certificate
Key (cryptography)
Demo (music)
Greatest common divisor
Projective plane
Electronic mailing list
Database
Publickey cryptography
Type theory
Uniform boundedness principle
Word
Software
Key (cryptography)
Greatest common divisor
RSA (algorithm)
11:44
Mass flow rate
Key (cryptography)
Demo (music)
Maxima and minima
Set (mathematics)
Database
GEDCOM
Publickey cryptography
God
Process (computing)
Internet forum
Queue (abstract data type)
Website
Software testing
Key (cryptography)
Information security
12:45
Web page
Randomization
Implementation
Divisor
Tape drive
Virtual machine
Control flow
Prime number
Demoscene
Public key certificate
IP address
Neuroinformatik
Twitter
Prime ideal
Goodness of fit
Mathematics
Different (Kate Ryan album)
Reduction of order
Encryption
Divisor
Position operator
Metropolitan area network
Partition (number theory)
Scripting language
Curve
Stapeldatei
Algorithm
Email
Gateway (telecommunications)
Key (cryptography)
Weight
Data storage device
Bit
Demoscene
Electronic signature
Data mining
Word
Message passing
Befehlsprozessor
Website
Key (cryptography)
Greatest common divisor
Table (information)
Resultant
RSA (algorithm)
Router (computing)
00:00
hi everyone my name is Nils I'm from Switzerland I came today with my colleague Jolin we are in the research team at Podolski security I like to fish and I'm really into video games and we are going to talk about collecting public keys and breaking them at scale so who has a public key on github geeet
00:32
lab or has pushed the PGP key on a key server an SSH key ok so a lot of people so what if I told you those who raise your hand we have your key so far so good nothing to worry your back right so now what if I told you we could actually break those keys and retrieve the private key out of your public key sounds more fun right and so if we can do it yes who can do it  like NSA that's easy peasy for them right so one question remains how do you retrieve the private key out of the public key um well that's a really good question so
01:28
I'm a math guy so I'll do a little you know crypto recap first so most keys out there are using the erase a crypto systems so everything is really simple you just need to take two large prime numbers okay that slice is a bit dry it won't do it to large prime
01:48
numbers much easier so you take P and Q Prime really large two thousand or more bits it only you multiply them together awesome you get n that is your public modulus it's your public key and the security of RSA is all based on the fact that you cannot easily factor n into P and it's really hard to factor large numbers and so forth so you get your private key P and Q and n outside on github PGP whatever but what happens if somebody else generated another public key M using the same prime prime Q as you did so let's say he has a key Q times R equal to M you can actually really easily find the greatest common divisor of both and an M that's everybody learnt at school elite algorithm and that's easypeasy so that's really bad for every say now I won't tell me every see is no good there are other fan system out there like you
03:04
know elliptic curve cryptography and elliptic curve cryptography does not rely on the ordinance of the factoring of large integrals it relies instead of the hardness of solving solving the discrete logarithm problem and it is not impacted by aving a lot of keys and like for ever say where you can just try to batch factor all the public is out there and so elliptic curve cryptography or more secured said and ever say again stored scale attacks like these we trial but we'll also be talking about a bit about ECC at the end of our talk so what kind
03:48
of attacks can we actually do against public keys we want to remain silent you know we don't want to take a branch and go hit the guy to ask him for his password he will not die he will remark you know you will why do you hit me so we try we can try to use the return of the coppersmiths attack which was like last year smartphones were generating bad primes and you can easily want to easily factors are produced in tankers so another problem you can have is invalid parameters in DSA for example or if you take it to small key size for your ECC or DSA or even forever say if you take and read bits Prime it's no good you can easily factor of those invalid curves attacker also saying but we'll see later it's not a same claim so in the end the one we're really talking about today is array.c models factorization at scale so it means we do we were doing batch GCD that GCD has actually already be done the past by academics but it was done at academic scale so uncertain data size of maybe 10 million keys 20 million keys 80 million keys we did it at scale and much larger scale these are all known attacks anybody can do it and you are completely deceived you won't even know if you are vulnerable to these so Nell's
05:18
how do we collect these keys so these are public keys are everywhere and we have focused on the three most common key container types and namely certificates of x.509 certificates you can find in HTTPS for instance SSH keys and PGP keys and we have found quite a few interesting results and also a few crazy things that we will talk about later in this talk and for example as you know certificates have a validity period so it's supposed to start being valid at some point and expire at a later date and some certificates actually have a negative volatile period I mean why would you do that
06:10
most keys we have come from certs about 70 percent of our data set came from certificates so that's about 240 million keys we also have 19 million SSH keys and about 10 million pvp keys in our database
06:30
those certs mostly come from HTTP scans that we perform using a custom tool that I will talk about later and SSH key is mostly came from a study that was made by Crocs those guys generated a lot of SSH keys with multiple software libraries and smart cards and they tried to see if there were any vulnerabilities in those labs we were able to validate their results namely we found just as they did that one of the smart card model was generating keys with common factors so that was really bad and also PGP key so most of those keys came from the of SKS key servers so why is it
07:24
interesting to actually have as many public keys as we can so when we run back TCP and RSA keys you are looking for common factors right so the more keys you have the more chances there are of having two keys with a common factor in the data set so this is why it's good to have as many as possible and we currently have over 40 million keys this is still growing we're still ingesting keys from 32 certificate transparency so this is a project that was initiated by Google and it's basically a log servers where you can get certificates so we can grab the keys from there and also from those log servers we were able to collect a list of 270 million domain names and subdomains you can do a lot of stuff with that yeah it's way more than an exact top one mininum oh yeah so we found that today RSA is still the most common key type 95 percent of keys are still RSA so rocket a convert GCD is targeting that so be careful easy keys are getting more popular and this is basically what I guess you should start using today and then we still have some DSA keys I mean who uses that anyway today and a few other less wellknown she words about tools so we've been using a tool that was developed by one of our colleagues it's called scanner so it's not just Network scanner it's also a fingerprinting engine it's written in Erlang it's open source it's on github you can download it today our parsers are mostly written in Python we also have some Golden Calf code and some bash scripts in there the code will be open open source on github it is available today we have the link at the end of this presentation and we also have been using a patch in IFI to define some data pipelines that we used to ingest the keys in our database that database is stored in HDFS and we use presto to run at queries on on the data set then for tracking keys so now that we have them in our database we actually wrote a tool that is written in chapelle to compute the patch the CD of all of those keys so basically computing the GCD between every pair of RSA models in there and this is a distributed implementation so it basically scales you can throw more machines at it it's constant in memory you can if you just needed it to be faster you just add more machines we also check for the Rock attack to the rock at home and we run some checks on easy keys like invalid by Ramirez key lengths and so on so we have a short demo for you
10:48
so here we have two SSH public keys a k1 that bubb a key to the pub we run a script that extracts the RSA models from the keys compute the GCD and since the GCD is larger than one we can retrieve the private key then we just copy is that key to a file push the original public key to a test machine just for testing if we can login with via SSH to it now that the key is on there we can try to log in via SSH using the private key we just reconstructed and guess what where and boom that machine has in [Applause]
11:45
so here what we did is we just compute the GCD between two keys for the example but what we actually did is the GCD between all key pairs and we have made a website that you can go today it's on key Luca but PRS key security comm you can upload your key and get it tested against our own data set we promise we'll tell you if we find your private key the private key we will find it so basically it looks like this you
12:21
just have a forum where you can copy/paste an SSH key assert or a PGP key you hit submit and what you may see is that either the key is already in our database so we can give you an answer immediately or we will add it to the processing queue and you can just recheck later and it will tell you whether it is vulnerable or not within about an hour so how does that work
12:49
behind the scenes yeah you know we have a corporate policy saying no Bitcoin mining so we had to find something to do yeah so we had to use that 280 B CPU cluster and make it to good use so we also have about 2 terabytes of storage required because we have to have some intermediate computations for the batch GCD so that we can reuse the other calculations and do not recompute everything when we just want to test new keys against the full data set so we just recomputed whatever changes and we don't have to recompute everything and if we test just like one more key it would take maybe 10 to 20 minutes 3 keys at once is about an hour and if you want to check a large batch of keys like five million easy and it can take up to 24 hours that data is stored on HDFS we have a tender a node cluster we use partitioned resto tables to have faster gaps and scanner is deployed on 50 machines so we scan from 50 different IP addresses at once that's a really nice canning infrastructure so Yeol and what good results and cool stuff have we found well we've brought in a few keys Mike you know two hundred on ten thousand keys most of them were certificates but some of them are actually in used today so we could perform an in the middle attacks again these websites really easily and passively three thousand SSH keys and most of them most keys so it will mean it's allowing for man in the middle attacks again but if you are using SSH key on your little page and we broke it maybe you're using the same SSH key on your backs you know you shouldn't do that PGP key is that's bad because it means we can decrypt what we wrote you we can sign message instead of you and so on so that's really bad what's fun with PGP keys is that a lot of the kids we've broke we're actually having more than two factors so that's actually saving your day if your key is not just using two primes but three primes and it's really stranded because it's not something that common in implementations today it waved three prime numbers so Rocky attack we actually found PGP keys will you're able to rock attack on key base key tab get everywhere so double check your keys you can check them using Python scripts offline on your computer or using our key lookup website Raqqa is not that bad because if your key is in the twothird that tape four thousand bits you're still safe it will take way too long to actually compute primes the public key but if you were on the weak side like you know 2,000 bits it's feasible you know rotors they've had randomness with now well they still do a lot of Reuters we've seen since 2017 of common factors so some of them have even broken no comments in their certificates like who is using that come are in 2015 actor according to our historical scan data a lot of daling Reuters were aiming problem and that's fun because reduce QD done in 2016 I'm a bad she found that problem and they got it patched that's nice few certificates or not valid before 2040 what weight may be able contem computer by then why will you do that that idea ECT keys on the other hand are getting more and more used in practice which is good certificates on PGP mostly or even yet more and more and SSH most the most use is still a sec yeah an East curve which is not bad per se but curve 255 19 is growing it's in third position right now but maybe next year it will be certain it still has a bit to go you know before being first so more and more and CT keys I said so when we scan everything keys new keys were always you know in the same amount we found always same amount of new keys but for ECC we were finding more and more sec keys and that's really good and actually let's encrypt will soon be able to issue ECC based certificate chain so it's called now what if I told you that some people are using their SSH key to son encrypted PGP mails that sounds strange no people do we found kids that are reused as both PGP and SSH or as both SSH and x.509 certificates which is completely insane it doesn't make sense you should just generate new keys you know most people using PGP even in one subkey why just a multiple you know so if one of the shift key is broken you can still have another one to save your house and some of the keys we found were add more than two factors most of them were PGP I already told so and DSA is the add OpenSSL the break it he did in 2015 3 years ago and nowadays only like 3030 few keys were still using DSA as a signature algorithm and less than 1% of SSH key word DSA bathed so that's really good and if you are still using the SI stop so DSA some people were using 256 bits keys which doesn't make sense so yeah just tap using DSA so in the end mind your keys because anybody can do the same kind of silent attacks and maybe they already do thank you for your attention today if you have questions we'll be outside or you can hit on our and Twitter also and thank you [Applause]