Reaping and breaking keys at scale: when crypto meets big data

Video thumbnail (Frame 0) Video thumbnail (Frame 676) Video thumbnail (Frame 2186) Video thumbnail (Frame 2702) Video thumbnail (Frame 4582) Video thumbnail (Frame 5658) Video thumbnail (Frame 7912) Video thumbnail (Frame 9225) Video thumbnail (Frame 9717) Video thumbnail (Frame 11007) Video thumbnail (Frame 16159) Video thumbnail (Frame 17593) Video thumbnail (Frame 18486) Video thumbnail (Frame 19137)
Video in TIB AV-Portal: Reaping and breaking keys at scale: when crypto meets big data

Formal Metadata

Title
Reaping and breaking keys at scale: when crypto meets big data
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
Public keys are everywhere, after all, they are public. These keys are waiting to be reaped by those who know their real value. Hidden behind this public face lurks some potentially dangerous issues which could lead to a compromise of data and privacy. Leveraging hundreds of minion devices, we built a public key reaping machine (which we are open sourcing) and operated it on a global scale. Collected keys are tested for vulnerabilities such as the recent ROCA vulnerability or factorization using batch-GCD. We've collected over 300 million keys so far and built a database 4 to 10 times bigger than previous public works. Performing the initial computation on over 300 million keys took about 10 days on a 280 vCPU cluster. Many optimizations allow our tool to incrementally test new RSA keys for common prime factors against the whole dataset in just a few minutes. As a result of our research, we could have impersonated hundreds of people by breaking their PGP keys, mimicked thousands of servers thanks to their factored SSH keys and performed MitM attacks on over 200k websites relying on vulnerable X509 certificates. In the end, we were able to do this in an entirely passive way. Going further is possible, but it would lead us to the dark side. Would big brother hesitate to go there?
Scale (map) Cryptography Pointer (computer programming) Scaling (geometry) Video game Electronic mailing list Key (cryptography) Information security Public-key cryptography Information security
Server (computing) Key (cryptography) Cryptosystem Building Line (geometry) Menu (computing) Non-standard analysis Bit Prime number Bit Number Mathematics Cryptography Lie group Program slicing Right angle Key (cryptography) Integer Laptop RSA (algorithm)
NP-hard Scripting language Level of measurement Divisor INTEGRAL Multiplication sign Curve Ellipse Number Prime ideal Cryptography Information security Physical system Vorwärtsfehlerkorrektur Algorithm Stapeldatei Scaling (geometry) Key (cryptography) Greatest common divisor Diskreter Logarithmus Java applet Bit Cryptography Public-key cryptography Hand fan Elliptic curve Lie group Greatest common divisor RSA (algorithm)
Curve Stapeldatei Scaling (geometry) Key (cryptography) Divisor Validity (statistics) Bit Parameter (computer programming) Instance (computer science) Public-key cryptography Public key certificate Prime ideal Frequency Type theory Password Infinite conjugacy class property Smartphone Key (cryptography) Endliche Modelltheorie Greatest common divisor Resultant
Source code Multiplication Server (computing) Public key certificate Key (cryptography) Observational study Set (mathematics) Plastikkarte Database Public key certificate Key (cryptography) Greatest common divisor Endliche Modelltheorie Resultant Library (computing) Vulnerability (computing)
Stapeldatei Domain name Presentation of a group Implementation Server (computing) Parsing Link (knot theory) Open source Computer file Code Length Virtual machine Boom (sailing) Set (mathematics) Open set Public key certificate Type theory Semiconductor memory Uniqueness quantification Software testing Endliche Modelltheorie RSA (algorithm) Scripting language Cone penetration test Scaling (geometry) Public key certificate Key (cryptography) Demo (music) Greatest common divisor Projective plane Electronic mailing list Database Public-key cryptography Type theory Uniform boundedness principle Word Software Key (cryptography) Greatest common divisor RSA (algorithm)
Mass flow rate Key (cryptography) Demo (music) Maxima and minima Set (mathematics) Database GEDCOM Public-key cryptography God Process (computing) Internet forum Queue (abstract data type) Website Software testing Key (cryptography) Information security
Web page Randomization Implementation Divisor Tape drive Virtual machine Control flow Prime number Demoscene Public key certificate IP address Neuroinformatik Twitter Prime ideal Goodness of fit Mathematics Different (Kate Ryan album) Reduction of order Encryption Divisor Position operator Metropolitan area network Partition (number theory) Scripting language Curve Stapeldatei Algorithm Email Gateway (telecommunications) Key (cryptography) Weight Data storage device Bit Demoscene Electronic signature Data mining Word Message passing Befehlsprozessor Website Key (cryptography) Greatest common divisor Table (information) Resultant RSA (algorithm) Router (computing)
hi everyone my name is Nils I'm from Switzerland I came today with my colleague Jolin we are in the research team at Podolski security I like to fish and I'm really into video games and we are going to talk about collecting public keys and breaking them at scale so who has a public key on github geeet
lab or has pushed the PGP key on a key server an SSH key ok so a lot of people so what if I told you those who raise your hand we have your key so far so good nothing to worry your back right so now what if I told you we could actually break those keys and retrieve the private key out of your public key sounds more fun right and so if we can do it yes who can do it - like NSA that's easy peasy for them right so one question remains how do you retrieve the private key out of the public key um well that's a really good question so
I'm a math guy so I'll do a little you know crypto recap first so most keys out there are using the erase a crypto systems so everything is really simple you just need to take two large prime numbers okay that slice is a bit dry it won't do it to large prime
numbers much easier so you take P and Q Prime really large two thousand or more bits it only you multiply them together awesome you get n that is your public modulus it's your public key and the security of RSA is all based on the fact that you cannot easily factor n into P and it's really hard to factor large numbers and so forth so you get your private key P and Q and n outside on github PGP whatever but what happens if somebody else generated another public key M using the same prime prime Q as you did so let's say he has a key Q times R equal to M you can actually really easily find the greatest common divisor of both and an M that's everybody learnt at school elite algorithm and that's easy-peasy so that's really bad for every say now I won't tell me every see is no good there are other fan system out there like you
know elliptic curve cryptography and elliptic curve cryptography does not rely on the ordinance of the factoring of large integrals it relies instead of the hardness of solving solving the discrete logarithm problem and it is not impacted by aving a lot of keys and like for ever say where you can just try to batch factor all the public is out there and so elliptic curve cryptography or more secured said and ever say again stored scale attacks like these we trial but we'll also be talking about a bit about ECC at the end of our talk so what kind
of attacks can we actually do against public keys we want to remain silent you know we don't want to take a branch and go hit the guy to ask him for his password he will not die he will remark you know you will why do you hit me so we try we can try to use the return of the coppersmiths attack which was like last year smartphones were generating bad primes and you can easily want to easily factors are produced in tankers so another problem you can have is invalid parameters in DSA for example or if you take it to small key size for your ECC or DSA or even forever say if you take and read bits Prime it's no good you can easily factor of those invalid curves attacker also saying but we'll see later it's not a same claim so in the end the one we're really talking about today is array.c models factorization at scale so it means we do we were doing batch GCD that GCD has actually already be done the past by academics but it was done at academic scale so uncertain data size of maybe 10 million keys 20 million keys 80 million keys we did it at scale and much larger scale these are all known attacks anybody can do it and you are completely deceived you won't even know if you are vulnerable to these so Nell's
how do we collect these keys so these are public keys are everywhere and we have focused on the three most common key container types and namely certificates of x.509 certificates you can find in HTTPS for instance SSH keys and PGP keys and we have found quite a few interesting results and also a few crazy things that we will talk about later in this talk and for example as you know certificates have a validity period so it's supposed to start being valid at some point and expire at a later date and some certificates actually have a negative volatile period I mean why would you do that
most keys we have come from certs about 70 percent of our data set came from certificates so that's about 240 million keys we also have 19 million SSH keys and about 10 million pvp keys in our database
those certs mostly come from HTTP scans that we perform using a custom tool that I will talk about later and SSH key is mostly came from a study that was made by Crocs those guys generated a lot of SSH keys with multiple software libraries and smart cards and they tried to see if there were any vulnerabilities in those labs we were able to validate their results namely we found just as they did that one of the smart card model was generating keys with common factors so that was really bad and also PGP key so most of those keys came from the of SKS key servers so why is it
interesting to actually have as many public keys as we can so when we run back TCP and RSA keys you are looking for common factors right so the more keys you have the more chances there are of having two keys with a common factor in the data set so this is why it's good to have as many as possible and we currently have over 40 million keys this is still growing we're still ingesting keys from 32 certificate transparency so this is a project that was initiated by Google and it's basically a log servers where you can get certificates so we can grab the keys from there and also from those log servers we were able to collect a list of 270 million domain names and subdomains you can do a lot of stuff with that yeah it's way more than an exact top one mininum oh yeah so we found that today RSA is still the most common key type 95 percent of keys are still RSA so rocket a convert GCD is targeting that so be careful easy keys are getting more popular and this is basically what I guess you should start using today and then we still have some DSA keys I mean who uses that anyway today and a few other less well-known she words about tools so we've been using a tool that was developed by one of our colleagues it's called scanner so it's not just Network scanner it's also a fingerprinting engine it's written in Erlang it's open source it's on github you can download it today our parsers are mostly written in Python we also have some Golden Calf code and some bash scripts in there the code will be open open source on github it is available today we have the link at the end of this presentation and we also have been using a patch in IFI to define some data pipelines that we used to ingest the keys in our database that database is stored in HDFS and we use presto to run at queries on on the data set then for tracking keys so now that we have them in our database we actually wrote a tool that is written in chapelle to compute the patch the CD of all of those keys so basically computing the GCD between every pair of RSA models in there and this is a distributed implementation so it basically scales you can throw more machines at it it's constant in memory you can if you just needed it to be faster you just add more machines we also check for the Rock attack to the rock at home and we run some checks on easy keys like invalid by Ramirez key lengths and so on so we have a short demo for you
so here we have two SSH public keys a k1 that bubb a key to the pub we run a script that extracts the RSA models from the keys compute the GCD and since the GCD is larger than one we can retrieve the private key then we just copy is that key to a file push the original public key to a test machine just for testing if we can login with via SSH to it now that the key is on there we can try to log in via SSH using the private key we just reconstructed and guess what where and boom that machine has in [Applause]
so here what we did is we just compute the GCD between two keys for the example but what we actually did is the GCD between all key pairs and we have made a website that you can go today it's on key Luca but PRS key security comm you can upload your key and get it tested against our own data set we promise we'll tell you if we find your private key the private key we will find it so basically it looks like this you
just have a forum where you can copy/paste an SSH key assert or a PGP key you hit submit and what you may see is that either the key is already in our database so we can give you an answer immediately or we will add it to the processing queue and you can just recheck later and it will tell you whether it is vulnerable or not within about an hour so how does that work
behind the scenes yeah you know we have a corporate policy saying no Bitcoin mining so we had to find something to do yeah so we had to use that 280 B CPU cluster and make it to good use so we also have about 2 terabytes of storage required because we have to have some intermediate computations for the batch GCD so that we can reuse the other calculations and do not recompute everything when we just want to test new keys against the full data set so we just recomputed whatever changes and we don't have to recompute everything and if we test just like one more key it would take maybe 10 to 20 minutes 3 keys at once is about an hour and if you want to check a large batch of keys like five million easy and it can take up to 24 hours that data is stored on HDFS we have a tender a node cluster we use partitioned resto tables to have faster gaps and scanner is deployed on 50 machines so we scan from 50 different IP addresses at once that's a really nice canning infrastructure so Yeol and what good results and cool stuff have we found well we've brought in a few keys Mike you know two hundred on ten thousand keys most of them were certificates but some of them are actually in used today so we could perform an in the middle attacks again these websites really easily and passively three thousand SSH keys and most of them most keys so it will mean it's allowing for man in the middle attacks again but if you are using SSH key on your little page and we broke it maybe you're using the same SSH key on your backs you know you shouldn't do that PGP key is that's bad because it means we can decrypt what we wrote you we can sign message instead of you and so on so that's really bad what's fun with PGP keys is that a lot of the kids we've broke we're actually having more than two factors so that's actually saving your day if your key is not just using two primes but three primes and it's really stranded because it's not something that common in implementations today it waved three prime numbers so Rocky attack we actually found PGP keys will you're able to rock attack on key base key tab get everywhere so double check your keys you can check them using Python scripts offline on your computer or using our key lookup website Raqqa is not that bad because if your key is in the two-third that tape four thousand bits you're still safe it will take way too long to actually compute primes the public key but if you were on the weak side like you know 2,000 bits it's feasible you know rotors they've had randomness with now well they still do a lot of Reuters we've seen since 2017 of common factors so some of them have even broken no comments in their certificates like who is using that come are in 2015 actor according to our historical scan data a lot of daling Reuters were aiming problem and that's fun because reduce QD done in 2016 I'm a bad she found that problem and they got it patched that's nice few certificates or not valid before 2040 what weight may be able contem computer by then why will you do that that idea ECT keys on the other hand are getting more and more used in practice which is good certificates on PGP mostly or even yet more and more and SSH most the most use is still a sec yeah an East curve which is not bad per se but curve 255 19 is growing it's in third position right now but maybe next year it will be certain it still has a bit to go you know before being first so more and more and CT keys I said so when we scan everything keys new keys were always you know in the same amount we found always same amount of new keys but for ECC we were finding more and more sec keys and that's really good and actually let's encrypt will soon be able to issue ECC based certificate chain so it's called now what if I told you that some people are using their SSH key to son encrypted PGP mails that sounds strange no people do we found kids that are reused as both PGP and SSH or as both SSH and x.509 certificates which is completely insane it doesn't make sense you should just generate new keys you know most people using PGP even in one subkey why just a multiple you know so if one of the shift key is broken you can still have another one to save your house and some of the keys we found were add more than two factors most of them were PGP I already told so and DSA is the add OpenSSL the break it he did in 2015 3 years ago and nowadays only like 3030 few keys were still using DSA as a signature algorithm and less than 1% of SSH key word DSA bathed so that's really good and if you are still using the SI stop so DSA some people were using 256 bits keys which doesn't make sense so yeah just tap using DSA so in the end mind your keys because anybody can do the same kind of silent attacks and maybe they already do thank you for your attention today if you have questions we'll be outside or you can hit on our and Twitter also and thank you [Applause]
Feedback