We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Modification of the prime generation method of the OpenSSL library

00:00

Formal Metadata

Title
Modification of the prime generation method of the OpenSSL library
Title of Series
Part Number
7
Number of Parts
29
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Random numbers are very important in many fields of computer science, especially in cryptography. One of the most important usages of pseudorandom number generators (PRNG) are is key generation methods for cryptographic purposes. In this presentation a modification of the prime generation method of the OpenSSL library will be presented. The modified version of the library passes every well-known statistical tests (e.g NIST test, DIEHARD test), however while an adversary is still able to reconstruct the prime numbers (P,Q) from the public key. The method can be used for malicious purposes as a sophisticated backdoor. The presented research is based on the theory of kleptography and a recently published research paper.
Library (computing)Prime numberRandom numberElectric generatorNumberNon-standard analysisPresentation of a groupArmUniform resource nameOpen sourceInsertion lossInformation securityOvalAlgorithmEncryptionAdvanced Encryption StandardSource codeKey (cryptography)Statistical hypothesis testingVector spaceImplementationInstance (computer science)Vector graphicsStatistical hypothesis testingTopological algebraComputer hardwareFunction (mathematics)LengthSequenceBinary fileBitFrequencyMountain passMenu (computing)Loop (music)CircleCycle (graph theory)Execution unitMaxima and minimaMetropolitan area networkComputer-generated imageryStreaming media3 (number)HexagonMalwareMereologyLaw of large numbersFactorizationModal logicIntegerMoving averageInteger factorizationCASE <Informatik>Electronic data interchangeSineMultitier architectureLimit (category theory)ModularithmetikGlattheit <Mathematik>Manufacturing execution systemWide area networkLaptopScalable Coherent InterfaceNewton's law of universal gravitationRankingMeasurementCross-correlationOrder (biology)Interior (topology)Public key certificateTunisMach's principleSystem programmingNormed vector spaceOrder (biology)Slide ruleStatistical hypothesis testingBackdoor (computing)Public-key cryptographyCombinational logicVector spaceAlgorithmGoodness of fitLimit (category theory)NeuroinformatikMereologyNumberTerm (mathematics)Client (computing)Function (mathematics)Revision controlComputer hardwareHash functionOpen setPrime numberResultantRandom number generationInternetworkingPublic key certificateDecimalInformation securityMathematicsMultiplication signCommodore VIC-20Server (computing)Topological algebraLaptopCodeCartesian coordinate systemImplementationCuboidDifferent (Kate Ryan album)InformationComputer programming2 (number)Web 2.0TheoryInteger factorizationBitFrequencyRandomizationPolynomialTopological algebraWordNational Institute of Standards and TechnologyPresentation of a groupProof theorySource codeEmailCodeBlock (periodic table)SoftwareBinary codeMathematical optimizationSystem callFunctional (mathematics)Bit rateSequenceoutputLengthExpert systemStatistical hypothesis testingEncryptionKey (cryptography)SupercomputerEllipseLibrary (computing)PseudozufallszahlenMalwareParameter (computer programming)CurveLoop (music)Cross-correlationContent (media)State of matterSampling (statistics)Primality testCycle (graph theory)Medical imagingPhysical systemFlow separationSheaf (mathematics)Standard deviationOpen sourceSet (mathematics)Inheritance (object-oriented programming)Online helpPopulation densityMessage passingReal numberCentralizer and normalizerVertex (graph theory)BendingElement (mathematics)Local ringUniverse (mathematics)PlastikkartePhysical lawOperator (mathematics)SubsetObservational studyAreaVariable (mathematics)Right angleStrategy gameSound effectRule of inferenceEndliche ModelltheorieShift operatorStatisticsSelf-organizationTable (information)Arithmetic progressionContext awarenessCiphertextSolid geometryEstimatorData miningSocial classDomain namePlanningWhiteboardClosed setInterior (topology)ConsistencyNormal (geometry)OrbitMusical ensembleMeasurementDistribution (mathematics)Digital electronicsPersonal digital assistantCondition numberEvent horizonInsertion lossOffenes KommunikationssystemLattice (order)Number theoryMetadataPeg solitaireFormal languageFeasibility studyCASE <Informatik>Cellular automatonLecture/Conference
Transcript: English(auto-generated)
Welcome ladies and gentlemen, my name is Norbert Jani and I was working for the Hungarian NSA more than 7 years and I was responsible to make penetration testing and pepronolysis and we do some researches relating to prime number theory and right now I'm a senior security advisor at Hungard company, as you can see my t-shirt, this is a green
t-shirt, as you know we didn't have such kind of cool t-shirt in the Hungarian NSA, so maybe this was the first reason why I just left the company. In this presentation I'm just would like to show you how to hide some backdoors and how
to make some malicious code or random number generations methods and how to identify these codes and after that how to implement in practice these backdoors and applications on prime number generation methods and in backdoors of random number generations. The general background of the research can be shown
in these slides and the main concept of this presentation is based on the following researches that you can see in the slides, side-channel cryptography, pseudo-random number generation and this presentation also
presented in the Central European Conference on Cryptology in Budapest we went into the mathematical background and right now we implemented the code and right now we would like to show you how to implement in the real practice and how to make some fake certificate, how to implement some sophisticated backdoors or random number generators and this research also
based for injection-based backdoors in pseudo-random number generators. This research started within the umbrella of the Hungarian NSA and right now we try to try to implement all of the codes and this is the main reason we would like
to show you how you can do it and after the presentation you can also get the source code of these algorithms and implementations. As you know OpenSSL there are a number of open source applications that are using OpenSSL
library and you know for example OpenSSL or CURL and all of these application is very important to emphasize that these applications are based on the secret of OpenSSL so it means if you can make some backdoors in this library or you can implement new techniques to
this OpenSSL library after that there's a chance to hide some backdoors. Of course it's not so easy because as you know there are some checksums on the library and there are many many differences how to prevent this kind of malicious activity. So the main question is as I'm trying to
try to show you in my presentation that is it possible to hide the backdoors on well-known algorithms. For example everybody knows that RSA or AES is well known for security experts so it means that if you would like to make
some some white box that we did for example in AES codes if you know how does it work if you know what the algorithm is doing and you have some test vectors it's very easy to identify a backdoors for example in an encryption algorithm. It means as you can see in the slides for example you have
encryption key you have an initialization vector you have test vector and you have a cipher text and if you know how AES algorithm is working on you can make some tests. There is an input there is an output it's working or not if the output is different than the sample or the text then it means that maybe there is some backdoors on the AES algorithm or
there is some programming mistakes on the algorithm. So it means such kind of test vectors are proof of concepts that modification have been made for example on the algorithm and it's very difficult to hide a well-known backdoor in AES,
RSA or well-known encryption algorithm. So this was the main idea that modification an encryption algorithm is lead to a non sophisticated backdoor. So what about random number generators? As you know there are no test vectors it means because the behavior of the random number generators is to produce
random numbers. Of course if you are using only softwares, not hardwares, if several random number generators but of course you can try to make very valuable random numbers. However you have to use some statistical tests like
the National Institute standard of technology test or the diehard tests in order to modify the output or in order to test the output. If you would like to modify there is no test vectors. It means as I mentioned in the previous
slides if you are using AES there are test vectors and it's working or not. In the random number generator methods because there are no test vectors you cannot identify the malicious code on the random number generators just figure out that's imaging that there is an embedded system there are some hardware
random number generation on the embedded system and you don't know the source code you know only the output of the random number generator and if this random number generator passes all statistical tests like the NIST test or diehard test it should be good random number generator and you cannot do nothing with this. So as you can see the main idea how to modify a pseudo
random number generator for example in an open SSL library in such a way that the modified output still passes all statistical tests. I mean if I'm modifying the random number generator it should pass all statistical tests. If not of course
it's cryptographically not secure. So this was the main idea and we tried to make some mathematical background relating to this topic and we try to try to implement in the practice. So the first one is just presented in some conferences and
there is a term that how to use this kind of mathematics relating to number towering to hide some backdoors in pseudo random number generator and after
that you cannot identify the backdoors in the random number generator. So as you can see in the slide that let me define a pseudo random number generator and let Bi denote an ibit length binary sequence and let us define the following function as you
can see Vbi and I wouldn't like to go into details but it's very important relating to this topic that there is a term that there exists a polynomial time p algorithm with two of n periods and Vbi equals zero it means and that's observing only 2m-1
consecutive bits from the output of the pseudo random number generator it is possible to predict the next bit with 100% of success. It's very important the next bit and 100% of success. So it's very easy to create such kind of
algorithm. In these slides there is a proof-of-concept algorithm relating to the AES it's called C-SEF encryption AES and this is a POC proof-of-concept algorithm and as you can see in
the slides, here is a seed, here is an AES encryption, it's a combination of the content mode and the CBC mode as you can see in the slides so the output is shifted back and that is a new input for the next block. As you can see
that I will call capital C because there is also a CI with this one so capital C is this one and the C is only this one so capital CI plus 1 e to the encryption key of CI and it's as you can see or not see
here is the next one, block and there is no loop in the algorithm so it means if I'm using this kind of cycle and using this kind of counter mode it can be proved that the size and length of the SEF encryption AES is minus 2 of 128 bits and there is no loop in the counter mode. In such a way, as you know, AES is designed in such a way that
try to pass all of known, well-known statistical tests like the National Institute of Standards of Technology test, the NIST test or the diehard test
so it means if you are encrypting a pseudo random number generator output in such a way, the output is always predictable and the output cannot be distinguished from two random numbers only by statistical tests. So if you know the source code, if you know how this algorithm is
working, you can predict the next bit with 100% of success. However, with only statistical tests, I have to emphasize that only with statistical tests you cannot distinguish from two random numbers and it's very interesting relating
to the proof of concept algorithm that there is a very important corollary as you can see in these slides. So if you like to observing how many bits you have to observe in order to reconstruct the internal seed of the random number generator
you have to observe only 383 bits. So it's not too much to reconstruct the next bit. So it means, as you can see in this proof, that it's consecutive capital CI and CI plus one is 256 bits, of course, always can be found in 383 bits. If you just would like to imagine this
scenario, there is a slide, there is a picture and you can check that's 258 bits, always can be found, only 383 bits.
Related to this one, it's only just a proof of concept code. So if someone tried to implement these codes in OpenSSL library, it's not a sophisticated because if you have a white box on it and there is a security expert and the
security expert just tried to check these combinations of content mode and CBC mode of the AES algorithm, he will definitely identify the malicious activity in this encryption method. However, it's very interesting that we just make some malware to
modify the OpenSSL library and try to modify the internal state of the prime number generation. As you know, if I'm trying to modify the prime number generation, for example, in OpenSSL library, after that there are some statistical tests that have to pass the
output. So it means if I'm doing some bad prime numbers and so on and so on, after that, they really won't qualify and really don't want to pass these statistical tests in OpenSSL library. So you have to be very careful when you are modifying prime numbers generation method of the OpenSSL library.
In this algorithm, you can see that it's a very interesting algorithm that you are just searching prime numbers less than the binary length of the number is 1024.
So it means you just multiply together prime numbers less than 100,000. And as maybe some of you heard about that in 1947, John Pollard proved that is it possible to factorize very large numbers in polynomial time
if, for example, if it's prime factors, prime p-1 or p-1 has only small prime factors. Small prime factors means, for example, less than 100,000, as we see in the previous section in this slide.
So it means if somebody would like to make such kind of sophisticated backdoor rating to prime number generation, they can use, for example, these techniques to generate some prime numbers. And after that, we can check that how many, how many, thank you.
Okay, so after that, you just generated some prime numbers. You can test with
this test and other sophisticated statistical tests and passes all well-known statistical tests. It means any method that's implemented in OpenSSL library will not identify that any malicious backgrounds activity is going on in the code.
In this slide, you can see that, just a moment, you can see a public key, just we generated a public key with all modified OpenSSL library. And after that, we just published the public key to everybody, and as you
know, it's a 2048-bit public key, and so it's almost infeasible to factorize. However, if you are using such kind of backdoors or if you are using such kind of malicious random number generators, is it possible to reconstruct its prime numbers?
There is a mistype in the slide. Can you see it? This is a public key of the previous public key. This is the decimal number of the N modulus from this public key.
Of course, it is base64 encoded version of the public key, and this is the N modulus decimal number. This is a 2048-bit number, and we just generated in such a way, as I mentioned to
you, with John Pollard techniques, because we would like to reconstruct the private key from the public key. So I would like to emphasize that after that, is it possible to reconstruct the private key from the public key? And of course, if you are using only statistical tests, you cannot find any malicious
activity, only in the numbers, because there are no such kind of activities going on. If you know, you can also check on the internet that Pollard's p-1 algorithm is pseudo-code, as you can see here. The first step is to select the smoothest bound b. I really wouldn't like to get into details how to select this bound, but
after that, there is a very important part of the algorithm, this one, that the algorithm is running till the time limit is not reached. So it means, have you ever tried to factorize a very large number? For example, if it's a large number, it's more than 2,000 bits, it means you try to factorize
it, and your computer is not working. There is no result, there is no result, there is no result. So it means you cannot factorize it. However, as I mentioned to you, this number, is it possible to factorize? This computer is factorized approximately 8 minutes, and is it possible to reconstruct p and q prime numbers?
And if I'm choosing the smoothest b number larger than 100,000, for example, 1 million, 10 million, 100 million, then it means the factorization time is not 8 minutes or not 5 minutes, 1 hour, 1 day, 1 week.
So it means, if you are trying to factorize a large number, and there is no result, you don't know why there is no result in the output. Because this is a very subtle, huge number, or there is a very large, smooth number, and generated in such a way, these numbers.
I really don't want to make an explanation of this algorithm. You can find it also in the internet. We just used a combination of this algorithm and the combination of the previous mentioned algorithm, self-encryption AES, to produce sophisticated and very good quality random numbers in OpenSSL library.
Of course, relating to the huge decimal number, within 1 minute, Cori7 laptop would found all of the
prime factors of the huge numbers, so you can find p and q less than within 1 minute. So it means that, is it possible that a construct from the public key, the private key, and it's, of course, everybody knows that this malicious activity can be used in embedded system, in web pages, or when you are using some SSH or something like that.
Of course, just some words relating to this randomness test, that when you are modifying the OpenSSL output, that is very well-known statistical test that your generated numbers have to pass.
So it means I'm just generating some bad numbers, and they are not passes all statistical tests from the National Institute Standard of Technology test. It means that these numbers are failed, so you cannot use for cryptographic purposes. It's very well-known. However, we tested all of these, all of the 15 and 16 and 17 tests. There are some extended tests in the niche test.
So all of the 15 tests passed the output of the random number generation, and after that, we tried to modify some other numbers.
We tried different approaches also. Maybe you have heard about Sharkey's and Maudith's theoretical tests. We also applied the well distribution, this one, and we also applied correlation measure, this one.
After that, we didn't identify any differences between two random numbers and between the generated random numbers. So we didn't find any statistical test that can distinguish these bad numbers from good numbers.
So after the statistical test, we tried to implement Ris Canario, and we tried to make some implementation of the code. So we just, you can see the vkssr.com, and we just requested this domain,
and we generated CSR and sent to Commodore that we would like to HTTPS certification. You know, we would like to use a secure communication, and we just requested a certificate. And a certificate issued by Commodore, SecuCA, and this is a 2048-bit encryption rating to the RSA, as you know.
And it's very interesting that if you're a CA, it's not possible to identify any malicious activity in such kind of numbers, because you don't have enough researches to make factorization, to find bad behaviors in such kind of numbers.
So it means after that they approved and issued the certificate, we just deployed to vkssr.com. You can check on your mobile phone also, vkssr.com is active and it's working.
Of course, you can check some SSL lab tests, and maybe you know it, that we tried to configure the server very securely. It means A plus one, you know, everything is enabled on this server, so it's very secure. It seems very secure, as you can see, TLH5 also it's enabled, and HTTPS forward secrecy is also enabled on the server.
So relating to the security certificates, everything is turned on, and the server seems very secure relating to the certification, and also certified by the Commodore. So, as you can see in the slides, just using some very basic comments in Linux, for example, open SSL, hash client connect,
you can just download the certificate from the server. It's very easy, you can download, you can try it, and after that you can write out the modulus, and you can write out from the certificate, which is the public key modulus.
This is the huge number. And the next step is try to factorize this number, and we've written the source code in ParaGP, it's a computer algorithm system, it's very easy to use, every function you have to find in relating to number three,
you can very easy write some programs relating to this topic, and you can find some very cool stuff in this program. As you can see in the presentation, and in this slide, here is the modulus, this is the large 2048 bits number,
and you have to factorize it. Of course, if you try to factorize normal number, it takes more than a million years in a single computer, so you cannot do this. However, this method is generated by a sophisticated modified open SSL library, and after that, this is the Pollard p-1 algorithm invited by John Pollard in 1974.
Just, we wrote this program in ParaGP, you can see it's a very short one, and we just try to factorize the huge number, this one, and as you can see, it takes 1 minute and 31 seconds.
Within 2 minutes, we were able to factorize the huge number and the certificate, and it means that is it possible to reconstruct from public key to private key,
and if you see that just, for example, like in vkcesha.com, that everything is secure, the certificate is certified by a very well known CA, and everything is configured well, maybe there is some problem with the certificate also, because you can make such kind of bad prime numbers
that can be used in such kind of certificates. Summarizing the research, it's very important to emphasize that kleptography, this is a so-called solitary of kleptography when you try to hide information with kleptographic methods,
and try to steal information, for example, from companies with kleptographic methods. It's very well known nowadays, and we just modified OpenSSL library, however, you can use this kind of methods in embedded system, and it's very important to know that when you are using some kleptographic methods,
and there is a white box on it, a security expert won't identify any malicious activity, for example, in a source code. And, of course, if you know that such kind of prime number generation exists,
and you will see the source code, you will know that there is some problem with the prime numbers. However, if you know only the outputs, for example, the certificate like in vkcesha.com, you don't have any clue how to factorize it, you don't have any clue how the prime numbers P and Q generated. So, it means you can make some backdoors, you can implement some backdoors in such kind of applications,
like SSH or certificates. This is a working member on Linux, it means that you can find the binaries of the OpenSSL, and is it possible to modify the binaries, and after that, the prime number generation is malicious.
And, of course, as I mentioned to you, it's very important to emphasize that it's not possible to distinguish my statistical tests. So, many, many random number hardware random number generation, or software random number generation generator, try to approve in such a way that they try to analyze the output.
In this way, if you're trying to analyze the output, it's not possible to distinguish from other true random number generators. Okay, thank you for your attention. If you have any question relating to random number generation, or relating to the source code,
I'm ready to answer it, or we can also send it to you in an email to you to try it, or you can also try it to software.shi.com, or there is an SSH that is also weak in vk-shi.com. Thank you very much.
Does anybody have any questions they'd like to ask? We have a few minutes to spare. Come on. Come on. You must have been very clear. Nothing? Ah, there's one gentleman.
Yeah, you mean some problems relating to advanced modem, and some elliptical generation methods? Yes, as you know, relating to cryptography, many huge companies like NSA, or Hewlett-Packard, or something like that,
very huge companies, are using such kind of backdoors. And it's very interesting, because when you are identifying backdoors, relating to cryptography, you don't know if it's really a backdoor, or it's just some mistake in the mathematic programming. When I checked relating to elliptic curve, they are using not the same techniques,
but some kleptographic techniques, how to hide some secret parameters relating to the elliptic curve, and after that, they try to, how can I say, you know, when you are generating prime numbers, we have some statistical prime numbers generator, it means there is a chance that it's not a prime number,
but it's very highly likely that it's a prime number, like Miller-Rabin test, or something like that. And when they are just generating some elliptic curve, they just told everybody that it's 100% secure, and it's 100%, there is a little chance that it's not on the curve, and something like that.
And after that, some years ago, they identified that there is some problem with the curves, and this is the same techniques relating to these other number theoretic groups, like elliptic curves, so they are using maybe a more sophisticated backdoors, and it's, you don't have such kind of researches,
you don't have many supercomputers to analyze that is really a prime number, this is really on the curve, or something like that, and of course they have that research, and they can make such kind of sophisticated backdoors. As I mentioned to you, my computer takes 8 minutes to reconstruct from the public key, the private key,
it means I can choose a larger number, and it takes 8 days. So when you are trying to factorize, after one day, after two days, you just finish it, okay, it's a very secure number. However, if you try to run more than 8 days, you will find the factors. So this is the same. If you don't have enough researches to find some curves on the elliptic curves,
and find some secret parameters, you think that everything is secure on this random number generation. However, after that, everybody tried to find the secret parameters, and everybody tried to calculate everything, after that they found and realized that there is some problem with the random number generator.
Okay, thank you very much. Thank you very much.