
Remote Attestation with Keylime


Formal Metadata

Title
Remote Attestation with Keylime
Series Title
Number of Parts
542
Author
Contributors
License
CC Attribution 2.0 Belgium:
You may use, adapt, copy, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
In various scenarios, it is necessary to attest the integrity of a remote machine, making sure that the system was booted securely, that essential files were not modified, and that only allowed software is executed. For this purpose, we present Keylime as a remote attestation solution. It leverages the trust from the Trusted Platform Module (TPM) in combination with UEFI Measured Boot and the Linux kernel Integrity Measurement Architecture (IMA), which are probably available on your system today. We will present how Keylime works and real-world applications for remote attestation.
Transcript: English (automatically generated)
Hello. Okay, now it works. Kind of, right? So hello, everyone. Welcome to the security room. Our next talk is about Keylime and remote attestation, and it will be given by Anderson. So welcome, and sorry about the trouble. So I'm Anderson Sasaki, I'm a software engineer at Red Hat, and I'm here with Thore. Yeah, I'm Thore, and I'm a maintainer of a Linux distribution for schools and universities, and I'm also a maintainer of Keylime. Yeah, so we are here to talk about remote attestation with Keylime. So let's get started. Imagine you are a car vendor who maintains and updates the systems running in cars, and you want to make sure that the systems in the cars were not modified, so that you
can check whether the customer is still eligible to receive the latest update, or something like that. Or you are a software company building software in the cloud, and you want to make sure that the build tooling was not modified. Or you are a telecom company that wants to make sure that the systems you deployed to control antennas were not modified. So what all these cases have in common is: first, they are remote; second, you don't really have full control of the systems in the wild.
So the question is, how can you check that the system was not modified in the wild? So a way would be if you could somehow get some information about the system and then check if it's what you expected from that. And, of course, in case it's not, then
you would want a way to react to that. If you can do that continuously, getting the information and checking it, then you have something like monitoring of the integrity of the system. That is one of the things remote attestation can provide: checking the integrity of a remote machine. So how does it work? You have a trusted entity running in some controlled environment, and then
you have a trusted agent on the other side, running on the monitored system. You ask that agent for information and get back a report called a quote. Then you can verify that the agent is running on a machine in a state that you trust. But that comes with the problem of trust: how can you trust the machine, or the agent running on a machine that you don't control? You don't really trust the agent directly; instead you rely on a hardware root of trust, which is the Trusted Platform Module, or TPM.
What are TPMs? They are pieces of hardware that can perform cryptographic operations, such as generating keys and signing data, and each has a special key and certificate, called the endorsement key (EK), which are generated during manufacturing. The manufacturer generates the key and publishes the CA certificate, so that you can verify that the TPM is legitimate. The endorsement key can't sign data directly, but you can generate attestation keys (AKs) that are associated with that endorsement key, in a way that lets you verify the origin of signed data, so you can make sure that the data was signed by that specific TPM.
Another important thing the TPM has is the Platform Configuration Registers, or PCRs, which are special registers designed to store measurements about the system in a way that lets you verify its integrity.
So how are these measurements done? During boot, each step of the boot is measured by the UEFI firmware into the TPM via the PCR extend operation. At each step the boot process goes through, you take a hash of the binary or the software that is running and extend it into a PCR; I will explain that soon. So during boot, the UEFI firmware is responsible for measuring the boot steps into the TPM. After boot, the kernel's Integrity Measurement Architecture, or IMA, will measure any opened file that matches a policy. You can configure IMA, and it will measure the files that are opened into a PCR as well.
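As an illustration, IMA's measurement list can typically be read from /sys/kernel/security/ima/ascii_runtime_measurements; an entry using the common ima-ng template looks roughly like this (the hashes below are shortened placeholders, not real values):

```
10 91f34b5c...e2 ima-ng sha256:4d8a17f2...9c /usr/bin/bash
```

The fields are the PCR the entry was extended into (10 by default for IMA), the template hash, the template name, the hash of the file contents, and the file path.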
So if you have the state of the PCRs and the event log, that is, all the extend operations that were performed, then you can verify the integrity of the machine. The PCR extend algorithm itself is kind of simple: you take the old value stored in the PCR and concatenate it with the measurement of the data, and this measurement is basically a hash. So you concatenate the old value with the hash, calculate the hash of all of this, and put the result back into the PCR. That is done for each step. Of course, if you know a bit about TPMs, these PCR numbers don't match the actual ones; this is just for illustration. After measuring all these steps, you have the final values in the PCRs, from which you can calculate a so-called golden value, the hash of all the PCR values, and you have a representation of the state of the machine that can be verified. So how does Keylime work? On the left side you have a trusted entity,
probably a machine that you control, where you run the verifier side of Keylime; it's a server. On the right side you have the monitored system. It is remote, and you don't have complete control of it, but the agent has access to the TPM installed in that machine. So the verifier can request the state from the agent. The agent will then access the TPM to get the quote, meaning the PCR values, together with the event logs, that is, all the PCR extend operations that were performed,
and send it back to the verifier. The verifier can then verify, first, the origin of that piece of data: because it is signed by the AK, you can make sure that the data came from the machine that contains that TPM, and you can verify the identity of the TPM using the EK certificate. And with the values you obtained for the PCRs and the event log,
you can replay all the extend operations, so that in the end you get the values the PCRs should have, and with all this information you can verify the integrity of the machine. You also get the information from IMA: IMA calculates the hash of opened files that match some policy and extends it into a PCR, so you get a log containing the file names and the matching hashes. With a policy engine, you can then also verify the integrity of individual files on the remote machine, so you get a full integrity view of the remote machine.
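The extend-and-replay logic described above can be sketched in a few lines of Python. This is a simplified illustration, not Keylime's actual code: it simulates a single PCR in the SHA-256 bank, extends it with a few measurements, and shows that replaying the same event log reproduces the final PCR value, while a tampered log does not.

```python
import hashlib

PCR_SIZE = 32  # a PCR in the SHA-256 bank starts as 32 zero bytes

def extend(pcr: bytes, measurement_hash: bytes) -> bytes:
    # new PCR = H(old PCR || measurement hash), the PCR extend operation
    return hashlib.sha256(pcr + measurement_hash).digest()

def replay(event_log: list) -> bytes:
    # recompute the PCR value a quote should contain by replaying the log
    pcr = b"\x00" * PCR_SIZE
    for measurement in event_log:
        pcr = extend(pcr, measurement)
    return pcr

# event log: hashes of the "software" measured at each boot step
event_log = [hashlib.sha256(step).digest()
             for step in (b"firmware", b"bootloader", b"kernel")]

quoted_pcr = replay(event_log)          # the value reported in the quote
assert replay(event_log) == quoted_pcr  # the verifier's replay matches

# any tampering with the log changes the replayed value
tampered = event_log[:2] + [hashlib.sha256(b"rootkit").digest()]
assert replay(tampered) != quoted_pcr
```

Because each new value depends on the old one, the final PCR value commits to the whole ordered sequence of measurements; you cannot set a PCR to an arbitrary value, only extend it.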
So with that information, the verifier can check: if everything matches, the attestation was successful; if it doesn't match what you expected, it is a failure. In case of failure, we have a revocation framework: you can configure actions on the verifier, some script that it can run to perform an action. It can be a webhook, so if an attestation fails, it sends a request to some webhook; or you can notify the agents directly via the REST API and send some payload to trigger an operation there. The simplest scenario: if you had a cluster with various machines and one of them failed attestation, you could notify the others to remove that node from the cluster by blocking its network connectivity, for example.
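A reaction like the cluster example above could be sketched as follows. This is a hypothetical handler, not Keylime's actual webhook API: the payload fields (agent_id, status) and the in-memory cluster set are made-up stand-ins for a real notification format and a real network-blocking action.

```python
import json

def handle_revocation(raw_payload: str, cluster: set) -> str:
    # Parse a (hypothetical) attestation-failure notification and
    # remove the failed node from the cluster membership set.
    event = json.loads(raw_payload)
    node = event["agent_id"]
    if event.get("status") == "failed" and node in cluster:
        cluster.discard(node)  # stand-in for blocking its network access
        return f"removed {node} from cluster"
    return "no action"

cluster = {"node-a", "node-b", "node-c"}
msg = handle_revocation('{"agent_id": "node-b", "status": "failed"}', cluster)
# node-b is no longer part of the cluster; node-a and node-c remain
```

In a real deployment, the decision logic would sit behind whatever endpoint the verifier's webhook is configured to call, and the "remove" step would be a firewall rule or orchestrator API call rather than a set operation.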
So that's how Keylime works in general. Now I'm passing the mic to Thore; he will continue with the real-world stuff. Yeah, so now we have heard how Keylime works, and we want to show that you can use it in production, and what challenges you will run into if you want to try that. We have three main areas: first policy creation, then the monitoring, and then how to react to failures. So the first part is that we want to create policies for our system.
For that we need to know what is actually on our systems, and what our systems are. From the software-side perspective, it's normal that we have a CI/CD pipeline; we know what data goes into it, and we want to save the hashes there. But we also need a lot of other information: what packages are installed, where their files end up on our system, whether they have signatures, and whether we can verify them. Either this information is already provided by the distribution, or we need to generate it on our own. Then, on the hardware side, we need to know what kind of hardware we are running. As we said, we have the EK, the endorsement key; we need to know at least that to trust the TPM in some regard. Ideally we also want to know what firmware is running on the device and which configuration we have: for example, do we allow Secure Boot to be disabled and enabled, do we have our own keys on there, and so on. Once you have all that information, we can go to the other side, which is the monitoring. That part is implemented by Keylime. If you have all the necessary information, we provide documentation and tools to generate a policy for it,
and you can feed the policy in, and it is verified from there. The challenge you run into here is that, for many of you, IMA, measured boot, and TPMs are probably new, and if you run into issues, you also need to understand how those technologies work to debug them. So that is a challenge: you still need a good understanding of those technologies to make your life easier. But yeah, that part is mostly solved by Keylime. And then we come to the non-technical side, which is that we need to react somehow when we have an attestation failure.
So, is the alert actually relevant for us? Because if we have file changes in /tmp, we might not really care. Then, who needs to be notified when we have one? How do we tie that into our current monitoring infrastructure, for example with the webhooks? And lastly, if you are a company, it is a potential security breach if Keylime fails in the way you configured it, so there may be service agreements in place, people who need to be notified, and a defined way to respond. But going now from the general part to actual examples: I work on a Linux distribution that does exams for schools and universities, called Lernstick. And together with the University of Applied Sciences and Arts Northwestern Switzerland, also called FHNW, we developed a system called Campla, which is secure bring-your-own-device exams.
So what is the problem here? The students want to bring their own device, their own notebook, into the lecture hall and write their exams on it. We don't want to touch their operating system, so we do something we call bring-your-own-hardware: they bring their own hardware, we boot a live Linux system on it, and we remotely attest whether that system is running correctly. So what do we have? We have the hardware, which has a hard drive and a TPM. On it we boot the distribution, Lernstick, and on that we have the Keylime agent running, and also IMA and our measured boot stack. Now, the interesting part is that we only care about the TPM.
We don't care about the hard drive or what is otherwise on that system. Then we have the actual server side: we register with the exam system, and that also includes registering with Keylime. Then we check in turn whether the system is actually in a trustworthy state, and if that's the case, we release the exam files, which in our case is normally an RDP session that connects to the cloud where the students actually write their exams. So why are we doing it this way? First, we guarantee that the environment is the same for every student, because they only provide the hardware, and it's basically a terminal connecting to the actual exam; so if there is computing-intensive work, it doesn't really matter. And because they only bring their own hardware and don't need to install monitoring software on their system to write the exam, we don't care what data is on it; we don't want to know. That is first for privacy, and also to make the setup way easier.
Now back to a more traditional scenario that more of you are probably familiar with: the cloud. There we have the example that IBM uses this for hypervisor attestation. They don't use runtime attestation, so no IMA; they use measured boot to see whether the hypervisor booted up correctly. Their challenge was implementing the actual response procedures, the path from having an alert to deciding how to deal with it. That is the difficult part, because one side is technical, but the other is how you structure your teams in a way that can guarantee a response. The next challenge is eliminating false positives, which ties into the first point: if a human needs to react, we want to have no false positives, and also no false negatives; false negatives are very, very bad for security, so we don't want to have those. Lastly, keeping the policies up to date: even if you roll your own distribution and are big enough, it is very difficult to keep the policies up to date and integrate them automatically. Finally, they have an escalation chain, shown just for illustration purposes.
They use Keylime to monitor the systems, tie that into their Jira system, and then have an actual person react on the other side. And then one point from the distribution side, in this case from SUSE. I asked them, and they have integrated Keylime into pretty much every product: it's in openSUSE today, if you want to use it in MicroOS there are instructions for that, and it's also in SUSE Linux Enterprise and in ALP. Their challenges were integrating it fully with SELinux and making IMA usable: do we have signatures? How do we provide the hashes? And the general question for a distribution is how to provide robust policies, because we want users to try out this technology and experiment with it, but how do we give them a starting point? That is still very difficult because, as we saw, there are many data points that need to be collected, and that is a challenge they are actively trying to solve by making it easier to get the signatures and the hashes. So, to close: try remote attestation today. The technology you need is in pretty much every device you have, in any notebook you can use, and you can find Keylime at keylime.dev. And, yeah, thank you.
So, do we have questions? Yeah, lots of questions. Thank you for a great presentation. One question: you talked a lot about the verification side of the process, but to have the golden values for the PCRs in your verifier, you need to provision them. So I was not sure about the distribution side of things: how do you manage that in Keylime? Could you shed some light on that? Yeah. So, with the golden values: we have the values in the TPM, and they are tied to the event logs from IMA and measured boot, and we avoid the need for actual golden values by having a policy engine. Basically, it verifies the logs themselves and checks whether they match the PCR values, so we check the logs and not just the end value. And the distribution can help you there, because they can already provide a lot of the signatures, which files are in which packages, and where they end up. That makes life easier.
What is the performance of such a check? How much time does it take, and how much data is required for such monitoring? From what I saw, and I don't have benchmarks for that, it's pretty quick, something like 200 milliseconds. The round trip from the request to the response was about 200 milliseconds in my tests, but of course, that was on my machine, right? We don't have benchmarks for the performance. Yeah, and it also heavily depends on what you want to attest: if you just have measured boot, it's the quote time on the hardware TPM, at most around a second, and then it's at most a couple of megabytes, single digits, of data that is transferred.
You said that one of the challenges of implementation was dealing with false positives, and maybe false negatives. Can you give some examples of when that would occur? Yeah. Because we are still talking over the network, you get a false positive if the network connection goes down. The other one is kind of a false positive, and not really one: when your policy is not up to date. For the system, that is not a false positive in the traditional sense, but it is for us, because we don't want that alert to actually happen. For the university use case, how do you know that you're actually talking to the real TPM in the laptop?
So we have two ways. First, we verify against the hardware manufacturer: they have a CA that we can verify against. And we can also enroll the notebooks directly, so we check whether the device is known. Which, I forgot to say: the university part is still a proof of concept, so we are currently working on it; it's not rolled out at large scale yet. How do you make sure that when an alert event, a new change, happens, it's not intercepted over the network? Sorry, once again? When there's an event saying that there's a change on the machine, a new measurement that appears, how do you make sure that the event is not intercepted on the network between the monitored machine and the trusted system? So is the question how we deal with losing the connection between the agent, the monitored system, and the verifier? Losing the connection, or maybe having something in between that makes sure the event never reaches the trusted system, so that you are never notified that a system is about to be compromised or just became compromised.
If we have a blocked connection between the agent and the verifier side, then we get a timeout at some point, and the agent automatically becomes distrusted. And as for the notification system itself: if we notify all the other agents, then of course there is an issue if you cannot reach them over a trusted channel; then it's basically game over in that direction. So if you want to guarantee your revocation alerts, you need to receive them through a trusted channel at all times. The trust boundary covers the attestation part, so we do see that something happened; but for the revocation part, if you want the alerts to reach you, they need to go through a trusted channel.
Continuing on this question: how do you actually make sure that your verifier connects to the right agent, and that you don't have a man-in-the-middle attack happening, rerouting this to a fake agent and a fake TPM? Yeah, so that's tied to the EK certificate, so you trust the manufacturer: when they manufacture the TPM, they create this key, which cannot be modified or removed in any way, so it provides the identity of the TPM. When you get the information from the TPM, or from some agent, you can verify that the data came from the TPM that has that EK, because it's signed, and you can verify the origin using the CA certificate provided by the manufacturer. So, using the EK certificate, you can check that the TPM is exactly the one you expected.
Okay, thank you for the talk, thank you for all the questions; we are out of time. Thank you.