
Building a secure network of trusted applications on untrusted hosts


Formal Metadata

Title
Building a secure network of trusted applications on untrusted hosts
Series Title
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
Deploying to "the cloud" is incredibly convenient, but that convenience normally comes at a cost. The host necessarily becomes a major part of the applications trust domain, and a compromised host means a compromised application or a network of thereof. This prevents several highly-regulated sectors, such as medical or financial, from directly deploying to "the cloud" as opposed to building their own infrastructure. Solutions to this problem exist, but most require a custom and correct implementation tied to a particular hardware vendor and SDK. I will present a hardware-agnostic and cloud provider-agnostic solution to this issue, which, with minimal changes to the implementation, can be used to secure a network of applications and demonstrate strong trust assertions produced by doing so.
Transcript: English (automatically generated)
Hi, my name is Roman. I'm a principal software engineer and network service tech lead at Profian. Today I'll tell you how to build a platform-agnostic and hardware-agnostic secure network of trusted applications on untrusted hosts.
We all love the cloud. It's convenient. It enables companies to save money and grow faster, and it eliminates the need for a ton of work managing and maintaining our own infrastructure. It simply makes our lives easier. Well, for the most part. Unfortunately, security breaches do happen, and they're costly.
According to IBM's Cost of a Data Breach 2022 report, $9.44 million is the average cost of a data breach in the US, $4.35 million is the average total cost of a data breach globally, and $10.10 million is the average total cost of a breach in the healthcare industry. Unfortunately, or rather quite fortunately given the risks,
businesses from various highly regulated sectors like financial or medical simply cannot benefit from cloud offerings due to different laws around things like privacy and data protection. But it doesn't necessarily have to be this way. Confidential computing, by allowing protection of data in use, creates opportunities to do things which simply weren't possible before.
One way to benefit from confidential computing would be to simply use the TEEs directly. For example, we could use the SDK provided by the hardware manufacturer and, equipped with a thick stack of documentation, off we go. It works, but there are quite a few drawbacks. First and foremost,
security is hard. Writing software that communicates directly with a secure CPU is not exactly everyone's cup of tea. If all you need is a simple microservice application with a small REST API, diving deep into the internals of a particular hardware technology just should not be necessary. It takes away precious time that could otherwise be spent on developing revenue-producing business logic.
But let's say we went ahead and developed our secure layer interfacing with a particular CPU technology. Well, now we have to maintain it. Apart from that, we also have to fix any bugs that might have been introduced and hope that none of them are exploitable.
People make mistakes, and the more code there is, the more opportunity there is to make one. After putting all this work in, now imagine that you want to switch to a different service provider which does not offer the same hardware technology you used originally. Or, what might be more concerning: what if a vulnerability is discovered in the particular hardware technology you developed against?
The different trusted execution environments are just not compatible with each other. So you're left with just two choices, really: either wait until the vulnerability is fixed and hope your application is not exploited in the meantime, or go ahead and redo, for the new technology, all the work you already did for the original one. Last but not least, chances are that someone has already done this before,
and fundamentally, the concepts that make systems secure do not change. So most likely you're going to just repeat the same work someone else has already done. At Profian, we are custodians of the Enarx open-source project, which, among other things, is designed to address exactly the issues I've just outlined.
It's a thin, secure layer of abstraction between the host and the TEE. It's essentially a secure runtime which lets you execute your WebAssembly workloads inside arbitrary trusted execution environments. Enarx has support for various backends. Today, that's Intel SGX and AMD SEV-SNP,
but as more and more TEEs are made available, support will be added for them as well. The Enarx project was started in 2019, and in 2021, Profian was founded, committed to being 100% open source and providing services and support for Enarx. In 2022, we also launched our enterprise products.
So, now why WebAssembly? It's polyglot. It's supported by languages like Rust, C, C++, Go, Java, Python, C#, JavaScript, Ruby, Zig, and the list goes on and on. It's designed to be portable and embeddable. It's functionally equivalent to a usual native binary, so for the most part,
the development process is exactly the same as for developing any other application. There is an emerging system API standard, called WASI, to which, by the way, we also contribute. You can run Enarx outside of a TEE for development purposes. It runs on Linux, Windows, and Mac; both x86_64 and ARM64 are supported.
Trusted execution is currently only available on x86_64 Linux. For SGX, you'll need a recent kernel and a few Intel-provided services running, like aesmd and PCCS. And for AMD SEV-SNP, all you really need is, unfortunately, a recent kernel with a patch set provided by AMD.
The patches are not mainline yet, but we also maintain our own kernel tree with everything you could possibly need for this. Now, let's see how Enarx is actually deployed. On the left here, we have a tenant; let's call her Jane.
On the right, we have a CSP server with a supported CPU, on which Jane wants to deploy her workload. How does Jane ensure the integrity of the workload being executed by the CSP and the confidentiality of its data in use? To do that, Jane will ask the CSP to execute her workload in an Enarx Keep.
The first thing the Keep does is ask the secure CPU to measure the encrypted memory pages containing the Keep itself, that is, the execution layer and the shim. The CPU then returns a cryptographically signed attestation report containing the measurement along with information about the CPU,
for example, the firmware version used. The execution layer then sends the report to an attestation service for validation. In Enarx, this attestation service is called the Steward. The Steward will make sure that the Keep is indeed trusted. It will check the signature of the report to ensure the Keep is being run in a hardware-based trusted execution environment,
and will also make sure, for example, that the CPU firmware version used is not vulnerable, and will verify that the Enarx execution layer was not tampered with. On successful attestation, the Steward then issues a certificate for the Keep,
which is used to fetch the workload from a registry; in Enarx, we call it the Drawbridge. The certificate is also used for performing cryptographic operations, for example, for providing transparent TLS to the workload.
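To summarize the sequence, here is a minimal, hypothetical outline in Rust. None of these types or functions are the real Enarx APIs; they only stub out the steps just described (measure the Keep, attest to the Steward, receive a certificate, fetch the workload from the Drawbridge).

    // Hypothetical, simplified outline of the attestation flow; NOT the real Enarx API.
    struct AttestationReport {
        measurement: Vec<u8>,     // hash of the Keep's encrypted memory pages (execution layer + shim)
        firmware_version: String, // CPU firmware version included in the signed report
        signature: Vec<u8>,       // signature produced by the secure CPU
    }

    struct KeepCertificate(Vec<u8>);

    // Stub: the secure CPU measures the encrypted pages containing the Keep.
    fn cpu_measure_keep() -> AttestationReport {
        AttestationReport { measurement: Vec::new(), firmware_version: String::new(), signature: Vec::new() }
    }

    // Stub: the Steward validates the report signature, firmware version and measurement,
    // and on success issues a certificate for the Keep.
    fn steward_attest(_steward_url: &str, _report: &AttestationReport) -> Option<KeepCertificate> {
        Some(KeepCertificate(Vec::new()))
    }

    // Stub: the certificate is used to fetch the workload from the Drawbridge registry.
    fn drawbridge_fetch(_slug: &str, _cert: &KeepCertificate) -> Vec<u8> {
        Vec::new()
    }

    fn main() {
        let report = cpu_measure_keep();
        let cert = steward_attest("https://steward.example.com", &report)
            .expect("attestation failed, no certificate issued");
        let _workload = drawbridge_fetch("user/chat-server/0.1.0", &cert);
        // The same certificate also backs the transparent TLS provided to the workload.
    }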
Now, let's see how this works in practice. To begin with, let's see how we actually run something within an Enarx Keep. The fundamental unit of work executed by Enarx today consists of just a WebAssembly executable and an Enarx Keep configuration. For example, here is what it looks like for the chat server that I'm going to secure later.
This is the Keep configuration. Here my Steward is configured (my personal Steward that I've deployed on my VPS), along with my stdio configuration. In this case, I want to inherit everything from the host, so that means I print everything to the host and I also get standard I/O from the host. This file would also contain things like network policy, trust anchors, and other things like that.
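As a rough illustration only (the field names below approximate, rather than reproduce, the real Enarx.toml schema, and the Steward URL is a placeholder), such a Keep configuration might look like this:

    # Illustrative sketch of a Keep configuration; field names are approximate.
    steward = "https://steward.example.com"   # personal Steward deployed on a VPS (placeholder URL)

    # Inherit standard I/O from the host, so the workload prints to the host
    # and reads standard input from it.
    [[files]]
    kind = "stdin"

    [[files]]
    kind = "stdout"

    [[files]]
    kind = "stderr"

    # Network policy and trust anchors would also live in this file.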
I've already uploaded this to my personal Drawbridge and tagged it with the tag 0.1.0. Let's see what that looks like. For that, I'll do a request to my Drawbridge. What I get back for this request is a tag.
We also call it an entry. An entry is nothing other than a node inside a Merkle tree; it's a Merkle tree because each node contains the digest of its contents. What this means is that if I, for example,
go one layer deeper and inspect the actual tree associated with this tag, I'll see that it contains the Enarx.toml and main.wasm we've seen earlier. Now, if I were to, for example, compute the digest of my Enarx.toml, you'll see that it is exactly the same digest we see here and here.
Now, I can of course go one step up and, instead of computing the digest of the Enarx.toml, compute the digest of the actual entry, the actual tag. For that, I will just do a request again to the same URL,
and again compute the digest of it. If you remember, you'll notice that this is, again, exactly the same digest that we see in our tag. This digest is, in fact, a digest of the minified JSON of the object we've seen over here. It is nicely formatted for us by jq, but when you do a request directly, you just get minified JSON, which we then hash.
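As a small sketch of that check (assuming, purely for illustration, that the digest is a SHA-256 over the minified JSON; the actual Drawbridge digest algorithm, encoding, and field layout may differ), one could recompute it like this using the serde_json, sha2, and hex crates:

    // Sketch: recompute the digest of a Drawbridge entry from its JSON body.
    // The hash algorithm and sample data here are assumptions for illustration only.
    use sha2::{Digest, Sha256};

    fn entry_digest(raw_json: &str) -> Result<String, serde_json::Error> {
        // Parse and re-serialize to obtain the minified form that gets hashed.
        let value: serde_json::Value = serde_json::from_str(raw_json)?;
        let minified = serde_json::to_string(&value)?;
        Ok(hex::encode(Sha256::digest(minified.as_bytes())))
    }

    fn main() {
        // The pretty-printed output from jq and the raw minified response hash to the same value.
        let raw = r#"{ "digest": { "sha-256": "..." }, "length": 1234 }"#;
        println!("{}", entry_digest(raw).unwrap());
    }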
So, here I am logged in to an AMD SEV-SNP-capable machine.
I can, for example, read the CPU info and grep for the model name, limiting it to one entry, and you'll see that this is indeed an AMD EPYC processor. So, I'm going to use enarx deploy,
and I'll also specify the backend explicitly, to deploy the workload we just looked at. I'm going to use, again, my custom Drawbridge, and I'll deploy the chat server version 0.1.0,
exactly the same one that we have seen before. Then I'm going to switch to yet another, remote server. This one has support for SGX. Again, I'll check the CPU info.
Here we see this is an Intel Xeon 6338. And here, I'll also do enarx deploy, and in this case, I will execute the chat client.
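For reference, the two deployments are roughly of the following shape. The Drawbridge host, user, and slugs are placeholders, and the exact backend flag syntax may differ between Enarx versions, so check the enarx deploy help output for your installation:

    # On the AMD SEV-SNP host: run the chat server Keep (placeholder URL/slug).
    enarx deploy --backend=sev https://drawbridge.example.com/user/chat-server/0.1.0

    # On the Intel SGX host: run the chat client Keep (placeholder URL/slug).
    enarx deploy --backend=sgx https://drawbridge.example.com/user/chat-client/0.1.0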
Now, once it starts, it will ask me for a URL. I'll put here the address and the port. So, as you can see here, I've connected. Here you can see that the server also acknowledged the connection,
and if you look here, you'll see the exact same digest we just saw in our entry, over here. We also see the slug of the server workload we just deployed on that other server, and its version. All this information came from the certificate.
It's cryptographically signed data contained within the certificate, which Enarx parses for us and exposes to the workload. Similarly, the server also received the slug that the client was deployed from, as well as the digest of its workload.
By looking at the certificate, we can now know exactly what workload the other party is running. We can also try to inspect this ourselves: we can use OpenSSL to connect, and sure enough, we see our certificate.
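Concretely, a plain TLS probe along these lines is enough to see the presented chain (the address and port are placeholders):

    # Connect with plain OpenSSL and dump the presented certificate chain.
    openssl s_client -connect 203.0.113.10:8080 -showcerts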
We can see that it's currently encoded as a common name; it should be a SAN, of course, but this is just a proof of concept. In the certificate chain, we have a certificate with a common name associated with the slug
and the digest, and it was issued by the Steward that I have deployed in my infrastructure. And there's also my own CA at the root of the chain, which signed the Steward certificate.
Note that if we look at the server logs, we'll notice the OpenSSL connection, as you can see, was not let in by the server. It says here that the client did not present a valid certificate. So this was not a Keep with a valid certificate issued by the Steward; therefore the server didn't trust it and didn't let it in.
Similarly, if I were to use Enarx with a backend other than SGX (for example, the KVM backend, which is not a real TEE), it would not even attest to the Steward, so the Steward wouldn't issue a certificate for us.
And we cannot actually execute the workload in Enarx. Now, let's look at how we actually achieved this. To begin with, let's look at the client. You'll notice it's quite a small executable, actually. Notice also that this workload doesn't need to do any TLS itself
or anything like that; the Enarx runtime handles all the TLS connections for it, and by default all connections are TLS anyway. So we're going to use a virtual file system to connect to an address at runtime.
Unfortunately, this is required right now due to limitations of the WASI spec, although there is ongoing work on providing these APIs. Currently it's not possible to just call a connect syscall like you would normally do, and that's why Enarx provides a virtual file system to connect to a particular address.
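Here is a minimal sketch of what that looks like from the workload's point of view. The virtual path layout used below is hypothetical, not the exact scheme the Enarx runtime exposes; it only illustrates the idea of connecting by opening a runtime-provided file.

    // Sketch of connecting through a host-provided virtual file system, since a raw
    // connect() syscall is not yet available under WASI. The path below is hypothetical.
    use std::fs::File;
    use std::io::{Read, Write};

    fn connect(addr: &str, port: u16) -> std::io::Result<File> {
        // Opening this path asks the runtime to establish the (transparently TLS-wrapped)
        // connection and hand the workload a stream-like file handle.
        File::options()
            .read(true)
            .write(true)
            .open(format!("/net/connect/{addr}:{port}")) // hypothetical virtual path
    }

    fn main() -> std::io::Result<()> {
        let mut stream = connect("203.0.113.10", 8080)?;
        stream.write_all(b"hello\n")?;
        let mut reply = String::new();
        stream.read_to_string(&mut reply)?;
        println!("{reply}");
        Ok(())
    }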
Now, similarly, there's another virtual file system to extract the peer data from the connection we have established. In this case, we can simply match on that peer information. So here, for example, if we are presented with an anonymous peer, meaning it did not present a TLS certificate, we simply abort.
This would also be triggered if the certificate was not signed by a trusted party, such as a Steward we trust. If it was a local workload but it was executed in a real TEE, we could still trust it, because we know the expected digest of the package we have uploaded to the Drawbridge.
This, by the way, is the exact same digest we have seen before; you can see it over here. Now, in the happy flow, of course, we are presented with an actual Enarx Keep peer, which is associated with a slug and a digest.
And what we can do here is match on the actual workload slug, that is, where this workload actually came from and its version, and in this case we don't even need to check the digest, because we trust the Drawbridge slug.
So in this case, we have verified these three versions and we do not want to allow any other versions. Of course, this would probably eventually become part of the Keep configuration; it could, again, be specified in Enarx.toml, but for now, just for simplicity, I've included everything in the source code.
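Putting the pieces together, a minimal sketch of this client-side peer check might look as follows. The Peer type, the constants, and the slugs are illustrative stand-ins, not the actual source of the demo client; only the policy logic mirrors what was just described.

    // Sketch of the client-side peer check. Types and values are hypothetical.
    enum Peer {
        Anonymous,                             // no TLS certificate presented (or not signed by a trusted Steward)
        Local { digest: String },              // locally deployed workload running in a real TEE
        Keep { slug: String, digest: String }, // attested Keep with a Steward-issued certificate
    }

    // Expected digest of the package we uploaded to the Drawbridge (placeholder value).
    const EXPECTED_DIGEST: &str = "sha-256:...";

    // Official server releases we are willing to talk to (placeholder slugs).
    const ALLOWED_SLUGS: &[&str] = &[
        "user/chat-server/0.1.0",
        "user/chat-server/0.1.1",
        "user/chat-server/0.2.0",
    ];

    fn check_peer(peer: &Peer) -> Result<(), &'static str> {
        match peer {
            // No certificate, or one not signed by a Steward we trust: abort.
            Peer::Anonymous => Err("anonymous peer, aborting"),
            // A local workload is still trusted if its digest matches the package we uploaded ourselves.
            Peer::Local { digest } if digest.as_str() == EXPECTED_DIGEST => Ok(()),
            Peer::Local { .. } => Err("local workload with unexpected digest"),
            // Happy flow: an attested Keep; because we trust the slug, we don't need the digest.
            Peer::Keep { slug, .. } if ALLOWED_SLUGS.contains(&slug.as_str()) => Ok(()),
            Peer::Keep { .. } => Err("workload version not allowed"),
        }
    }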
Now, similarly, we have the server part, and it has a very similar peer check over here, where it again matches on anonymous, local, and Keep peers. It doesn't let any local workload in, and only allows, essentially, official releases that were verified and signed, perhaps, by the entity shown over here.
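A stricter, server-side variant of the same match, again using the illustrative Peer type and placeholder slug from the sketch above, might look like this:

    // Sketch of the stricter server-side policy: reject anonymous and local peers,
    // allow only an attested official client release (placeholder slug).
    fn check_client(peer: &Peer) -> Result<(), &'static str> {
        match peer {
            Peer::Keep { slug, .. } if slug.as_str() == "user/chat-client/0.1.0" => Ok(()),
            Peer::Keep { .. } => Err("client release not allowed"),
            Peer::Anonymous | Peer::Local { .. } => Err("only attested official clients are allowed"),
        }
    }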
So let's get back to the slides. If you're interested in this project, you can get involved using one of the links provided over here.
Now, a moment for a sad announcement. Just a few hours before recording this video, I found out that Profian is closing, and therefore the Enarx project is looking for maintainers and I'm looking for a job. So if you know anyone who would be interested in the Enarx project, or in me, please let me know.
Please contact me by email or on LinkedIn; here's my user handle. And, yeah, now it's time for questions. Thank you.