
Where does that code come from?


Formal Metadata

Title
Where does that code come from?
Subtitle
Git Checkout Authentication to the Rescue of Supply Chain Security
Title of Series
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
You clone a Git repository, then pull from it. How can you tell its contents are “authentic”—i.e., coming from the “genuine” project you think you’re pulling from? With commit signatures and “verified” badges ✅ flourishing, you’d think this has long been solved—but nope! This is in essence the problem GNU Guix, as a software deployment tool and GNU/Linux distribution, had to solve, as we will see in this talk. A key element of supply chain security is updates: how can we make sure software updates are secure? How can one be sure they don’t run malicious software when updating the software on their system? For free system distributions, The Update Framework (TUF) has become a reference on these matters. However, TUF is designed with binary distributions in mind—think Debian or even PyPI—and is not suited to “source distributions” like GNU Guix. In this talk I will present how Guix distributes software packages and the mechanisms central to supply chain security in Guix: reproducible builds, builds from source (the “full-source bootstrap”), and provenance tracking. Software updates in Guix amount to ‘git pull’, so the security of updates translates to the ability to authenticate Git checkouts. Believe it or not, this pretty fundamental problem was still in search of a solution. Guix developed a simple mechanism for Git authentication, which has been used in production for a couple of years. I will present it and, given that the solution is generic, show how it could benefit Git users alike. We’ll also reflect on how Guix’s approach compares to those developed by tools like SLSA or in-toto.
Transcript: English (auto-generated)
Good afternoon. Can everyone hear me? It seems to be working. All right. So, I'm going to talk about Git checkout authentication
in the context of supply chain security. It's one of these buzzwords that we hear a lot today, and I guess that's because there's a lot to be done in this area. I have to tell you, this is going to be a talk about pre-quantum issues. So, it's going to be different. All right. So, I'm going to talk about work that has been done
in the context of GNU Guix, like Simon was saying. Who has never heard about GNU Guix in this room? A show of hands? Very few people, actually. This is weird. I'm surprised. Anyway, so this started as part of Guix, but as you will see, this is useful beyond Guix, I think.
So, just, yeah, I have to introduce Guix a little bit. This is an actual birthday cake that we ate a few months ago to celebrate 10 years of Guix. So, it's an actual, yeah, it's a real cake.
That's the thing. Yeah, so it's a distribution, a GNU/Linux distribution that you can install standalone, like you would install Debian or something. You can also install it on top of your system if you're already running Debian, for example. This is great: you have Guix on top of Debian, and that gives you an additional package manager.
But anyway, I'm not going to go into the details of what it's like as a user. I want to talk about, you know, what's behind the scenes, right? So, what it looks like from a supply chain viewpoint. So, this is a package definition for Guix. Maybe some of you are wondering about the parens, right?
That's okay. It could be JSON. It could be XML. You have similar things with other tools. It's just basically metadata that describes how to build the hello package. It's telling you where to get the source code, the tar.gz file. It's telling you how to build it with the GNU build system: configure, make, make install, you know, that kind of thing.
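For readers following along without the slides, a package definition has roughly this shape. This is an illustrative sketch in Guile Scheme modeled on the actual GNU Hello package; the version and the base32 hash below are placeholders, not real values, so check the Guix source tree for the genuine definition.

```scheme
;; Sketch of a Guix package definition.  The version and sha256 hash are
;; placeholders; see gnu/packages/base.scm in the Guix tree for the real one.
(define-public hello
  (package
    (name "hello")
    (version "2.12.1")
    (source (origin
              (method url-fetch)
              (uri (string-append "mirror://gnu/hello/hello-"
                                  version ".tar.gz"))
              (sha256
               (base32
                "0000000000000000000000000000000000000000000000000000"))))
    (build-system gnu-build-system)  ; configure && make && make install
    (synopsis "Friendly greeter")
    (description "GNU Hello prints a greeting; it serves as an example
of the GNU coding standards.")
    (home-page "https://www.gnu.org/software/hello/")
    (license gpl3+)))
```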
And there are now more than 20,000 packages in Guix, and they're all defined like this. So, this is source code, right? And the thing is, Guix is able to build packages from source. So conceptually, you could think of Guix as some sort of Gentoo, right? In the sense that it's building packages from source,
except that you can actually get pre-built binaries, and that's what people usually do because, you know, it's faster, especially if you want to use LibreOffice or, you know, whatnot. But Guix is basically, as a distro, all source code, right? Package definitions. And then when you go and build a package, that's also a salient feature.
So, if you've ever used or heard about Nix before, this is entirely inherited from Nix. This is the functional model. Basically, you say, all right, I want to build that hello package, and you run guix build hello, and it's going to talk to a daemon that makes an isolated build of the hello package. So, it's fully hermetic, and that, you know,
that removes a whole class of issues, of non-reproducibility issues that you would have without that isolated environment. Yeah, and so that means that if you look at all these things that we have in that /gnu/store directory, we have tons of packages and stuff in there. Well, they're all going to be bit-identical for everyone,
or nearly; there can be issues, you know, but usually it's going to be bit-identical. So, typically, if I look at that /gnu/store/…-hello thing up there, well, if I build it actually on my laptop, or if you build it on your laptop, we're going to get both the same hash.
It's going to be identical. So, it's all about reproducible builds, which you've probably heard of. So, this is an effort where many distros are involved. Debian, of course, has been leading the effort, but there's also NixOS, Arch Linux, and many other distros. It's called reproducible builds,
but we could very much call it verifiable builds. The whole point here is that you don't have to trust binaries that you get from a server. You can always verify that the source that appears, you know, in the package definition that we saw before actually corresponds to the binary that you have, because you can always rebuild locally;
you can challenge the servers that provide pre-built binaries and see if it's the same. So, from a supply chain viewpoint, that's a pretty big deal, I think. In Guix, we're trying to go a little bit further. So, reproducible builds are nice, but they're not sufficient. Like, if you're reproducing bit-for-bit malicious software,
you still have a problem, right? So, you've probably heard about that Trusting Trust attack, you know, illustrated by Ken Thompson in 1984. That's a long time ago. Well, this is the story. We want to be able to have fully auditable code
that's entirely built from source, and actually someone over there in the back of the room, with other people, has been working on this and has been presenting this last year. We could talk about it for ages, but I have other things to tell you, but I encourage you to take a look at that talk by Janneke, two years ago, actually. The thing is, we're about to be able to build
the whole Guix distribution starting from a binary that's just 357 bytes, I think, right? So, pretty big deal. All right, now to be more on topic. So, we have these fancy things, you know, reproducible builds, bootstrappable builds,
building everything from source. That's nice from a supply chain security viewpoint. But, you know, for several years we've had that tiny issue, specifically in Guix. If you want to update your packages, well, your package collection, the available packages, and the toolset, you would run guix pull.
So, it's similar to apt update in Debian, for example. That's roughly the same kind of tool. But it's implemented by fetching code directly from the Git repo of the project. And, you know, as you can imagine, you have to think about the implications of this, right?
We're delivering code directly onto users' computers, so we'd better be sure they're actually getting, you know, the real code coming from the Guix project and not something different. For example, if the server that hosts the Git repo is attacked,
well, we'd rather have some mechanism to detect that, you know, to make sure that users are not going to download, to clone, a Git repo that contains malicious code, right? So, we need something here. And, you know, we thought about this for quite a long time, actually. And the typical answer to this question is The Update Framework, TUF.
I don't know if you've heard about it. It's sort of the reference for all things update in general. So, it's a specification with implementations in different languages and in different frameworks, like for Python packaging, for Debian, I think, different things. But there's one thing.
It's not quite a good fit for our case. Our case is we're just pulling from a Git repo in the end. The Update Framework is more about distributions that look like Debian or Fedora, where you have binaries on the server and, you know, people are actually downloading those binaries, and those binaries are built by machines or developers, and so on.
It's a different setup. So, to illustrate that, let me show a bit what the workflow looks like in Guix. So, here we have what Guix packagers do. So, as a packager, well, you define packages.
So, for example, Python, and that's the kind of definition that I showed you before, right? And then you can test it with guix build python, for example, like we saw. And eventually, if the packager is satisfied with the package, well, they eventually push it to the Git repo. And as a user, at some point, I'm going to run guix pull,
which is very similar to git pull, except it's also going to compile a few things, but roughly, that's like git pull. And so, at that point, I'm getting the new package definition, and I can run guix install python, and I'm getting that package. That's the idea.
Optionally, like I said, you can get pre-built binaries. I'm not going to go into details about this. This is optional, but this is something you usually want. But, you know, it's not baked into the model like it would be in Debian or Fedora. It's really something additional, and because we have reproducible builds,
you know, pre-built binaries are substitutable, right? The key thing here is that people are going to pull from the Git repo, and we need to make sure that they are getting the right code, the real code. So, we're really looking at these two things, where the users are running guix pull
or the build farm that builds packages is running git pull, and how can we actually make sure they get the right code? And this is all about authenticating Git checkouts. It's just Git, after all. There's nothing special here. So, with millions of people using Git, you would think that it's a solved problem, right?
Or so I thought. It is not, actually. So, if you go, for example, to GitHub or GitLab, you can see these verified badges. This is a screenshot from GitHub. So, you have verified badges. It's green. It's nice. You have partially verified. What does that mean?
And you have also no badges. So, what conclusion can you draw from that? Is it the real, the authentic repo, or is it not? You know, you can't really do anything with that. So, at that point of the talk, we need to talk about authentication.
Authentication is about making sure we're getting the real thing, you know, the undisputed credibility. So, we would say we want to make sure that people are getting Guix, the Guix code, as coming from the Guix project. That's what it means to me. So, specifically, we want to protect against a number of things.
So, we want to assume that potentially an attacker can gain access to the server that holds the Git repo, and from there, you know, the attacker can push more commits on that repo, or could, you know, introduce malicious changes in many ways, or even make a so-called downgrade attack, where the attacker would revert,
or actually remove, the latest commits, for example, so that users would be tricked into pulling an older version of Guix with potentially, like, vulnerable packages and stuff like that. So, this is what we want to protect against. There's a couple of additional goals.
We want to make sure we can do offline authentication, like we don't want to, you know, call out to a number of services out there, and, you know, keyring servers, whatever. And, of course, we want to support changing authorizations in the sense that, you know, people contribute to Guix, and they come and go, right? So, we need to add new people, new contributors, you know, official contributors,
packagers, and eventually, maybe we'll remove them. You know, we need to be able to deal with that. So, the solution, well, we're not yet at the solution, but the intuition, at least, is that, well, this is Git, so this is a graph of commits, right?
We're just dealing with a graph of commits, so we have commits here, actually, A, B, C, D, E, F, and each commit is made by someone, and the intuition is that we would like to be able to annotate each commit, saying, well, at this point, you know, there's a certain number of people who are authorized to make commits in the project,
and maybe it's going to change, you know, at each node of the commit graph, and, yeah, this is what we would like to do. So, the solution we came up with is to have, basically, inside the repo, a file that's called '.guix-authorizations' that lists the OpenPGP keys of authorized committers.
You know, pretty simple. And the thing is, the file lives inside the repo. And then we need to have a rule to determine whether a given commit is authentic. And so the rule is actually very simple as well.
So, a commit is authentic if and only if it is signed by one of the authorized committers of the parent commit. Got it? This is the main part of the talk. I'm almost done, actually. I could stop here. So, we call this the authorization invariant. So, let's see in practice what this looks like.
So, if I go back to my commit graph here, so let's assume for commit A, this is the first commit, let's assume Alice is authorized there, all right? And then in commit B, Alice is adding Bob as an authorized committer. So, we have this label here. So, at that point, Bob will be authorized to make commits.
And if we look at commit C and E, well, they are made and signed by Bob this time. And it's perfectly fine, because if we look at the parent commit of C, for example, so this is C, the parent commit is here, and we can see that Bob is authorized in the parent commit, right?
And likewise with E, we can have, so, a second branch, the purple branch, and Bob is also committing in that branch, and this is fine, because the parent commit is the same line, and Bob is authorized here, all right? And we can keep going that way, you know, remove people and so on and so forth.
So, the second example, if we take almost the same one, except that on the purple branch here, Bob removes Alice from the set of authorized committers, all right? And then what happens if Alice tries to make a merge commit that has D and E prime as parents?
Well, if we apply the authorization invariant that we showed before, this commit is not authorized, it's not genuine, it's going to be rejected. That's the idea. Yeah, there's a small problem that perhaps you've noticed.
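The rule just described can be modeled in a few lines. This is an illustrative sketch of the authorization invariant, not Guix's actual implementation (which, among other things, also verifies the OpenPGP signatures themselves); the commit and committer names are hypothetical:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class Commit:
    name: str
    signer: Optional[str]          # who signed the commit (None = unsigned)
    authorized: FrozenSet[str]     # keys listed in .guix-authorizations
    parents: Tuple["Commit", ...] = ()

def authentic(commit: Commit) -> bool:
    """The authorization invariant: a commit is authentic iff its signer
    is authorized by each of its parents (all of them, for a merge)."""
    return all(commit.signer in p.authorized for p in commit.parents)

def authenticate(tip: Commit, introductory: Commit) -> bool:
    """Check the invariant on every ancestor of `tip`, stopping at the
    introductory commit, which is trusted out of band."""
    stack, seen = [tip], set()
    while stack:
        c = stack.pop()
        if c.name in seen or c is introductory:
            continue
        seen.add(c.name)
        if not authentic(c):
            return False
        stack.extend(c.parents)
    return True
```

Re-creating the example from the talk: Alice adds Bob in commit B; on one branch Bob later drops Alice from the authorized set, so a merge commit signed by Alice whose parents include that branch fails the check.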
We kind of didn't discuss the first commit, right? There's something to be said about that one, too. Well, we need to introduce the repo in a way. So, we need a way to say, well, this B commit is the first commit where we will start applying the authorization invariant.
So, we call this the introductory commit, and it's needed because, you know, perhaps you have some history already in your Git repo at the time you start using this mechanism, and so we need to be able to say this is the one where it starts. We call that the introductory commit, and so users are expected to know, you know,
what the introductory commit is. So, for example, this is the specification of a channel in Guix, so a channel provides more packages, and as a user, you would provide not just the URL of the channel, of the repo, but also the introduction information that tells from which commit we're going to start authenticating.
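Concretely, a channel specification with an introduction looks roughly like this. The shape follows what the Guix manual documents for channels; the commit hash and OpenPGP fingerprint below are made-up placeholders, and in practice you use the values published by the channel's authors:

```scheme
;; channels.scm sketch: the introduction pins the commit from which
;; authentication starts, and the key that signed it.  Placeholder values.
(list (channel
       (name 'guix)
       (url "https://git.savannah.gnu.org/git/guix.git")
       (introduction
        (make-channel-introduction
         "0123456789abcdef0123456789abcdef01234567"  ; introductory commit
         (openpgp-fingerprint                        ; signer's key
          "AAAA BBBB CCCC DDDD EEEE  FFFF 0000 1111 2222 3333")))))
```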
And that solves the bootstrap problem. So, concretely, now that we have this, if we run guix pull, and it's been in production for a couple of years, actually, if we run guix pull, well, we are going to have a message that says we're authenticating channel 'guix',
and a number of new commits, right, and it's cached, so it's pretty fast. If I tell guix pull to use a different URL with a mirror, I'm going to get a warning saying, all right, you chose to use a mirror, that's fine, but be aware that this is not the canonical URL, so perhaps this mirror is stale, but at least we can tell it's authentic, because we verified the authorization invariant.
But then, if some evil attacker, you know, does something bad with the repo, then we're going to get an error message directly saying, no, this commit is not signed by an authorized key, you have a problem. And this is it.
So, this is all when using guix pull, but you can actually use the same thing even if you're not using Guix, or even without using a channel: you can use the guix git authenticate command, which works the same, except it's lower level, so you have to specify the introductory commit and the keys that signed the introductory commit.
And the thing is, I think we should all be using that kind of stuff with our Git repos, because right now it's a wild west. But, yeah, the UI is a bit not super usable, so I understand we'll have to do some work on this; if you have ideas, I'm open to them. Yeah, and you can specify where the keyring, the OpenPGP keyring, is to be found,
because this is not going to talk to key servers, which are very unreliable, as you probably know. Alright, I didn't mention downgrade attacks; I have to be fast, right, I guess. Downgrade attacks, that's another kind of attack we want to protect against.
And the good thing with Guix is that Guix keeps track of its own provenance. So, for example, when you are running Guix, you can run guix describe, and it's going to tell you, I was built from this commit. So it knows where it comes from, so to speak. And because we have that provenance information, then if you run guix pull,
and it detects that it's not going to a commit that's a descendant of the one we're currently running, you're going to have an error message, right? Commit 'coffee' is not a descendant of commit 'cabbage', of course.
This is pretty cool. And likewise, even at the system level, when you deploy your system, the system itself, the distribution actually running on your machine, records which commit it was built from. So we have the information here if we run guix system describe, and so if I run guix system reconfigure to update my system,
well, potentially I could get a message that says, no, you're trying to reconfigure to a commit that's older than the one you're currently running. That's a problem. I can override that if I know what I'm doing, but usually you'd better not.
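The downgrade check described here amounts to an ancestry test: the commit we are updating to must be a descendant of the one we are currently running. A minimal sketch of that test (illustrative only, with hypothetical commit names; Guix itself obtains this relationship from the Git history):

```python
def is_descendant(parents, candidate, current):
    """True iff `current` is reachable from `candidate` via parent links,
    i.e. updating from `current` to `candidate` is a fast-forward.
    `parents` maps a commit name to a tuple of its parents' names."""
    stack, seen = [candidate], set()
    while stack:
        c = stack.pop()
        if c == current:
            return True
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, ()))
    return False
```

With a linear history a <- b <- c, updating from a to c passes the check, while being pointed back from c to a is flagged as a downgrade and refused.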
All right. It's time to wrap up, I guess. Yeah. So, to summarize, we have two things here. We have authenticating Git checkouts, which is good for Guix because it gives us safe Guix updates. And because we have safe Guix updates, we can have unattended upgrades, for example. And this is super cool.
You know that the unattended upgrades are either going to work and run the right code, or they're not going to run at all. And this is important, I think. This is in-band and offline, which means all the data needed to perform this authentication step is inside the Git repo. There's no need to talk to key servers and stuff.
And you can and should use that kind of tool on your Git repos, I think. We really need to think collectively about this issue. And we have, again, protection against downgrade attacks, which is good for unattended upgrades, and it's been deployed in Guix for a while now.
There's a paper, if you want to see all the nitty-gritty details; there's a URL here. And, yeah, to conclude, I'd like to reflect a little bit on all these issues of supply chain security. I know I'm sharing this one with speakers about Sigstore, for example, and other projects,
and we have a different approach to things. For example, with Guix, we have a unified deployment toolbox, so we are very much talking about end-to-end integration of the toolset, verifiability with reproducible builds, for example, auditability, we have the commit graph, you know, we have all the details available at our disposal,
when, you know, often popular approaches are more about assuming that you have a different set of tools: you can have a distro, you can have Docker, you can have Kubernetes, whatever, and you're just combining everything and thinking about artifact flow integrity, attestation, version strings, and stuff like that.
So, I think the key is to really think about going from source code to deployed binaries, which is very much the free software ethos as well, and thinking about ensuring we have proper provenance tracking and the ability to verify things. This is it. Thank you. Thank you, Ludovic.
We have three minutes for questions. Hello, thank you for the talk. A really common workflow is to use GitHub to merge pull requests,
and whenever you merge pull requests, there usually is a merge commit signed by GitHub. How would you go about allowing merges by GitHub without allowing GitHub's keys to be used for arbitrary commits? That's a very good question. Actually, that's probably a limitation of this model,
so we're not using GitHub or even GitLab for Guix, and actual developers are making merge commits, for example, but typically for automated merge commits like you have in GitHub, it's not going to be practical. That's a limitation, yeah.
Hi. Thank you. First of all, thank you for your brilliant presentation. I see that Guix, or "Geeks", I haven't... "Geeks", yeah, thanks. It's a very promising package manager, or even Linux distribution.
I probably have some off-topic questions with regard to your talk, but I still believe that you can answer them. A yes or no would be enough for me.
Is there some kind of cross-compilation supported by Guix? Guix, sorry. Yeah, there is cross-compilation support, yes. You can even cross-compile systems. Quick question. Thank you so much for your talk. I have a quick question. I saw you're using PGP keys to verify the commits,
but these days you can also use SSH keys to sign your Git commits. Is this also supported in Guix? No, it's all OpenPGP. That's a good question. We started before Git supported anything other than OpenPGP, actually. Yeah, so it's a trade-off, I guess.
Have you considered upstreaming this into Git? Oh, here. Oh, sorry. Have there been any ideas about upstreaming this into Git itself? I did consider it. It's a bit of work, I guess. Also, we have very tight integration with a small-scale OpenPGP implementation
that can only do signature verification. So that would mean also having that in Git itself, which is quite a bit of work. But I think it should be in Git proper eventually, yes. Okay, final question here. Thank you. Have you considered Sigstore integration with Guix?
Oh, sorry, can you repeat? Have you considered Sigstore integration? Is it possible? Is some work in that direction happening? No, there's no work in that direction happening as far as I know.
I guess I'm not sufficiently familiar with Sigstore to see how it could integrate with Guix, but I don't know. Maybe there's something we could do. Thank you. Thank you, Ludovic. Five-minute break.