Automatic trusted publishing with PyPI
Formal metadata
Number of parts | 131
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use and modify the work or its content for any legal, non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers | 10.5446/69406 (DOI)
Transcript: English (automatically generated)
00:04
So as a short introduction, my name is Facundo Tuesca. I'm a senior security engineer at Trail of Bits. We're a medium-sized security consultancy. We do not only security work but also research and open source engineering work. We also have a very active technical blog.
00:21
If you're interested, you can check that out. But yeah, let's get started. So this talk is divided into three big sections. The first one is what is trusted publishing? The second one is what problem does trusted publishing solve? Why would you want to use it? And the third one is how does it work, the technical details?
00:44
So what is trusted publishing? Trusted publishing is a way of uploading packages to PyPI. This is a beginner talk. I'll give a bit of context because not everyone is familiar with PyPI. PyPI is the Python Package Index.
01:02
If you have ever installed a Python package using pip install and the name of your package, that package was downloaded from the internet, and the Python Package Index is where that package was stored. So if you're a maintainer and you want to provide your packages to everyone, you would upload them to PyPI.
01:24
This is what it looks like. If you go to PyPI.org, you can search the projects. You can create an account if you want to upload your own package, et cetera. So trusted publishing is a way of uploading packages to the Python Package Index.
01:41
But before we actually see what it is about, let's see first an example of the usual uploading workflow. The usual uploading workflow requires the user to create something called an API token. An API token is like a password that gives you permission to upload to your PyPI account.
02:00
This is what it looks like. If you go to PyPI and you log into your account, you can create this token. This token is like a password. Don't worry about this one in particular because it's no longer valid. But this is the token that you will then use to upload your package.
02:20
So once you have the token, you can build your package and upload it. The first two lines here just run a command that generates the package artifacts that you would want to upload to PyPI. Then on the third and fourth line, we specify a username and password. When using API tokens, the username is always the same. It's the string __token__ (underscore, underscore, token, underscore, underscore).
02:43
And the password will be the API token we generated on the previous step. Once you have set that, you can run the command twine upload. Twine is the standard command line utility to upload packages to PyPI. And you specify the path with the artifacts you want to upload.
03:01
Once that's finished, your package is available on PyPI. And anyone who runs pip install and the name of your package will be able to download and install it. So this is the manual, traditional way. There's another way where you use a continuous integration job.
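The manual upload just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a copy of the slide: it assumes the `build` and `twine` tools are installed, and the helper name `twine_upload_invocation` is hypothetical. It only prepares the command and environment; nothing is uploaded unless the result is passed to `subprocess.run`.

```python
import glob
import os


def twine_upload_invocation(api_token, dist_dir="dist"):
    """Return (command, environment) for uploading built artifacts with twine.

    Hypothetical helper: it prepares the call but does not run it.
    """
    env = dict(os.environ)
    # With API tokens the username is always the literal string "__token__".
    env["TWINE_USERNAME"] = "__token__"
    # The password is the API token created in the PyPI account settings.
    env["TWINE_PASSWORD"] = api_token
    # Expand the artifact paths ourselves, since no shell is involved.
    artifacts = sorted(glob.glob(os.path.join(dist_dir, "*")))
    command = ["twine", "upload", *artifacts]
    return command, env


# Typical use, after building the artifacts with "python -m build":
#   import subprocess
#   command, env = twine_upload_invocation("<your API token>")
#   subprocess.run(command, env=env, check=True)
```

Passing the token through the environment keeps it out of the command line, but it is still a long-lived secret that has to be stored somewhere.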
03:21
This could be, for example, GitHub Actions. This is used by a lot of Python package maintainers: they have a continuous integration workflow on their repository which triggers on, for example, a new release being made, or when they push a new tag for the package.
03:40
When they do that, a new workflow is triggered and that workflow will build the package automatically and upload it to PyPI. So this is an example of what that would look like, the workflow definition for the GitHub action that uploads the package.
04:01
As you can see, we define a job called PyPI publish. We set the name and the image we want to use to run it. And on the steps here, I omit some steps where we would retrieve the files we want to upload. But then the important thing is that final step called publish package distributions
04:20
which uses a pre-made GitHub action called gh-action-pypi-publish, which takes, in the last line, a parameter called password. And this is where you would insert the API token you generated in the first step. As you can see here, we're using an environment variable. This is usually stored, yeah, exactly.
04:41
Yeah, it's not an environment variable but it's exposed as an environment variable in the job by the GitHub secret manager. And the key thing here is that the user needs to grab that PyPI API token from PyPI, copy paste it into the GitHub secret manager
05:02
and then it's available when this job is run. So that's the important part. So we're having some technical issues. So as we said, what is trusted publishing? Now that we know the usual workflow for uploading packages to PyPI,
05:20
we can say that trusted publishing is a way of doing that. It's useful for continuous integration workflows such as GitHub actions, GitLab and a couple more we'll see later. And the key thing is that we no longer need to manually manage these API tokens with trusted publishing. So why? Why would we want to get rid of the API,
05:42
the manually managed API tokens? Why would you use trusted publishing? So as we saw, the usual workflow requires the user to manage these tokens. An API token is a secret. It's long lived which we'll see what that means in the next slide and it's manually managed. So the fact that it's secret means that it can be compromised.
06:01
It's like a password, it needs to be protected as such. It's long lived, which means that if it is compromised, then whoever manages to get it can use it until the owner of that token realizes that it was compromised and manually revokes it.
06:20
And as all long lived secrets, it can be forgotten about which is also a problem. And it's manually managed which means that it can be more easily compromised due to user error. As an example here, it would be the user accidentally committing the secret as plain text in their source repo.
06:40
And it's easy to overscope. An API token can have different permissions for which packages it is allowed to upload. You can select a single package, but you can also overscope it and say this token has permissions to upload all of a user's packages, which is usually not a good practice.
07:02
So how does Trusted Publishing solve those problems? With Trusted Publishing, users don't need to manage API tokens. So what we had before here where in this GitHub action we specified manually this token which was then accessed from the GitHub secrets manager. We can just remove that and this will work
07:23
and we'll see how that works later. But yeah, we can basically just do that. We need to add a couple of more lines here. We'll also see what that's about. But as you can see, there's no longer a secret that needs to be managed by the user and copy pasted from PyPI to the GitHub secret manager.
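To make the difference concrete, here is a sketch that mirrors the two publish job definitions as Python dictionaries instead of YAML. The job layout, the action reference `pypa/gh-action-pypi-publish@release/v1`, and the secret name `PYPI_API_TOKEN` are assumptions for illustration, not a copy of the slides. The extra lines mentioned above are most likely the `permissions` entry with `id-token: write`, which is what allows a GitHub Actions job to request an OIDC token.

```python
# Token-based job: the API token is read from the GitHub secrets manager.
token_based_job = {
    "name": "Upload release to PyPI",
    "runs-on": "ubuntu-latest",
    "steps": [
        {
            "name": "Publish package distributions to PyPI",
            "uses": "pypa/gh-action-pypi-publish@release/v1",
            "with": {"password": "${{ secrets.PYPI_API_TOKEN }}"},
        }
    ],
}

# Trusted-publishing job: no password; instead the job is given
# permission to request an OIDC token.
trusted_publishing_job = {
    "name": "Upload release to PyPI",
    "runs-on": "ubuntu-latest",
    "permissions": {"id-token": "write"},
    "steps": [
        {
            "name": "Publish package distributions to PyPI",
            "uses": "pypa/gh-action-pypi-publish@release/v1",
        }
    ],
}


def uses_trusted_publishing(job):
    """True if the job can request an OIDC token and passes no password."""
    can_request_oidc = job.get("permissions", {}).get("id-token") == "write"
    passes_password = any(
        "password" in step.get("with", {}) for step in job["steps"]
    )
    return can_request_oidc and not passes_password
```

The helper `uses_trusted_publishing` is hypothetical; it only exists to show which two properties distinguish the jobs.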
07:45
So how does that work? If we're no longer using an API token generated by the user, how does PyPI authenticate the upload that the GitHub job attempts?
08:01
So the first step is the user goes to PyPI, they log in to their account and they configure something called a Trusted Publisher. A Trusted Publisher is basically a configuration that says who is allowed to upload packages to that specific PyPI account. For example, we can say a specific continuous integration workflow
08:22
inside a specific GitHub repository will be allowed to upload files for a specific PyPI package. Once a Trusted Publisher is stored and saved, from that moment on, any publishing jobs coming from that GitHub Actions repository will be able to successfully upload packages.
08:43
They will have permissions to do that. And there's no need for the user to provide an API token. This is what the UI looks like if you go to PyPI, open your account settings, and fill in the different fields.
09:04
So first you select which continuous integration service you want to use. Currently we support GitHub, GitLab, Google, and ActiveState. This is called an identity provider in, one second please, yes, in OIDC,
09:22
which we'll see what that means later, but you select this first, then you configure your repository, basically the username or organization name and the name of the repository. Then you configure the name of the workflow that is allowed to upload packages. In this case we just use a generic release.yaml file.
09:41
This will be the file inside the repository that has the workflow definition for uploading packages. And optionally an environment name, which is a feature that GitHub offers to have more segmentation on the permissions on who is allowed to run a workflow or not.
10:00
After you click Add, uploads originating from that specific workflow in that specific repository using that specific environment will be automatically authenticated. So as we saw before, the workflow definition, this will be the release.yaml we saw before that has this API token. We can just change this and remove that line.
10:25
So how does authentication work in the background? Authentication is based on OpenID Connect. OpenID Connect is, if you have ever used single sign-on, instead of saying a username and a password on a website,
10:40
you click sign in with Google, Facebook, or any of the other big providers, that uses OIDC in the backend. It's a standard for identity verification, basically. So the first step is the GitHub job, after building the packages but before trying to upload them, generates something called an OIDC token.
11:03
An OIDC token is a JSON web token. We'll see an example later. But yeah, it basically has information on the workflow and the repository that is running this job. Then the GitHub job sends that OIDC token to PyPI.
11:22
PyPI verifies that the OIDC token is indeed coming from GitHub, or whichever is the continuous integration service. It verifies that the OIDC token is coming from a repository that was previously configured as a trusted publisher. And if it passes all of those verifications,
11:40
it will return to that GitHub job a short-lived PyPI API token, which can then be used by the GitHub job to upload the packages. So let's visualize this a bit because I understand that it's hard to see without a graphic. So first we start with the user and PyPI.
12:01
The user here will be the owner of this my-package repository, and it will be the owner of the same package on PyPI. So that user goes to PyPI and creates a trusted publisher for that package. This is the same thing we saw before. They will fill this form and click Add.
12:21
Once that's done, as we saw before, from that moment on, the GitHub repository is allowed to upload to PyPI. Then the user triggers a release workflow. This can be done manually or automatically, depending on how the workflow is defined. But yeah, think as the user, maybe pushing a new tag for the repository
12:40
with a new version. So GitHub triggers that workflow, and first it builds the release artifacts, the packages we want to upload to PyPI. One moment, please. Yes. So after the artifacts are built, it generates this OIDC token.
13:01
The OIDC token is basically a JSON file. It's signed by GitHub. Some of the important fields we can see here are, the first one there contains a repository name and a repository organization or username. The second highlighted line is the release.yaml file,
13:21
the one where the GitHub action is being run from. It also includes the repository ID and the owner ID. This is useful for preventing a class of attacks called account resurrection attacks. If we have some time at the end of the talk, we can go a bit over what that is. But yeah, those are included there for that.
13:43
And finally, we have another important field, which is the issuer field, which is a URL that defines who is the one generating this token and signing it. In this case, that's the URL that all GitHub actions use. GitLab has its own URL, Google Cloud has its own URL, et cetera.
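As a rough illustration of the fields just described, the sketch below builds an unsigned stand-in for an OIDC token and decodes its payload. The claim names are taken from GitHub's documented OIDC token format and should be checked against it; the values (`octo-org`, `my-package`, the numeric IDs) are made up, and a real token carries a signature that must be verified before any claim is trusted.

```python
import base64
import json


def b64url_encode(raw):
    """Base64url-encode bytes without padding, as JWTs do."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_jwt_payload(token):
    """Decode the middle (payload) segment of a JWT WITHOUT verifying it."""
    payload_segment = token.split(".")[1]
    # Restore the padding that JWT encoding strips.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Made-up claims, for illustration only.
example_claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "repository": "octo-org/my-package",
    "repository_owner": "octo-org",
    "repository_id": "123456",
    "repository_owner_id": "654321",
    "job_workflow_ref": (
        "octo-org/my-package/.github/workflows/release.yaml@refs/tags/v1.0.0"
    ),
}

# Assemble a fake token: header.payload.signature
header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps(example_claims).encode())
fake_token = f"{header}.{payload}.signature-placeholder"

claims = decode_jwt_payload(fake_token)
```

Decoding without verification is only useful for inspection or debugging; the receiving side has to check the signature first.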
14:03
So GitHub generates that token, which contains the information of the workflow and the repository that is trying to upload packages. And it sends that to PyPI. PyPI tries to verify this OIDC token. So how does it do that? First, it grabs, it takes that issuer URL,
14:22
the one for GitHub. It checks that it's on a whitelist of expected URLs. So right now we expect URLs from GitHub, GitLab, ActiveState, and Google Cloud. And if so, it can access the well-known URL, which contains the OpenID configuration.
14:40
This OpenID configuration looks something like this. The important field here is, well, they're all important, but the one we're going to explain now is this URL here, which contains GitHub's public keys at this point in time. So PyPI goes to that URL, gets GitHub's public keys,
15:03
and checks the OIDC token signature against those public keys. This basically allows PyPI to verify that the token is indeed coming from GitHub, and that there was no one else generating that token and trying to impersonate GitHub.
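The issuer check and key discovery can be sketched as follows. This is a minimal illustration, not PyPI's implementation: the allowlist is an assumption (ActiveState is left out rather than guessing its issuer URL), the helper names are hypothetical, and the signature check itself is omitted because it needs a JWT library rather than the standard library. The `/.well-known/openid-configuration` path and the `jwks_uri` field come from the OpenID Connect Discovery standard.

```python
# Illustrative allowlist of issuer URLs (an assumption, not PyPI's real list).
TRUSTED_ISSUERS = {
    "https://token.actions.githubusercontent.com",  # GitHub Actions
    "https://gitlab.com",  # GitLab
    "https://accounts.google.com",  # Google
}


def discovery_url(issuer):
    """Return the OpenID configuration URL for an allowlisted issuer."""
    if issuer not in TRUSTED_ISSUERS:
        raise ValueError(f"untrusted issuer: {issuer!r}")
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def jwks_uri_from(openid_configuration):
    """Pick the URL of the issuer's public keys from the discovery document."""
    return openid_configuration["jwks_uri"]
```

The important design point is the order: the issuer is compared against the allowlist before any URL derived from the token is fetched, so an attacker cannot point the verifier at keys they control.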
15:20
If that verification passes, then PyPI checks that the identity claims we saw before, like the repository name and the organization name and the workflow name, they match an existing trusted publisher, the thing that the user created previously on PyPI. If it does, and a trusted publisher exists, then PyPI generates a short-lived API token
15:42
that has permissions to upload to that specific Python package, and it returns it to GitHub. So now GitHub has a short-lived PyPI API token that's only valid for 15 minutes, so ideally it will only be used for the duration of this continuous integration job,
16:00
and then it can finally use it to upload the packages to PyPI. And this is basically the whole flow of how authentication happens, from the moment you initially configure the trusted publisher to the moment the artifacts are actually uploaded. Note that the first step, the one where you create the trusted publisher, you only need to do it once.
16:21
You don't need to do it for every upload. This gets stored on PyPI, and then from that moment on, any jobs from that repository will be allowed to upload the package. Sorry? Yes, sorry, one specific workflow inside the repository. Yeah, thank you.
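The matching step can be sketched like this. It is a simplified illustration under stated assumptions, not PyPI's code: `TrustedPublisher` and `claims_match` are hypothetical names, the claims are assumed to come from a token whose signature was already verified, and the `job_workflow_ref` layout follows GitHub's OIDC claim format as documented.

```python
from typing import NamedTuple, Optional


class TrustedPublisher(NamedTuple):
    """Hypothetical record of what the user entered in the PyPI form."""

    owner: str
    repository: str
    workflow_filename: str
    environment: Optional[str] = None


def claims_match(publisher, claims):
    """Check already-verified OIDC claims against one trusted publisher."""
    expected_repo = f"{publisher.owner}/{publisher.repository}"
    if claims.get("repository") != expected_repo:
        return False
    # job_workflow_ref looks like "<owner>/<repo>/.github/workflows/<file>@<ref>".
    workflow_path = claims.get("job_workflow_ref", "").split("@", 1)[0]
    expected_path = (
        f"{expected_repo}/.github/workflows/{publisher.workflow_filename}"
    )
    if workflow_path != expected_path:
        return False
    # The environment is only compared when the publisher specifies one.
    if publisher.environment is not None:
        return claims.get("environment") == publisher.environment
    return True
```

Note that the workflow file is part of the match, so a different workflow in the same repository does not qualify. A fuller version would also compare the repository and owner IDs mentioned earlier.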
16:41
So yeah, now let's recapitulate, knowing what we know about how that works. Trusted publishing is a way of uploading packages to PyPI for continuous integration workflows without needing to manually manage API tokens. As we saw before, authentication happens automatically between PyPI and GitHub, or other continuous integration services.
17:04
These were the problems we have with long-lived, manually managed API tokens. So as we saw before, there's a secret that can be compromised. They are long-lived, and they are manually managed. So with trusted publishing, we solve a lot of those issues.
17:20
The user does not need to manually manage secrets anymore. Authentication happens automatically. The process uses short-lived API tokens, both the OIDC token generated by GitHub, and the PyPI API token returned by PyPI. They are both short-lived. I believe the OIDC token is 30 minutes,
17:43
and the API token is 15. An hour, I think. An hour, thank you. That's it. If any of these tokens is compromised, they will only be useful for a small window of time, and there's no risk of the user forgetting about them after creating them.
18:00
And it requires a simple one-time configuration, the one we saw before, which means there's no need anymore to copy-paste secrets between PyPI and the continuous integration service. And the configuration for the trusted publisher guarantees a minimal API scope. The trusted publisher will only generate API tokens
18:22
that are allowed to upload to a specific package, whereas before, with an API token, the user could configure it to upload to all of the user's packages, which is usually something we don't want. So what are the security considerations of this model? The OIDC tokens generated in the first step by GitHub,
18:42
and then the PyPI API tokens returned by PyPI are still sensitive material and should not be disclosed. The fact that they are handled automatically makes it harder to accidentally share them, but it's still possible. And then configuring a trusted publisher
19:01
requires a trust relationship with the GitHub Actions job, or the continuous integration job being run. So yeah, that state must be protected. In this case, as an example, it would be basically configuring permissions correctly
19:22
so that only maintainers are allowed to run the publishing jobs. Second place. The good news is that if your project already uses GitHub actions or any other CI workflow to upload packages, those permissions should be already in place
19:40
and there's no new thing that you need to trust in order to use trusted publishing. There's a security model doc in the PyPI documentation. If you're curious about all of the details, these are the two big things, but we also have per-provider security models,
20:01
so GitHub's is different from GitLab's and all of the others. And yeah, some final details. Some history. Trusted publishing was initially rolled out in April last year. At that point in time, it only supported GitHub Actions. Over the past 12 months, there has been a lot of work to add support
20:21
for these other three: GitLab, Google Cloud, and ActiveState. Yeah, a big thanks to Google, who funded a lot of the work in the past year and also in the previous year for trusted publishing, and to Dustin Ingram and all of the PyPI maintainers who designed and reviewed all of this work. Yeah, currently around 1,100 projects
20:41
are actively using trusted publishing. This means that they not only created a trusted publisher, but they have used it to upload at least one package or one file. And this is something we didn't mention, but trusted publishing can also be used as a way to create projects. What we saw before was creating a trusted publisher
21:02
for an existing project. You can create something called a pending trusted publisher for a project that still doesn't exist, and it will be created on PyPI on the first upload of your file from the continuous integration job. And yeah, we have a lot of documentation
21:20
on how to modify your current workflow to use trusted publishing on that URL. We have configuration instructions for the different continuous integration providers, the security model we saw before, and how this is all implemented in the backend. So yeah, I'll leave you with these two URLs. The first one is the documentation we saw before.
21:43
The second one is a blog post we have on our blog describing in a lot more detail all of the technical details on how this works. And that's it, thank you very much. Yes, thank you Facundo.
22:02
I guess we have plenty of time for the Q&A, so if you have any questions, yes, please line up near the microphones. Hello, thank you for your talk. Do you have any experience with using trusted publishers with GitLab? Yes. Okay, so. No, yes, sorry. Because I was trying it out last week,
22:22
just the week before I talked, and I struggled with a few steps, I got it to work, but then maybe we can meet and. Yeah, yeah, just to talk a bit more about that. The GitLab trusted publisher is very similar to GitHub's. You can specify the repository name, the organization name,
22:42
the workflow name, and the environment name. It has a couple of differences in the security model. For example, with GitLab there's no protection against account resurrection attacks, because of how the GitLab API works when trying to get those IDs
23:01
for specific kinds of repositories. But yeah, there's a tricky part for GitLab that's not the case for GitHub: with GitHub you can use pre-made GitHub actions, right? You can just copy-paste that GitHub action named gh-action-pypi-publish, and it will automatically take care of all of the Twine upload
23:22
and all of the tricky things of exchanging the OIDC token for a PyPI API token. GitLab does not have this concept of a library of pre-made actions, so you need to basically do this yourself. Yeah, I did it, yes. So in the documentation we have examples of how to do that, which basically translates into calling PyPI's API
23:47
with your OIDC token, and exchanging it for a PyPI API token. But yeah, it's not as trivial as what GitHub has. Okay, thank you very much. Thank you. Any other questions?
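The manual exchange mentioned in this answer can be sketched as below. Treat the details as assumptions to verify against the PyPI trusted publishing documentation: the endpoint URL `https://pypi.org/_/oidc/mint-token` and the `token` field in the request and response are quoted from memory of those docs, and the helper names are hypothetical. How the CI job obtains the OIDC token is provider specific and not shown. The code only prepares the request, so it runs without network access.

```python
import json
import urllib.request

# Assumed endpoint; confirm it in PyPI's trusted publishing documentation.
MINT_TOKEN_URL = "https://pypi.org/_/oidc/mint-token"


def build_mint_token_request(oidc_token, url=MINT_TOKEN_URL):
    """Prepare (but do not send) the request that trades an OIDC token
    for a short-lived PyPI API token."""
    body = json.dumps({"token": oidc_token}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_api_token(response_body):
    """Pull the short-lived API token out of the JSON response body."""
    return json.loads(response_body)["token"]


# In a real CI job (network access required):
#   with urllib.request.urlopen(build_mint_token_request(oidc_token)) as resp:
#       api_token = extract_api_token(resp.read())
# The resulting token is then used as the twine password, with the
# username "__token__".
```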
24:05
Hello, thanks a lot for the talk, that was very interesting. I have a question: there are some private package indices which reuse some of the PyPI API, but are not necessarily using PyPI, and how fast can we expect them
24:21
to also have this type of infrastructure? So how open is the work that was done on PyPI, and how can it be reused by other private packaging providers? Thank you. So if I understand correctly, this is for non-PyPI indices, right? For private indices. Yeah, so the work is totally open source.
24:45
There's nothing hidden. You don't need any particular agreements with GitHub, GitLab, or any of the others. You only need to access that public, well-known URL that contains the configuration, and you can just configure or add the feature
25:01
to your index to access that URL and do the verification yourself. But yeah, as to when that will be implemented by private indices, that's up to all of the individual indices.
25:24
Yeah. Oh, thank you very much. Yeah, for, in case it wasn't heard on the recording, yeah, the GitHub action is index agnostic, so you can just specify the URL of whatever index you use,
25:41
and if that index, that package index supports trusted publishing, it will be transparently used by the GitHub action to upload the package. Okay, so if there are no more questions, the final thing is the small thank you card. Also, if you don't have any allergies to nuts,
26:03
there is a small cookie, yeah? Thank you very much. Thank you for presenting, and please give a final round of applause.