Zen of Python Dependency Management
Formal Metadata
Title: Zen of Python Dependency Management
Number of Parts: 118
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI: 10.5446/44843
EuroPython 2019, Talk 28 of 118
Transcript: English(auto-generated)
00:06
Thank you very much for the introduction. Like the gracious man said, I am Justin Mayer. I won't be taking questions after the talk, but I love to connect with people. I would love to talk to you about some of these topics.
00:21
So by all means, please come up to me afterward. I'd be really excited to talk to you. I'll also post the slides on my site after the talk. You can see some links to Twitter and Mastodon for those new kids on Mastodon. I'll post the links to the slides there when I'm done. So I'm originally from Los Angeles, California.
00:41
Last year, I moved to a small village in the Italian Alps. I work on software related to privacy and security, and I also write about it at justinmayer.com. In my spare time, I maintain a few open source projects including Pelican, which is a static site generator.
01:00
And today I'm excited to talk to you about the zen of Python dependency management with a little sprinkle of package release automation tacked on. So why is dependency management important? It's important because we share code with one another. We incorporate open source software that other people wrote into our
01:23
applications and libraries in order to speed up development time and to improve the quality of the things that we ship. It doesn't make sense, you know, for a lot of us to write our own crypto libraries. So we go out and we find someone who knows tons more about that stuff than we do and we grab the bits and pieces that are useful and we integrate them.
01:43
Someone determined that the average package on PyPI relies on two or three dependent packages, which is not a big deal in and of itself, but that can have cascading effects where those depend on two or three other packages in turn. And as you take this to its logical conclusion,
02:02
you can see how you kind of have this rather significant dependency tree. And this is important also because of the notion of reproducible builds and reproducible builds are important because say you're a company and you have new employees or you're an open source project and you have new users.
02:20
You want them to be able to bootstrap your project as easily as possible and they can't really do that if they're getting dependency conflicts. And so this is a way of, among other things, making it so that they can easily get your software up and running. And you also have other environments. It's not just development. There's
02:41
testing, staging, production. You want to make sure that the thing you're working on in one of those environments is going to behave in as similar a way as possible in the other environments. So reproducible builds are important for that reason. Now a related topic is packaging, and packaging is also important because I
03:02
want to use code that someone else wrote for the reasons that I just described. And I want that to be done as easily as possible, and packaging is what facilitates that. And at the same time I also want to share code with other people so that they might benefit from the stuff that I wrote, and to make it as easy as possible for them to use it.
03:24
So consequently there are a lot of talks about packaging, and there have been a number of them here at this conference alone, which I think is really interesting, and they all kind of overlap in interesting ways. So the packaging ecosystem in 2018 looks a little bit like this.
03:43
And I say 2018 because we'll talk about some of the newer ones. But so there's like some tools over there on the left, and there's some files over there on the right. And this is kind of what things look like today. And anyone who knows anything about these tools and files can tell that over the years
04:02
packaging has kind of morphed and accumulated like new, you know, appendages and as it's kind of, you know, moved along it's shed other things. But, you know, at some point it kind of starts to look like this conglomeration of, you know, accreted bits over time. And I think that there's a tendency to look at it in a somewhat negative
04:28
way. And I think the cool thing is that some people are saying like, no, actually maybe we can look at this a little differently and look at it as we can celebrate all of these little bits. Does everyone know what a platypus is?
04:42
I don't know the German or French words for platypus, so if you don't know, ask someone who does. A platypus is a creature that has a duck bill, lays eggs, and is a mammal. It's a really strange animal. So someone came up with this idea. I wish I could give credit, but I can't recall where I saw it. They said we're going to celebrate the strange and continuing evolution of Python's packaging systems.
05:07
So I think a platypus is a great metaphor for Python packaging. Some of the things they mentioned, as far as the metaphor goes: it's a bit odd to start with, but then you realize it's the result of evolution in very unique circumstances.
05:25
And it's actually quite cute and friendly most of the time. And it can incapacitate a human with its venom, just like packaging. So it's really difficult to fully grasp the finer points of how these pieces fit together
05:40
or how they have evolved to their present state. So I'm not gonna try, and I'm instead gonna focus on the tooling that exists today and some of its practical applications. For some of the finer points, Dustin Ingram, who works on PyPI, gave a great talk last year on PyPI and packaging and its history. Hynek has given great talks; he gave one yesterday on how to manage a project when it's not your job.
06:05
Mark Smith gave a great talk on packaging in general and how to get something on PyPI. I want to talk a little bit about some of the new stuff. So there's PEP 517 and 518. 518 defines a new configuration file called the pyproject.toml file.
06:22
And this has the potential eventually to replace setup.py, requirements files, setup.cfg, MANIFEST.in, and probably other configuration files I'm leaving out. This PEP is long like most PEPs, but it actually specifies very little. It specifies a file name,
06:43
the file format, which is TOML, a build-system table, and a tool table. It's a little bit like the Wild West with this right now, where every build system is allowed to put stuff in it and can kind of do the build however it wants. There's no real standard in terms of how they do that.
07:04
Some folks feel like the side effect of this could be a kind of vendor lock-in where you use one particular tool and because there is no standard, then you're kind of locked into it, and it'll be tough to migrate. I suppose that depends on how hard it is to convert, how one tool defines their dependencies, say, in this file, and
07:25
then to migrate that. We'll find out; still early stages. The build system can be defined in this way. This is how you would define it in a setuptools context. For poetry, you define it like this, and this basically just tells
07:43
the system that we're using poetry to manage and build this project. You can add further configuration at a tool-namespaced level. This is how poetry keeps track of your dependencies.
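As a concrete sketch of both pieces (the project name, author, and dependency versions here are placeholders, not from the talk):

```toml
# PEP 518: the build-system table tells pip how to build this project.
# (A setuptools project would instead use requires = ["setuptools", "wheel"]
# and build-backend = "setuptools.build_meta".)
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

# Tool-namespaced configuration: this is where poetry keeps your dependencies.
[tool.poetry]
name = "example-project"   # placeholder
version = "0.1.0"
description = "An example project"
authors = ["A. Maintainer <maintainer@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
requests = "^2.22"
```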
08:00
You can only use a name in the tool namespace if you are the owner of that package name on PyPI. Let's talk a little bit about the pyproject.toml file, which will be relevant when we talk about some of these newer dependency management tools. When I talk about them,
08:23
I'm going to mention when they were last released, because that relates back to the first slide, you know, which is why packaging and dependency management are important, and that's to distribute and share software. And software that's sitting in a version control system and not inside a shipped release is software that, for the most part,
08:43
for average users, may as well not exist. So to me, steady releases are an indicator of project health and thus are important. First, I'm going to talk about pip-tools. So pip introduced requirements files, which allow you to pin your dependencies.
09:01
You can have hashes of those dependencies, so you know that the thing you are putting in your requirements file, when it installs, will actually be that package and not something else. And this improved reproducible builds and security. The problem is that pinned requirements get outdated and need updating from time to time, and that's a bit of a hassle, and
09:22
pip-tools has some ways to make that easier. The pip-compile command lets you compile a requirements.txt file from your dependencies, and those can be specified either in setup.py or in a requirements.in file. Then you can use pip-sync to take the
09:41
compiled requirements file and then update your virtual environment with the dependencies you've declared inside those requirements. And that makes sure that your different environments, wherever they are, are fully up-to-date and reflect the requirements that you've specified.
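In practice that workflow looks something like this (the requirements.in contents are a placeholder example, and the pip-tools package must be installed):

```shell
# Declare your top-level dependencies, unpinned, in requirements.in
printf 'requests\n' > requirements.in

# Compile a fully pinned requirements.txt, including hashes of each package
pip-compile --generate-hashes requirements.in

# Bring the active virtual environment exactly in line with the pins
pip-sync requirements.txt
```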
10:00
So that's a very focused tool, and it does something very discrete. In contrast, pipenv does a lot of things. It manages virtual environments, so you don't have to. It audits packages for security vulnerabilities. It does dependency resolution, to handle the case where one package depends on a different version of something than another package needs.
10:28
So like pip-tools, it keeps your dependencies updated and your virtual environment current. It does this in the context of a Pipfile instead of a requirements file or the newer pyproject.toml file.
10:42
When I last used it (and I'm going to express opinions fairly freely), the dependency resolution was relatively slow. I don't know if it has improved since then, but it was a bit slow. And, you know, lots of software is opinionated, but I feel like pipenv is quite opinionated on the spectrum of opinionated software.
11:09
It replaces the requirements.txt file, but not setup.py or setup.cfg or MANIFEST.in or any of the other parts of the setuptools ecosystem. So you still kind of have to manage all of that
11:22
stuff, and you still have to put your high-level dependencies in setup.py and your pinned dependencies in your Pipfile. Just in terms of how the project is managed, it vendors a lot of packages. It uses its own patched versions of pip, pip-tools, and maybe other things.
11:41
That's not really my style, to just wholesale sweep a bunch of software into my repo, but I'm sure they're doing it because they have so many different things they're trying to accomplish and it's the simplest way for them to manage it, so I can understand the benefits there. As for its virtual environment management: if you just want some other tool to manage your virtual environments,
12:02
you don't want to know where they are, you don't want to know what they're named, you just want something else to do it, it's great for that. For me, I prefer to manage them myself, and so I felt like I was kind of struggling with it. One of the things it does, I believe, is hash the path to your project, take that hash, and then append it to your virtual environment name.
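Roughly the idea is this (a simplified sketch for illustration, not pipenv's exact algorithm; the path and slug length are assumptions):

```python
import base64
import hashlib
from pathlib import Path

def venv_name(project_path: str) -> str:
    """Derive a virtualenv name from a project path, pipenv-style (simplified)."""
    path = str(Path(project_path))
    # Hash the full project path so two projects with the same directory
    # name still get distinct virtual environments
    digest = hashlib.sha256(path.encode()).digest()
    # Keep a short, filesystem-safe slug of the hash
    slug = base64.urlsafe_b64encode(digest[:6]).decode().rstrip("=")
    return f"{Path(path).name}-{slug}"

print(venv_name("/home/user/projects/myapp"))
```

The same path always yields the same name, which is what makes the environments discoverable, but the hash suffix is what ends up cluttering a shell prompt.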
12:23
When you sort of depend on a predictable virtual environment name for other tooling, this can be kind of problematic. Just a silly example is I use the fish shell, and I have tooling that shows me the virtual environment that's activated, and I don't want to see this like big ugly hash like, you know, in my prompt line every time,
12:41
and so I had a really hard time getting around that, and that's obviously minor, but we have opinions. So, you know, this notion of unpredictable virtual environment names, like apparently I wasn't the only one who had a problem with this, there's lots of open issues about it, maintainers kind of were like, eh, we're not going to address this anytime soon, and they just keep closing them.
13:02
But, you know, again, another thing to note is that by default, when you run pipenv install to add a new package, it updates your locked packages. Now, you can disable that, but for me, I found it to be a strange default.
13:20
I just want to add this package to my project, and then all of a sudden it's starting to update all of my locked packages. And so for me, it kind of violates the principle of least surprise, but again, you can disable it, and I'm sure they had good reasons for making that the default. The uninstall command will remove packages, but not their dependencies.
13:42
So you have to use pipenv clean to remove the dependencies. So it seems to me like it's an extra step. Maybe there's good reasons to separate those steps, but it just wasn't what I expected. It can also manage .env files if you use environment variables and you want them loaded into your environment.
14:01
That's cool if you need that. For me, I managed that at the shell level, so it was kind of an extra feature that I didn't need. The initial pace of releases for pipenv was really insane. It just seemed like everything was shifting out from under your feet in the beginning. And then it's kind of slowed, like projects as they mature tend to do,
14:20
but the last release was like eight months ago and at some point you start to wonder, you know, again, they probably have good reasons. It's an important project, you know, for people and they don't want to break things, but it's a long time for there to not have improvements. It's opinionated, which is fine, but it's opinionated for me in ways that don't fit my mental model or workflow.
14:42
I feel like they took on a lot. They kind of over-promise and, for me, under-deliver a little bit, and it seems like they bit off a bit more than they could chew and are kind of under the weight of all of that. Again, that's my assessment. Try it for yourself and reach your own conclusions.
15:02
So, poetry is a similar tool. It keeps your dependencies and virtual environments up to date, it's fast, and for me it had more reliable dependency resolution. It manages virtual environments, but only if you want it to. For me, it didn't get in the way. It uses the new pyproject.toml format.
15:24
Unlike the other tools I mentioned, it does not rely on setuptools. It can also, unlike the other tools, build and publish packages to PyPI. So, if you are managing a particular project with it and you're also publishing it to PyPI,
15:42
you don't need an extra tool at this point to do that. I feel like the project is managed very well. Some PRs get rejected because the author is trying to keep the core manageable and I really respect that. One thing that some folks might run into, you cannot install into a specific
16:02
virtual environment. You can't point it at an arbitrary virtual environment; you can only install into an activated one or into the default home, wherever you want that home to be. Another thing to note is that users will need pip 19 or higher to install packages that are built without setuptools.
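The end-to-end flow he describes might look roughly like this (a sketch; it assumes poetry is installed and PyPI credentials are configured, and the dependency name is a placeholder):

```shell
# Add a dependency: updates pyproject.toml and the poetry.lock file
poetry add requests

# Install the locked dependencies into the managed virtual environment
poetry install

# Build sdist and wheel, then publish to PyPI -- no separate tool needed
poetry build
poetry publish
```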
16:23
I also noticed that poetry generates a setup.py file. But there's not much documentation as to why. I assume it's for backwards compatibility, but I'm curious to know a little bit more about what that's about. Only pure Python wheels are supported, so if you're trying to build anything with C code,
16:42
this is not the tool for that just yet. When you build, it increments your version string in the pyproject.toml file, but if you have version strings in other files, you'll have to manually track those down and replace them. There are only main and development environments, so you can't say that you want
17:03
a specific dependency section for testing or for releasing new things. There are some really good features marked for the 1.0 release, including a plugin system, which would help give a home to those rejected pull requests and still keep the core as lean as possible. I feel like that's a sensible way of managing things.
17:25
There's per project configuration coming, which could be used, something like this, to specify a virtual environment on a per project basis, giving you a bit more control over where your virtual environment is.
17:41
And also, incrementing version strings is on the roadmap as a nice helper when building things. So, in summary, I feel like it's managed well, it's updated regularly, and it fits my mental model and workflow. This next one is a newer project with a sort of interesting choice of name. It's like, you know, dependency hell, whee!
18:03
It's like a celebration of the thing it's curing. So: requirements.txt, Pipfile, poetry; they really seem to be saying, we're going to support every piece of this ecosystem. They audit packages for security vulnerabilities.
18:21
It has a pipx-like ability to isolate command-line tools in their own virtual environments. It's trying to do a lot. It claims to be "better than all other tools," end quote, which always gives me pause, but it's a new project and I haven't given it a full run-through, so
18:40
something to have on your radar, perhaps. So, at this point, I want to talk about a related topic, which is release management. Once you've managed your dependencies, there is another step toward zen-like enlightenment, because getting your project onto PyPI
19:01
involves simply too many steps, and none of them are fun, and all of them are tedious. You generally go through your git commit history and you start putting bullet points into your changelog. You start manually updating version strings, sometimes in more than one place. You then commit and push those changes,
19:25
testing, building your package, publishing your package. There are just a lot of steps, and another thing that people run into is that usually only a subset of folks with commit bits have the ability to publish new packages on PyPI. Sometimes it's only one or two people.
19:46
So this is part of an open source project's README that documents how maintainers can release new versions of that particular project. This screenshot shows five steps. There are 14 in total.
20:01
The funny part is that this project's purpose is to automate GitHub releases, so I thought that was a really interesting irony. But this is totally normal; the list for some of the projects that I maintain is even longer. So, did anyone see the Chernobyl TV series? This is how I feel. You have your clipboard and your list of steps and you're just hoping that you don't
20:26
miss a step. You don't push the wrong button. You don't end up breaking the package for potentially a lot of users. And so it's a little bit stressful and a side result of that, whether it's
20:42
conscious or not, is that we don't do it as often as maybe we want to or should. As a personal anecdote, I once noticed that a year and a half had gone by since the last time I published a release on PyPI, and I was horrified to realize how much time had gone by.
21:01
You know, because at this point I feel like I'm failing at my job as a maintainer. It's a volunteer unpaid job, but it's still something where I feel responsibility to people and I don't want to let them down. So it's not good for maintainers because it's stressful. Well, it's also not good for users because you have this slow release cadence.
21:23
And bug fixes and new features are sitting in the master branch, and hardly anyone is benefiting from them because they aren't in a shipped release yet. Say the PyPI account owner is on vacation and some critical bug fix gets merged by another maintainer. Well, it doesn't matter; you can't get it into a shipped release. So you now have this critical bug that people are running into.
21:47
So there are bespoke, custom ways of automating this, where you take continuous integration one step further.
22:02
So after it runs your tests, it can then say, okay, well, let's figure out a way to publish this and you can automate it and have that so that it's not as manual and error prone a process. So I want to explore one way of how that might work. One way of doing that is by auto publishing releases upon a PR merge.
22:22
So in this context, the pull request has to include a release file with two bits inside. One is the release type, major, minor or patch. The other is a changelog entry, description of the changes in that pull request. So the maintainer looks at the pull request and says, okay,
22:40
tests are included, docs are included, code looks good, the release file is there, and merges it. At that point, the continuous integration system can look for the release file, grab the release-type designation, and increment the version accordingly. It can then take the description, prepend it to the changelog, and then run the equivalent
23:05
of git add, git commit, git tag, git push, all of that and publish the release to PyPI. So there's some real benefits to doing this. With almost no human input, every code contribution results in a new release in a matter of minutes.
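The core of the steps above can be sketched in a few lines (the RELEASE.md name and "Release type:" format are assumptions for illustration, not the exact format from the talk):

```python
import re
from typing import Tuple

def bump(version: str, release_type: str) -> str:
    """Increment a SemVer string according to the release type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if release_type == "major":
        return f"{major + 1}.0.0"
    if release_type == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

def parse_release_file(text: str) -> Tuple[str, str]:
    """Extract (release type, changelog entry) from a release file's contents."""
    match = re.match(r"Release type:\s*(major|minor|patch)\s*\n+(.*)", text, re.DOTALL)
    if match is None:
        raise ValueError("malformed release file")
    return match.group(1), match.group(2).strip()

# After this, the CI job would write the new version back, prepend the entry
# to the changelog, run git add/commit/tag/push, and publish to PyPI.
release_type, entry = parse_release_file("Release type: minor\n\nAdd a --verbose flag.")
print(bump("1.4.2", release_type))  # 1.5.0
```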
23:22
Every feature and bug fix gets its own release without anyone having to remember to package and publish a new version. If a bug is found, it's now much easier to trace it to a specific release version. And of course, you don't have to use this, if you had the system in place, you could also issue releases manually at any point.
23:45
But my favorite part about this notion is that all contributors get to issue their own releases. What better way is there to welcome new contributors than to reward them with a dedicated release that's composed entirely of their work?
24:02
I'm not saying it's right for all projects, for some it may not be a good fit. If you maintain a library that is depended on by critical network infrastructure or services, maybe this isn't a good fit for you. Maybe you can figure out ways of making it work.
24:21
Some maintainers may think, well, I don't want to see this release history clutter, where every tiny little fix here and there results in this long list of releases. And it's true, even something as minor as a typo fix gets its own release in this model. But I would encourage people who have this reaction to it to really think about it.
24:45
Would that be so bad? Is that a serious problem? Is a tidier history really worth sacrificing all the other benefits? So around this time, I was trying to solve this conundrum. I came across an article that describes this type of solution,
25:04
and Hypothesis is a property-based testing library. They did a really nice write-up of how they arrived at this and how they solved it. And then sometime afterward, I noticed that Patrick Arminio,
25:20
who has a Python GraphQL library, was looking to do the same thing, and he asked his friend Marco if he could figure out a way of adding that same thing for Strawberry. And so he did that. He connected it up with CircleCI, and I really liked the simple, elegant approach that he took.
25:41
And rather than taking the bits and customizing them and then copying and pasting those across multiple repositories that I manage, I thought it would be great if I could just use one tool and just pip install it into these different projects. And so in that way, other maintainers could benefit from this as well.
26:01
So I called Marco, and I said, hey, I generalized this a bit; is this something you want to work on? He said, sure, if I have time, I would be totally up for that. And so I took his code, I added some more, I put it into its own GitHub repository and PyPI package, and I just pushed it last night, as a matter of fact. And so to see how it could potentially work,
26:23
this is a bit of configuration for CircleCI, and it's obviously not the whole thing; this is just the deploy step. I know the type is on a black background and super small, so you may not be able to see it in the back, but essentially it runs through some of the normal steps
26:43
that you would take in CI, you know, modifying permissions, installing packages that you need, but it also then uses this new tool to do things that you would normally have to write your own scripts for: checking for the release file,
27:01
preparing the code for this new release, creating the commit, getting the commit into GitHub, getting the GitHub release created. So it's still very, very early stage. Feel free to check it out. All of the good bits in it are Marco's; all of the terrible broken bits are mine.
27:23
There's lots of room for improvement, and I would be very interested in any input or contributions to make it more flexible. It has very limited utility in terms of scope at the moment: it's meant for people using CircleCI and for people using poetry.
27:41
That could easily be, or at least somewhat easily, broadened to do a little bit more. So I'm interested to see what people can do with some of these new improvements to the overall dependency management and packaging ecosystem, because the overall goal is to make it easier
28:02
to use helpful software that other people have written and to share the stuff that we do and to do that as frequently as possible. So with that goal in mind, I hope that you found this overview of dependency and release management to be enlightening.
28:20
If you have any questions about this at all or just want to chat, please come up and say hello. I would love to talk to you. Thanks very much for coming.