
You are sharing your code wrong (and what to do about it)


Formal Metadata

Title
You are sharing your code wrong (and what to do about it)
Title of Series
Number of Parts
131
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Everyone who writes Python code also distributes it. The only reliable way to share Python code is by packaging it; any other way hurts your consumers. Packaging can be an intimidating topic that most would rather avoid, but following just a few packaging best practices can make your code much easier to share, even without going through the process of uploading to pypi.org.
Transcript: English (auto-generated)
Before anyone leaves the room today, I hope to tell you, if you are in fact sharing your code wrong, what you can do about it before we leave. So I have a bit of a bold claim to make to start this off, besides you sharing your code wrong.
That is, anyone who writes code is already sharing their code. And I think probably everyone out there today is writing code. That's probably why you chose to attend a programming conference. So, I assert that all of you are already sharing code. And so some of you might be sharing it wrong today.
Maybe you're not quite convinced, you think you don't yet share code, that's why you came here today. So you could learn how to share it. You tell me you don't post your code online for random people to come and collect. Maybe not. You don't meet up with people you've met at a programming conference to share code snippets.
Say you don't have any colleagues who are technical or interested in your code. None of your friends are interested in your code. But you're probably sharing code with yourself, if no one else.
I think we all share code with ourselves most of all. Have you ever tried to make your code installable, to put it in a Python environment? Have you ever used an environment manager, like these or others, that takes care of setting up, tearing down, and keeping the environment up to date? It's keeping that busy work out of sight, but it's still happening; it's still moving your code into these environments.
The same goes if you use automation of any kind, whether continuous integration, web services, or even something local, or if you deploy, again using a web service or locally.
Maybe you use Docker and maybe that Docker container never leaves your machine. You're still moving your code from where you created it into this new environment. Sharing your code doesn't require that the code leave the machine that you wrote it on.
And it doesn't even require another person. It often will involve two people or two machines, but it doesn't have to. So if maybe I've convinced you that you are already sharing your code, then how are you doing it today?
You must be taking your code that you spent a long time carefully crafting in a safe little environment. And now it needs to work somewhere else. How do you get it there and have it still work? If you have some Python code and you decide it's time to put it into production.
Someone's asked you to share it. What do you do with it? Do you just hand the person the literal code? What are they supposed to do with that? Do they paste it into a running Python interpreter? A running Jupyter notebook? Probably someone saves it as a Python file.
This is very common. You give it a good name, you give it the .py ending. And then you can hand that file to the user. To the next person to use the code. But what are they supposed to do with a Python file? I mean it probably doesn't belong in the downloads folder.
Probably what you want them to do is put it into a Python environment. But were they supposed to create a new one? Did they already have one active? Was it supposed to go in the user site packages? Or the system site packages? You know what, no. Maybe you just tell Python where it is, with an absolute path, and it can find it.
Sure. But it won't necessarily be able to import it. Not everything on your system is importable. So maybe you move to where the downloads folder is. And then you can run Python, and it can find the file as an import or an executable module. And this works for a bit.
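A minimal sketch of why a file sitting in an arbitrary folder is not importable until Python is told where to look. The module name shared_script and the temporary folder standing in for a downloads directory are hypothetical, chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module saved in a folder Python does not search by default.
downloads = Path(tempfile.mkdtemp())
(downloads / "shared_script.py").write_text("GREETING = 'hello from shared_script'\n")


def can_import(name: str) -> bool:
    """Return True if `import name` succeeds with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
    except ModuleNotFoundError:
        return False
    return True


before = can_import("shared_script")  # the folder is not on sys.path yet
sys.path.insert(0, str(downloads))    # tell Python where to look
after = can_import("shared_script")

print(before, after)
```

Changing into the folder before starting Python has a similar effect, because the interpreter normally adds the script's directory (or the current directory in an interactive session) to sys.path.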
But actually your script also had some requirements that need to be fulfilled. So you tell your user, hey, you're also going to need urllib3 and you're also going to need geopy. So that's great. What's the user supposed to do with that information? Presumably what you want them to do is install the libraries into the same environment that's going to be running the Python code.
Do you care if they pip install it or if they conda install it? Were they supposed to apt install geopy? Do you remember how you did it? Maybe if you wrote it down in the file you could remember and you could tell them.
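One way to picture the burden this puts on the user: before the script can run, someone has to find out which of its requirements are actually present in the current environment. The sketch below is only an illustration of that check; the helper missing_modules is hypothetical, and the two names are the libraries mentioned in the talk.

```python
import importlib.util

REQUIRED = ("urllib3", "geopy")  # the two libraries named in the talk


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required libraries are importable.")
```

The check says nothing about how the missing libraries should be installed, which is exactly the question the user is left with.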
And this works too. You leave it to the user to figure out what environment to put it in. And what happens when the program grows and now it's four files? Now you're handing four things out, and that can get convoluted.
Maybe you zip it up for your user. Now you just unzip it. Or not. The neat thing about zip files is Python can actually import code out of a zip file if you put it on path. It's really cool. It means your user doesn't have to unzip anything.
But it does mean that they have to update their Python path every single time they want to use the code. So now maybe you're teaching your user about updating their bashrc or zshrc. Or maybe you just tell them to stuff the zip file into sys.path every single time they want to import this library.
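The zip approach can be sketched as follows. The archive name bundle.zip and the module zipped_module are hypothetical; the point is only that the archive itself goes on sys.path and nothing is unpacked.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive = workdir / "bundle.zip"  # hypothetical archive name

# Write one module straight into the archive; nothing is unpacked to disk.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_module.py", "def answer():\n    return 42\n")

# The archive itself goes on sys.path, as if it were a directory.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

import zipped_module

print(zipped_module.answer())
print(zipped_module.__file__)  # a path that points inside bundle.zip
```

The sys.path change only lasts for the current process, which is why the user ends up repeating it or editing a shell startup file.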
These are all ways of sharing code and I've seen them all be used before. They can get the job done. They get the code where it needs to be and it works. At least for a while. It works until the first time that you forgot to update your Python path with the random location you saved the Python file.
Or you remembered to put that file on the path but not its sibling module. Or you were using a package import, a dot import, and then you handed your user a bunch of Python files. And just handing a bunch of Python files to someone does not make it a package.
Or maybe you tried to use a relative import like a relative file path. Python can't just walk around the file system arbitrarily. So this doesn't work.
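A small reproduction of the dot-import failure described above, assuming two loose files handed over without any package structure. The file names main.py and helper.py are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
(folder / "helper.py").write_text("VALUE = 1\n")
# main.py uses a dot import, which only works inside a package.
(folder / "main.py").write_text("from . import helper\nprint(helper.VALUE)\n")

# Run main.py the way a user would: as a plain script.
result = subprocess.run(
    [sys.executable, str(folder / "main.py")],
    capture_output=True,
    text=True,
)

print(result.returncode)
print(result.stderr.strip().splitlines()[-1])
```

The last line of the traceback reports an ImportError about an attempted relative import with no known parent package: the two files sit next to each other, but Python sees no relationship between them.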
These sharing strategies can get the job done. But we're kind of just doing the bare minimum, and then when an error comes up we fix it. Which is a great way to solve problems, but if this is the way we keep going, we're not really looking at the root cause: why doesn't the code that we wrote, which works great for us in our environment, work when we share it with someone else?
What's wrong with that? Why can't we just have this code work wherever we deliver it? So far we've kind of just been focusing on getting the code out there. Someone asked to see our code.
We need it to work on another server. So what can we do to just get it out there? And it's pretty low effort for us. But it's actually a huge trade-off. We're putting a lot of burden on our user to figure out how to actually get the code that we threw at them to work for them.
So I believe that sharing code is a three-step exercise between two parties. And again, you can actually be both ends of this. You can be the sharer and the sharee. There are three steps to this process. You take the code from where you had it working, which is not where it needs to be running.
So you wrap it up into something you can move around. Ideally just one thing, one file. And then you put it where it needs to actually be running. And then you unwrap it. Which is putting the code back together.
But it's also putting the environment around the code back together. To such a degree that it will run again. And these three steps aren't entirely disconnected from each other. If you do the minimal wrapping, you just put a little duct tape around the code and you pass it off to your user.
That's a lot of work for them. And a lot of struggle and a lot of potential failure to unwrap that code in such a way that it will still work. But we can shift the burden the other way around. If we spend more time wrapping up our code nicely and delivering it carefully, the unwrapping experience can work much more often.
It can take a lot less effort. It can even be a nice experience to install. And so if we take that time, we will not see so many of these failures on the other end of the pipeline.
And the same pipeline can be described, maybe in slightly more technical words, as wrapping up your code, packaging it, delivering it, distributing it, and unwrapping it and installing it. So I will sometimes use both these terms.
So, maybe a slightly less controversial statement than my opening one is that what users of your code want is for that code to run. Seems very obvious on the face of it. But as soon as you've convinced someone else that your script is really cool, they want to try it out.
That your library is super important to them and they need it to do their job. Or you yourself need your executable deployed on a new machine. Once they've decided to use your code, the next thing they'd like to do is to use your code. What they don't want to do is stop, switch gears, go to your readme, try to find your
install instructions, and wade through two or three pages of here's how I got it running last time. The user of your code doesn't really care how you decide to wrap up your code or how you decide to deliver it to them. I mean they do care in that they don't want this process to distract them or stop them or frustrate them.
So decisions you make that cause failure or cause confusion to your users are going to stop them. Maybe because they got too frustrated or just they couldn't possibly move forward.
Now you might initially not care either how the user gets your code, but one of you is going to have to think about it. So if you don't decide on the full journey of your code from where you wrote it to where it needs to run, then you're just pushing that burden on to your end user.
And now you're not just giving them code, you're also giving them a fun little problem to solve along with it. So another kind of obvious on the face of it statement, that I believe the best way of sharing code is the one that works the first time, every time.
I mean who doesn't want their code to work every time they try it? That seems kind of obvious. But this is an idealized state. It's not going to work every time, but it's where we should strive to go to if we're sharing our code. So there's a couple of strategies to get to this ideal.
If we want to have fewer failures, then we should try to reduce the number of steps for sharing code. Make your install story as short and simple as possible. If a step is not there, then it can't possibly fail.
Also, I'd say put as few choices as you can into your sharing instructions. Every choice that you make your user decide on is a chance for them to choose the wrong thing. And even if they choose the right thing, they had to stop and think, and you're just adding distraction for them.
And similar to avoiding distraction, you should try to use familiar tools, best practices, and standards, because your user again doesn't want to stop and learn a new tool, or learn a new programming language that is required to install your tool.
They just want to get on with the problem that they're trying to solve. So if the tools are familiar to them, that's much less distracting. If you're sharing Python code with a Python developer, they probably know what pip is, how it works, and maybe even how to solve problems with it.
But they won't know that with some novel tool. So this is the code that we tried to share originally, and we got the job done with a couple of files. But this time we're going to try to simplify the process for the other side of the pipeline.
So it was already pretty simple, but if you're sharing two files, the only way to make the distribution of that simpler is to share one file. And the user had to then install these three things into some sort of environment,
which, you know, can't get too much simpler than that, but maybe install one thing. So instead of two files, maybe you could just have one file. You just have a Python file that has the Python code, and everything about the environment is just a comment.
And the only thing simpler than installing three things would be to install nothing, I guess. So what if you could just run this script? Or if you could just maybe tell an environment manager, everything you need to know about managing a script is written inside of it, so just look there.
These two things are actually not just wishes for this talk; they are both possible right now. Thanks to new standards in packaging, the file I showed you a couple of slides ago is entirely runnable without being installed, and with no other information on the system, using inline metadata.
So this is very exciting to me, at least. You only need one file, so the wrapping story for this is simple. It's already saved as one file. That's one thing to deliver, and the unwrapping story is basically pointing a tool toward it.
You don't have to move it anywhere special or install it, or set up a virtual environment, as these tools handle that for you behind the scenes. Being one file is a big advantage in that way, but it's also a little bit of a restriction in that these sorts of Python files can never grow beyond one file.
That's not how the inline metadata works. This is also a very new packaging standard; it actually just came out in January of this year. So the two tools I showed you do work, but only in their last version or two.
So you have to make sure you're not using anything too old, and other tools that might know how to run other Python code probably don't recognize this sort of metadata yet. So, in the beginning, the sharing story sounded very easy. All you have to do is run a command, and not even install.
But it gets a little complicated when you have to add instructions saying they must also have this other tool besides Python, and it must be on a version that came out this year. So, still simple, but this doesn't fit all our problems.
Especially as these things are not installable, at least not yet, so you can't make libraries out of inline metadata. So it was a good try, but we are going to have to go and actually restructure our code as a package. That's going to be the way that we can best share our code in whatever shape it takes.
And restructuring a project is mostly changing its layout on disk. Python projects have a special structure when it comes to how we save the files. They all have one file called pyproject.toml, where we write all the details about the code: human details like its description and its name, environment details like dependencies, and anything else.
And every pyproject.toml is associated with either one Python file, or just one directory of Python files.
That one top-level directory could have child directories, but there is just one at the top. And then if we have any directories of Python files, they all must contain an __init__.py.
If you don't know, to Python there is a very big difference between a directory of some Python files and a directory of some Python files, one of which is an init. If it has the init, then the directory itself is a valid target for import, and it makes all its children a valid target for import, even if they're not on Python path.
Otherwise, it doesn't. You can't import the directory, and Python doesn't see any relationship between those files, even though they're next to each other. So it helps us stitch together our project.
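A minimal sketch of the stitching described above, using a hypothetical package named demo_pkg. Only the parent folder is placed on sys.path; the modules inside the package find each other through the package itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks the directory as a package
(pkg / "helper.py").write_text("VALUE = 1\n")
(pkg / "main.py").write_text(
    "from .helper import VALUE\n\n\ndef run():\n    return VALUE + 1\n"
)

# Only the parent folder goes on sys.path, not the package directory itself.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_pkg import main

print(main.run())
print(main.__package__)
```

The dot import inside main.py works here because main is loaded as demo_pkg.main, so Python knows which package the dot refers to.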
So back to our last attempt to share this code. It worked pretty well, but it's not going to grow with us. So to move this into a project, we're going to take the same file and we're going to chop it in half again. The Python code stays in the Python file. And the metadata or description moves into the TOML file.
The dependencies line actually doesn't change at all, except we drop the leading hash. But it goes into the project table, or the project header, in TOML, along with two other pieces of data, the name and version for your package. And this will be used to target your code for an import, or to mark it as
a dependency of yet another package, or to see things that are installed in your current system. And the one other piece of data that every Python project has to have in their TOML file is a build system.
This is kind of an unfortunate part of Python packaging, is that every package must write down exactly what its build system is. And every build system in Python world is a third-party library. There is not one that comes with the Python language, so you have to
go out and make this choice and it can be confusing and overwhelming and frustrating. But you have to just pick one. I'm sorry. But the secret for build systems is that if you're working with one Python file or a few Python files that are not doing anything outside of Python land, your build backend doesn't really matter.
If you've used any in the past, you can just keep using it. Or if you've been recommended one or want to try a new one, they're all basically going to work with pure Python files. So we used Hatch because it supported our inline metadata. It comes with a backend called Hatchling, so I wrote it down here.
So this is our project on disk. We split it up. It's now two files instead of one. But as I wrote it out, it is a full and complete project, and it was built.
It doesn't do anything more than it did as a script, and we've doubled the number of files, but we sort of laid the groundwork so that we can grow. Now when we amass more code and we are good stewards of it and we factor out more modules, we can add them here, and it's still one project.
Except that the pyproject.toml can't be associated with multiple Python files. So we stick it all in a directory, and then remember that every directory has to have an __init__.py. Now, our project has grown and changed, and we've been able to keep it as a single project.
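The resulting layout on disk might look like the tree this sketch creates. The project, package, and module names are all hypothetical; the shape is what matters: one pyproject.toml next to one top-level directory that contains an __init__.py.

```python
import tempfile
from pathlib import Path

project = Path(tempfile.mkdtemp()) / "demo-project"  # hypothetical project name
package = project / "demo_project"                   # the one top-level directory

package.mkdir(parents=True)
(project / "pyproject.toml").write_text("# project metadata and build system go here\n")
for name in ("__init__.py", "core.py", "cli.py"):    # hypothetical module names
    (package / name).write_text("")

# List everything in the project, relative to its root.
layout = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
print("\n".join(layout))
```

Adding another module later means adding one more file inside demo_project; the pyproject.toml does not need to list it.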
And we actually didn't have to edit pyproject.toml at all, assuming that those dependencies haven't changed. So it lets us grow our package however we need to. But it's also grown. I thought we were trying to make the sharing experience easier, but now we
have more files than ever, and we're starting to add layers to the file system. Well, this hasn't really necessarily been part of the sharing story. This is groundwork that has to be laid before we even get back to the wrapping step. So now we have a proper project, we can again wrap it up, or build it, or package it into a wheel.
So we can run hatch build, but as soon as you lay out your code as a project, any build tool that knows how to talk Python would know how to take your project as it lies on disk and turn it into a single file called a wheel.
And a wheel is basically the de facto standard for wrapping up Python code. And so it's not tied to hatch or any other one tool, it's just Python standard.
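Part of what makes a wheel tool-independent is that even its file name follows a fixed convention: distribution, version, an optional build tag, then Python, ABI, and platform tags. The sketch below splits a file name along those lines; the example file name and the helper parse_wheel_filename are hypothetical, and real installers do considerably more validation than this.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


info = parse_wheel_filename("demo_project-0.1.0-py3-none-any.whl")
print(info)
```

For a pure Python project the tags are typically py3, none, and any, meaning the same single file can be installed on any platform with a Python 3 interpreter.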
And once you have code that can be built into a wheel, it gives you a lot of power. For one, it gives you one thing to deliver, which again is about the easiest thing that you can do for the delivery part of your sharing story. Also, since a wheel can wrap up anything that's Python code, it can wrap up anything you can do with Python.
It can be a script, a library, a plugin, a GUI; a wheel can wrap it all. And it even lets you publish to a package repository, which is a way of actually going just one and a half steps of the sharing journey: you push your code out towards your users, but then they come to the package repository and they get to deliver your code the final mile.
And a wheel also gives your users a lot of flexibility. Being a standard that is very well adopted at this point, it gives them freedom in how they want to install your package.
As soon as they have the wheel, you might have told them, hey, you need to pip install my package. That's great, they could pip install it, but maybe they decided, you know, I've been installing everything with uv.
They can install your package with uv. They don't have to tell you, and they don't have to come to you and ask you to repackage your thing in a different way. These tools all just work with wheels. As long as they hold on to your wheel, they can again share your package, because they might want it in more than one environment, or they might want to share it with someone else downstream.
They don't have to come and ask you to reshare it. And this is really important because, I mean, we got to this talk and I sort of put it out there that all of you are already sharing code to some degree.
So why would you go through all this effort, all this work to get to the same point you're already at? You were already sharing code and it was probably working for you. And now I'm asking you to do packaging, which is a huge effort. It's frustrating and confusing, it's a constant burden on you as a developer.
Why would you do it? Well, our journey for sharing is wrapping, distributing, unwrapping. But that first step you get to do once, and then you never have to do it again, as long as the code doesn't change.
Every time you want to install it in some place new, you have to re-unwrap it, reinstall it. But you never have to re-wrap it. It's kind of the really nice thing about software. So if you put a lot of effort on the one side and you can reduce the frustration or the errors on the install side,
it's not just a one-time gain, it can be 100 or 1000 or 10,000 times. And so if you put in all this effort, you're going to save your users a lot of frustration, a lot of problems. And remember that you're probably one of the biggest consumers of your own code.
So you're going to be saving yourself frustration, and when you come back to your code after six months, if you didn't document how to install it with some custom multi-step process, how much time are you going to spend to figure that out?
So packaging can be a burden, but it's one that really pays dividends if you put in the work. And that's all I've got. Thank you. All right, thanks so much to Jeremiah for this great talk.
We've got about two minutes now for Q&A if people have any questions. You just go to the microphone in the middle of the room. I know it takes time to build the packages for those questions. Okay, so all right, so let's give another round of applause and thanks so much to Jeremiah.