There should be one obvious way to bring Python into production
Formal Metadata

Title: There should be one obvious way to bring Python into production
Title of Series: EuroPython 2017
Number of Parts: 160
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/33798 (DOI)
EuroPython 2017, 27 / 160
Transcript: English (auto-generated)
00:04
Welcome, I'm really glad so many people are here. Maybe some of you wanted to see the Facebook talk, then you're in the wrong room, so maybe you have to switch. If you're really here for this talk, I'm really glad, thank you. So, yeah, it's a long title, and so here's a brief agenda of what we want to do today.
00:25
So first of all, we have to settle some definitions: I want to give you a brief overview of things like what a delivery pipeline is, what I understand by that, about dependencies,
00:40
maybe some packaging, and then I want to do a small walkthrough of the different possibilities for how to bring Python into production, and I will also give some pros and cons. Yeah, and in the end, all of you, of course, want to know the answer to the one question,
01:00
so what is the obvious way? And, yeah, I also want to give some insights into what I think the future might look like. This is my very personal point of view, but yeah, it's difficult to make predictions about the future, we all know this,
01:20
and then I want to do a small discussion, so I want to know what you think the one and obvious way should be. Okay, so what are we talking about and why? So here I have a small picture just sketching out what we call a delivery pipeline,
01:43
so who here has heard of this term before? Yeah, some of you, okay. So, yeah, it means: how do we, in the end, bring the things we develop into production? Which means we deliver something to our customers, whoever or whatever that may be, it doesn't matter.
02:03
And here we have some stages, and I will go through them: there's development, some building, packaging, testing, staging, QA, and then in the end, it hits production. Of course, not each of these single steps is always present in every project.
02:20
There are small projects that maybe don't have any QA, but that's more or less the general view of how such a pipeline looks. Okay, so let's start. We start at the beginning, with how a normal delivery looks: we start in development. So we have some requirements there.
02:41
We all know, most of you are developers, I think, so we know what we want to have in our development stage: we want fast iteration cycles, we want to change something and then immediately get some feedback on whether it's good or bad. And I would say it's a hard requirement that we need to be able to run our automated tests
03:03
in our development environment. Maybe not all of them, but at least, you know, some unit tests, things like this. Maybe a nice-to-have is that we have some kind of emulation of the production system, so maybe we are able to start our Flask server,
03:23
or things like this, or maybe we have a small Postgres running like in production, but this is already a nice-to-have. For some projects, it will just not be possible to really have a production-like system on your laptop. And we also have some risks here,
03:40
so we all know it: oh, what happened? It works on my machine. This is the typical problem we all have: on the developer's box, everything looked fine, but in the end, production breaks. And this has also happened to all of you, I think: everything went really well, but we forgot to commit the very important file,
04:02
so in production, it is missing. So this is also some kind of dirty-working-directory problem. Okay, so the next stage would be building and packaging. So of course, you would say: something like Python,
04:21
we don't have a compiler. But that's not true: there are extensions, C extensions, Cython extensions, things like this, so in general, there will always be some kind of step where we do a compilation of things, where we build our artifact. What would be required here?
04:40
So I would say a very good best practice is that we want to build this artifact just once, and then use it everywhere. Yeah, and of course, there should be a possibility to compile for our target system, which mostly means the production system, but maybe also for other target systems,
05:01
so either our build environment is really similar, or at best really identical, or at least we have to find some way to do cross-compiling or whatever. And another thing: if we have an artifact
05:20
and it has a version, it should be unique. So if someone in the company says, I have version 1.0.2, it should be a really unique thing. There should never, ever be a tar file or whatever somewhere with the same version, where in the one, this one config item was missing, and in the other one, we compiled it in;
05:41
this should never happen; this should just be unique, and I would say that's a hard requirement. If this is not the case, then you should at least think about how you can get there in the near future. Then a nice-to-have here is something like an artifact repository for Python.
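That uniqueness rule can be sketched as a toy repository that maps each version to exactly one content hash and rejects a re-publish with different content (names here are made up; real artifact servers such as devpi or PyPI enforce the same idea by refusing to overwrite an already-published file):

```python
import hashlib

class ArtifactRepository:
    """Toy artifact store: one version may only ever map to one artifact."""

    def __init__(self):
        self._digests = {}  # version -> sha256 digest of the artifact

    def publish(self, version, artifact_bytes):
        digest = hashlib.sha256(artifact_bytes).hexdigest()
        known = self._digests.get(version)
        if known is not None and known != digest:
            # Same version number, different content: the situation
            # that should never happen.
            raise ValueError(f"{version} already published with different content")
        self._digests[version] = digest

repo = ArtifactRepository()
repo.publish("1.0.2", b"wheel with the config item")
repo.publish("1.0.2", b"wheel with the config item")      # identical re-publish: fine
try:
    repo.publish("1.0.2", b"wheel without the config item")
except ValueError as err:
    print("rejected:", err)
```

With this in place, "version 1.0.2" means exactly one artifact, company-wide.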
06:01
Maybe there is devpi, or there's Artifactory or Nexus, whatever it is. It means we have a server where we can put our artifacts, and everyone in the company can just pull these artifacts which we built in this build step. Of course, the risks here: if we have this artifact built,
06:25
it might, of course, be that there are bugs, maybe there's some misconfiguration, maybe we have some dependencies missing, whatever. So this is the risk that we have here. So, the next step: testing. We could argue that maybe testing comes before
06:43
building the artifact, or maybe some other way, but I would say, in principle, because we want to use the artifact and test this artifact, it should come after the build step. So, what is required here? It should be automated, and we heard this in the talk by Martelli
07:02
with the testing layers: tests that are not running automatically, we don't call them tests; we call them another development step, maybe, or whatever. If it's just someone saying, ah, I clicked through it and it looked good, that is not a test, that is something else. And the conditions in these tests
07:22
should be as near to production as possible, and I mean, that's clear: in the end, we want to test whether production will work or break, so it should be just very similar. We want reproducible conditions there, so not that it fails at 12 o'clock
07:40
but works again at one o'clock, just because some condition changed. And that's more or less the quantum-mechanical thing, maybe you know Schrödinger's cat: if we measure something, if we test something, we always interact with the system. But these interactions, these changes we make to the thing
08:05
we want to test should be as minimal as possible. So for example, a thing here is that you install pytest, and pytest brings in a requirement which is in the end missing in production, because you didn't write it into your requirements file,
08:21
because in the test it was present, because pytest installed it, but in production we don't have pytest installed, so it breaks. This would be a change which happened just because we tested. Then another nice-to-have: it should be as fast as possible, so that if we commit a change,
08:43
we get feedback as fast as possible: it breaks or it doesn't break. But of course, there are tests that run for several hours, just as a matter of fact; then it should still be as fast as possible. And also nice to have: if we are able to really run all the tests we have
09:04
for all commits, not just on master, but maybe, if we have feature branches or even short-lived branches, we test on every branch or pull request or whatever. The risks here, of course, I mean, it's hard to read.
09:21
The tests test the test environment, but not production. That's obviously the thing. We know a pretty famous example of this: the VW Dieselgate affair was exactly this. So the software was detecting that it was running under the right test conditions,
09:43
but in production, it's a completely different thing. So this is exactly what can happen here. Maybe we don't do it on purpose, like some people at VW, but it might happen. Okay, so the last step here, before we hit production,
10:01
is some staging or quality assurance. Requirements here: we have an automated deploy, and this is now the difference: we really test how we install it on our target systems, and it should be a really production-like system. And we don't want to have any changes for this staging.
10:25
So for example, with the pytest example: in this stage, we do not want to have pytest installed on this QA system. So it should be a really production-like deploy. A nice step here, if we can afford it:
10:41
we just have a real one-to-one clone of the production system. If we are able to afford it, good. If not, maybe it's good enough to have some smaller servers, smaller databases, but in the end, it should be really as production-like as possible. If we have a real production-like system, then we also have the possibility,
11:01
for example, to do an A/B test. So for example, 10% of our customers already hit this new QA system, because that's the best test we can have. We can then be pretty, pretty sure it works for everyone if it works for 10% of our customers, and the risk is very small.
11:20
It just hits 10% of our customers and not 100%. The risks here: maybe that's not so critical anymore, we have the cloud, we have things like this, but 10 years ago, these systems were always poorly maintained. There were things like: oh, we forgot to apply the last database migrations,
11:41
and it's running a very old MariaDB, and it's not really working, we haven't installed all the dependencies there. This, of course, should not happen. It should really be as well maintained as the real production system, because in the end, it's part of this value stream, what we do: development and so on.
12:02
So there is really no, yeah, we cannot afford to say: okay, it's just the staging system, if it's red, it's not that bad. That should not happen. Okay, then we hit production. So here are some hard requirements. I know that some of them might not be present
12:21
for every one of you who is running a production system, but I would say, same thing here: maybe you should fix it as soon as possible. No compiler in production. This is just a matter of fact: we cannot afford it, it's a security nightmare. There is no compiler.
12:41
No internet at all, so there is no point in reaching PyPI or whatever. This might be the hardest one, but I mean, we can just say this is a requirement, and if we can afford to have it, then it's really nice. We exclude a whole bunch of security issues
13:02
and some kind of health monitoring. So, nice to have here: the deploy is automated. Nowadays it's happening everywhere, everyone is using Ansible, things like this, but it was not like that 10 years ago. Automatic monitoring: maybe it's possible to have some infrastructure which automatically monitors some of the health checks,
13:25
some automatic self-healing, which would mean: okay, if one thing goes down, it automatically gets rescheduled and comes up again. And maybe, and this would be really the best thing, some rolling updates,
13:42
which means we have a new version, and we don't have to schedule a downtime, bring down the whole stack, deploy the new version and bring it up again. Instead, we can maybe start with one server, then the next, then the next: a zero-downtime upgrade. And if we, in this process, find some issues,
14:03
we can just do a rollback, and we are just there where we were at the beginning of our update. The risk here, it's pretty simple. Your business is going down. Maybe it's even fatal, so you will not recover anymore, and you are out of business. Maybe that's the most extreme one,
14:21
but this really is production, so we cannot afford any failure there. Okay, so now I sketched maybe some, let's say, good setup. Maybe it's not the best in the world, but I would say this is something which would be seen as good. So we have our normal developer's box,
14:41
so a laptop or whatever. For the building, packaging, and testing stuff, maybe the staging (this is more or less not so clear), we have some kind of continuous integration server: Jenkins, Travis, you name it. And then in production, it would be good
15:02
to have some kind of modern cluster scheduler, so not just some hosts which can go down, but maybe some Mesos cluster or whatever, because there we can get things like rolling updates for free, because they implemented it already, so we don't have to care for it.
15:21
It's just implemented already. Maybe a bad setup, but I think most of you have already seen some of those setups: you just develop and build everything on your computer, and in the end, production means you have some server on Rackspace which is two years old,
15:42
and you install things with some sudo apt-get you can't even remember anymore. This happens, and for some hobby project, I think this is completely okay: if it goes down, it doesn't matter. But for your production system, I think this would be a bad setup.
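The rolling update with rollback from the production wish list can be sketched like this (a minimal simulation, not tied to any real scheduler such as Mesos): upgrade one server at a time, health-check it, and on the first failure restore every server to the old version.

```python
def rolling_update(servers, new_version, healthy):
    """Upgrade servers one by one; on a failed health check, roll back
    everything so we end up exactly where we started."""
    done = []
    for server in servers:
        previous = server["version"]
        server["version"] = new_version
        if not healthy(server):
            # Roll back this server and every one upgraded before it.
            server["version"] = previous
            for upgraded, old in done:
                upgraded["version"] = old
            return False
        done.append((server, previous))
    return True

servers = [{"name": f"web{i}", "version": "1.0"} for i in range(3)]
print(rolling_update(servers, "1.1", healthy=lambda s: True))                 # True
print({s["version"] for s in servers})                                        # {'1.1'}
print(rolling_update(servers, "1.2", healthy=lambda s: s["name"] != "web1"))  # False
print({s["version"] for s in servers})                                        # {'1.1'}
```

At no point is the whole fleet down, and a failed upgrade leaves the cluster exactly where it started.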
16:01
Okay, dependencies. So that's my definition, I made it up: all shared software components that need to be present in the correct version so that the application works correctly. Okay, maybe we can work with this: this means somehow software needs to be present. But there's a thing called dependency hell.
16:21
It's pretty famous, because normally you always screw up these kinds of things. For example, you can have conflicts in transitive dependencies: A depends on B and C, and both depend on D in different versions, so you cannot really resolve it. Then there's another issue.
16:41
Armin Ronacher brought it up also in the keynote: in Python, we only have application-global dependencies. That means if Flask depends on something, I cannot depend on it in another version. This is possible in JavaScript and maybe some other languages, but at the moment, it's not possible in Python.
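Both problems can be sketched with a tiny constraint check (the package names, versions, and ranges are made up): if B pins D to [2, 3) while C pins D to [1, 2), and Python only allows one global copy of D, no install plan exists.

```python
def resolvable(constraints, available):
    """Versions of a package satisfying every (min_incl, max_excl) range."""
    return [v for v in available
            if all(lo <= v < hi for lo, hi in constraints)]

# A depends on B and C; B needs D in [2, 3), C needs D in [1, 2).
constraints_on_d = [(2, 3), (1, 2)]
print(resolvable(constraints_on_d, available=[1, 2]))  # [] -> dependency hell
```

An empty result is the unresolvable case: no single global version of D can satisfy both requirements.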
17:01
So we all have to agree in the whole application on one version of a dependency. pip, that's a sad story, of course, we all know it. It still doesn't really have proper dependency resolution. All it does is take the first candidate that fulfills the first requirement it encounters,
17:22
and it just takes this one. But there is an issue, number 988. It's some years old, four years old. The last comment there is that this year, there's a Google Summer of Code project, and a student gets paid to implement this in pip. So fingers crossed, maybe we will have
17:40
some proper dependency resolution, as far as it can get, because it's still global dependencies, and there are just unresolvable problems. Then this is another thing: the system Python dependencies. For example, Debian normally comes with a pip installed.
18:02
Who knows which version of pip is on Jessie? No one? It's 1.5.6, so it's really, really old. The newest one is 9. There were some steps missing in between,
18:20
but still it's a really, really old pip version. So if you just want to install something in Python on a Debian Jessie (which is not the very newest, but the last stable one; by now it's not the last stable one anymore, but it's still fully maintained), you have a really hard time, because you have to upgrade the system pip,
18:42
and this maybe crashes everything if you do not really take care what you do there. Okay, then the next thing: package management. So how do we manage those dependencies? In the Python world, I think we have all agreed by now:
19:03
there's one package manager called pip, and that's it. There's one package format, the wheel. But still, I feel there's quite some confusion around: there's still setuptools used quite often, there's still distutils stuff used somewhere.
19:21
I still always find some eggs lying around in my working directory, and I have no clue where they are coming from, but they exist. So it's really a sad story, and if you look up how you should install a package, also on Stack Overflow, the answers are quite outdated sometimes,
19:40
so they would propose things like setuptools or things like this. So it's still not established throughout the whole language universe that pip and wheel are the standard, and there's nothing else. Also templating: here's the developer of PyScaffold. Wave your hand.
20:01
Hello? But there's no standard one. So if we want to set up a Python package, it's still a real pain to set it up. What do I have to write in my setup.py? How do I get my Sphinx documentation built automatically, things like this?
20:20
So there's PyScaffold or versioneer, or I think yesterday in the lightning talks there was another one. Yeah, okay, so I have to be a bit faster. It's gotten better: now we have setup.cfg and setuptools_scm, things like this, so it's gotten better, but it's not really good. Okay, so, there is one big problem,
20:44
and this is that we have the whole world of system dependencies, managed by things like yum or apt, and then there are the language dependencies. And this is not only a problem of Python at all, it's a problem throughout the whole IT world.
21:03
So there is npm for JavaScript, there's Conan for C++, things like this. So every language comes with its own package management for the language packages, but they somehow interfere with the system packages, and they rely on some system packages being installed,
21:23
things like this. So it's a huge problem. I think it comes from this DevOps problem: we had the big division between the operations people, who managed the system dependencies, and the developers, who care for the dependencies in their own language.
21:42
And still we have this chasm, and I see no real movement that we really want to break it up. It's gotten better, but the problem is still there. So I would say this is a fail. We have to fix it somehow, but there's nothing really in sight.
22:00
Yeah, so as I explained already, there are some historical reasons for it. Why do we have shared dependencies at all? Maybe because disk space was expensive, so it was better to share artifacts: we install, let's say, glibc just once, and not 20 times, once for each running application.
22:21
So these are the reasons why we have this chasm. So what is the current state? I don't have too much time left, so we have to go through it quite fast. So now I have some approaches that we can take. For example, there would be the classical approach.
22:40
So we start developing, then we build a normal package. This is the classical approach, and I think all of you would agree that this is more or less what you would come up with. So we build a normal package, then we push it to Git. On Jenkins, we build some artifacts, which means we build our wheel. We test this, and then, yeah,
23:02
we release it, maybe to a devpi server, and then our wheel is up there, or on the upstream PyPI server. And then in production, we just go to our repository and grab this artifact with the version. So this is the classical approach.
23:21
So the pro side here: this is the standard thing, everyone understands how it works. We have good tooling around it: we can use pip, we can just use all the standard tools, and it's really well understood. The con maybe is that we have to resolve the dependencies in production. This might be a problem.
23:45
Yeah, and we have to build those wheels for all target systems. So if we have some CentOS or some Debian, then we have to build it for Debian and for CentOS. Then there is this approach: we somehow conserve our virtual environment.
24:04
There are many implementations of this. Oh, just one minute left. So for example, there's Platter or PEX or dh-virtualenv. The principal idea is that we just pack the virtualenv together and ship the whole tar file,
24:20
and then we just extract the tar file in production. So yeah, the biggest pro on this side is that we don't have to resolve the dependencies on the target system. The con side here is that we still don't have the system dependencies; it's just the Python dependencies. So we still have to have installed all the glibc's
24:43
in the right version, or numpy, lxml, whatever. Then there's the OS package approach. I think you all can imagine: we build Debian packages, for example, proper Debian packages. The biggest drawback here is, of course,
25:00
that we need to package all the dependencies we have as Debian packages, and that's a huge amount of work. And then, of course, we all know there is Docker, and this solves everything. It does not.
25:20
What I want to address here is containers as package management. So just using Docker is not the same as using Docker as package management, which means that you build your Docker image once, put it into an artifact repository, and then grab this image in production as your artifact.
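That "build once, grab the same image" discipline can be sketched with a toy in-memory registry (this is not the Docker API, just the idea): tags are mutable pointers, so production should pin the content digest that CI produced.

```python
import hashlib

def push(registry, tag, image_bytes):
    """Store an image under its content digest and move the tag to it."""
    digest = "sha256:" + hashlib.sha256(image_bytes).hexdigest()
    registry[digest] = image_bytes
    registry[tag] = digest  # the tag is just a moving pointer
    return digest

registry = {}
ci_digest = push(registry, "myapp:latest", b"image built once on CI")
push(registry, "myapp:latest", b"image rebuilt somewhere else")  # tag moved!

# Deploying by tag now gets the rebuild; deploying by digest still gets
# exactly the artifact that was built and tested on CI:
print(registry[registry["myapp:latest"]] == b"image built once on CI")  # False
print(registry[ci_digest] == b"image built once on CI")                 # True
```

Real registries work the same way: `docker pull myapp@sha256:...` fetches an immutable artifact, while a tag can silently point somewhere else tomorrow.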
25:44
If you just use Docker on your laptop, and then you rebuild your image in production, that's not containers as package management. I mean, there are, of course, some pros, because you can use the same image everywhere,
26:00
and it doesn't matter which host OS you have. But still, we have some huge problems. For example, we still have the division between system and language packages. It doesn't go away: you have to install things in your Docker image with apt-get, and then switch to pip or whatever, however you do it.
26:23
Security is a huge issue. If someone patches the Shellshock vulnerability on the host system, it doesn't help, because the old version is still running in your container. So somehow, someone needs to apply all the updates there as well. And then, there is another thing.
26:43
I would call it the next package manager. For example, there's Conda, you all know it. And this means: okay, we just invent a whole new thing. It's not pip, it's not apt-get. We invent another package manager, and it already takes care of
27:00
pre-compiling all the dependencies. So, another thing here would maybe be Nix. Who has heard of Nix? Oh, some of you, okay. It's a completely new thing, and I think this is the first thing that would solve this chasm between system dependencies
27:22
and language dependencies. So this would be a completely language-agnostic approach: we can use JavaScript and other packages, but it would go down directly to the kernel. So if we then install something with Nix, everything is installed,
27:42
and of course, the good thing is, if we then update a system dependency, it would really end up in our application. So the whole Nix system takes care that our application gets restarted, things like this.
28:00
A last thing, but yeah, okay, I have to stop. This would be the Google approach: you just take the dependency, download the source code, and put it into your own source tree. The good thing there is you don't need dependency resolution, everything is just in your source code, but of course, you all know the bad things there:
28:21
You would just screw it up. Okay, so that's it. So the future, who knows? Thank you. Thank you very much.
28:41
Thank you very much for a nice talk. We have time for one or two very quick questions. So on the Linux side, there are new kinds of packaging formats coming out designed for this problem, one of which is called Flatpak,
29:02
and then there's another one called Snap. So do you have any experience with those, and how would they tie into this? No, sorry, I don't know. Thanks, it was a good talk. Instead of a short discussion, do you have a brief opinion?
29:21
Yeah, so of course, I thought a bit about what could be a solution to all that. I mean, the first thing I wrote here is: containers will stay. So they won't go away, and I think it's a good thing. So containers will stay.
29:40
We have those image formats, but I think we have to break up the division between system and language dependencies. So I'm pretty sure we would need one language-agnostic package manager, so it doesn't matter if it's JavaScript or C++, one package management which can resolve dependencies
30:00
and goes down deeply, all the way to the kernel. And so maybe the nearest thing there would be using Nix inside a container. That could be a solution which might work out, but yeah, I think it would take some years to get it through everywhere, so yeah, I don't know,
30:22
but that would be maybe my shot. Okay, so with this outlook, let's thank Sebastian again for his talk. Thank you.