Grocker, a Python build chain for Docker
Formal Metadata

Title: Grocker, a Python build chain for Docker
Title of Series: EuroPython 2016
Part Number: 89
Number of Parts: 169
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers: 10.5446/21131 (DOI)
Language: English
EuroPython 2016, talk 89 of 169
Transcript: English(auto-generated)
00:00
Really looking forward to this one. It's a pleasure to welcome Fabien Boucher with the talk Grocker, a Python build chain for Docker. So, thank you.
00:22
I'm here to talk about Grocker. It's a build chain for building Docker images from Python packages. But first, let's get some context. I work at Polyconseil. Polyconseil is the company behind the Autolib information system. And Autolib is an electric car
00:46
sharing service based in Paris. In fact, we run five electric car sharing services in the world, but Autolib is the first and the largest one. So if you have heard of us, it's probably through Autolib. The Autolib information system is composed
01:10
of several applications. And when I say applications, I do not include backends like the database, the Redis server, and things like that. Those applications are mostly
01:25
based on Django, a bunch of open source libraries, and our own libraries for our business logic. And our problem was to deploy those applications in production. I work on devops subjects from time to time.
01:49
And that's why I'm here to introduce Grocker. Why did we build Grocker? Before using Grocker to deploy our applications in production, we used
02:01
Debian packaging. And Debian packaging was hell in 2015 for Python applications. You have to edit your Debian package metadata by hand to, excuse me, by hand, to repeat the versions that are already
02:27
declared in your Python package. In the worst case, it took 48 hours to package an application and all its dependencies. And we aim to deploy
02:42
an application once a week. If it takes two days to package the application, we have only three days left for validating the application, doing the bug fixes, and then revalidating the
03:02
application with the bug fixes. And our validation process is very long. It takes about a day, because we have physical devices like cars, charge points, and so on. The devices have to be manipulated by humans. So we moved to Docker. Docker allows us
03:29
to have atomic updates. When you want to update an application with Docker, you just have to pull an image and run it. You cannot end up with a half-installed application. The image
03:48
is built on another machine, so if the build process fails, your production is fine. Docker also allows us to put
04:02
multiple applications on the same server. Before, we had only one application per server, and most of the time the server did nothing. Yes, there are other tools that do the same thing as
04:20
Grocker, but in a different way. For example, OpenShift's source-to-image does approximately the same thing, but it starts from the sources and not from a Python package. In fact, when we started writing Grocker, we did not know that source-to-image existed.
04:45
Grocker comes from the many approaches we tried. The first one was to use pip directly on the production server. But in fact, we never used this approach because it's too dangerous. With this approach, you have to build your C
05:05
extensions in place on the production server, and you may fail. The next approach was to improve our Debian packaging tools. And like I said before, it was hell. So we gave
05:22
up and switched to the Docker approach. The second approach also has another issue: your application libraries are linked with your infrastructure programs' libraries. So if you want to update one of those libraries, you have to be
05:44
sure that your application and the infrastructure programs both work with the new version. And it's very difficult: you have to make two changes at the same time, and that's not a good thing. The next approach we tried was to use a Dockerfile.
06:06
But with a Dockerfile, you have to install your build dependencies, install your application, and remove the build dependencies all in one layer. Otherwise, the image size will
06:24
be very big, much bigger than acceptable. Because Docker does not delete files, it just marks them as deleted. It's
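The single-layer trick described here can be sketched as a hypothetical Dockerfile; the package names and the application name are illustrative, not the actual Autolib setup:

```dockerfile
FROM debian:jessie
# Install build deps, build the app, and purge the build deps in ONE
# RUN instruction. If these steps were split into separate RUNs, the
# build deps would stay in the lower layers (only masked as deleted)
# and the image would keep their full size.
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3-pip gcc python3-dev \
 && pip3 install myapp \
 && apt-get purge -y gcc python3-dev \
 && apt-get autoremove -y \
 && rm -rf /var/lib/apt/lists/*
```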
06:46
just a mask. And the last approach we tried was the slug approach. In fact, it's a Heroku-like approach. Heroku
07:00
is a very popular PaaS, platform as a service, for those who don't know it. In the slug approach, the build phase and the run phase are split. In the build phase, you start with a base image, which includes your runtime dependencies and your build-time dependencies. You
07:23
instantiate one instance of this image, and you build your application on it. You save the result of the build process, and you delete the instance. Then, when you run your application, you start a new instance of
07:41
the image, put the result of the build process on it, and your application runs in this new instance. This approach has a problem: you have one big base image for all your
08:01
applications, so you cannot have an application with special requirements. You end up with a very big and fat image. In fact, Grocker mixes the last two approaches. We
08:21
use separate build and run phases. So how does it work? When you build an image with Grocker, we start by pulling a base image from the Docker Hub, which is for now Debian Jessie. On this image, we add your
08:44
runtime dependencies to create the root image. We then use this image as the base for the compiler image, which is the base image extended with build-time
09:00
dependencies and a compiler script. This image is run to compile the wheels for your application and its dependencies. They are stored in a data volume, and then we use a web server to expose them. When
09:22
the web server is running, we create the runner image, which is the final product of the Grocker build chain, from the root image, and we install the wheels that were pre-compiled by the compiler image. In
09:44
fact, it's more complex than this, because we have one root image per runtime dependency set, but it does not matter. That's the basic principle of how Grocker works.
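The pipeline described in these steps can be sketched with plain docker commands; the image names, tags, and volume names here are invented for illustration, and Grocker automates all of this internally:

```console
$ docker pull debian:jessie                     # base image from the Docker Hub
$ docker build -t myapp-root root/              # base + runtime dependencies = root image
$ docker build -t myapp-compiler compiler/      # root + build deps + compile script
$ docker run -v wheels:/wheels myapp-compiler   # compile the wheels into a data volume
$ docker run -d -v wheels:/wheels wheel-server  # web server exposing the wheels
$ docker build -t myapp-runner runner/          # root image + pip install from the wheel server
```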
10:00
The compiler image is there to compile the wheels, especially the C extensions, and link them against the libraries that will be installed in your runner image, and it avoids the Docker layer problem. Grocker
10:25
works fine. We use it every day to deploy our applications in production, but it currently has some limitations. One of them is that the base image, the
10:41
root image, sorry, is 200 megabytes for our test application. It's a very small test application. It uses ZBar, a library which lets you decode QR codes. And for a more cumbersome
11:01
application, you have a root image of 600 megabytes. It's a very big root image. Maybe it's because we use Debian Jessie, whose base image is 125 megabytes, compared to a lighter distribution like
11:23
Alpine Linux, whose image is only five megabytes. But Alpine Linux does not have ZBar, and we use ZBar in our production application. Another problem with the current implementation is that it can only build
11:41
packaged applications. In fact, this means applications have to be packaged and published on a PyPI server. It can be a private PyPI server, but it has to be on a PyPI server. For the first limitation, we're currently working on it. We hope to have Alpine Linux support in the near future. Yes. So how
12:13
to use it? It's very simple. You just have to install Grocker, pip install grocker, and then
12:20
you write on your command line: grocker build, your application name, two equal signs, and your application version. And after a long time, you will have a Docker image of your application, which you can run with Docker. In fact, it's a little more complex than that. Your application has to be
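The command line just described might look like this sketch; the version number and image name are illustrative, and the exact option spelling should be checked against the Grocker documentation:

```console
$ pip install grocker
$ grocker build ipython==5.0.0      # <name>==<version>, fetched from a (possibly private) PyPI
$ docker run --rm -ti <resulting-image>
```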
12:44
packaged, and it must use Python 3, or at least be compatible with Python 3, and not have special runtime dependencies. So IPython is a good example. Otherwise, there is a config file that
13:02
lets you set build dependencies, runtime dependencies, and other things like that, or you can use config flags. If you read the example carefully, I use one of them: I set the entry point to IPython. So as
13:21
you can see, it is IPython that runs when you start the Docker image. By default, Grocker uses an entry point named grocker-runner, but IPython doesn't have one. Yes. So, Grocker was open
13:41
sourced yesterday. So you can find the sources on Polyconseil's GitHub account. The package is on PyPI, and the docs are on Read the Docs. If you want to contribute, you are welcome. Thanks for listening. Do you have questions?
14:13
Any questions? Okay.
14:25
Why is the size of your image such a big issue? Can you repeat? Why is the size of your image such a big issue? I mean, we also have Docker images, but they are way bigger than 600 megabytes. Because we want to use a PaaS, we do not
14:46
know where the image will be pulled. If one of your PaaS nodes dies, for some reason, you have to pull your image again. If it is one gigabyte, that is a
15:01
very long process. If you have smaller images, the recovery will be faster. Hi, thanks for the talk. You said that the pip approach was
15:21
dangerous. Can you give us more details on that approach? Oh, yes. When you build C extensions, you do not know if you have the right build-time dependencies installed. If you are not very meticulous about
15:44
what you install on the server, you can miss a header or something like that. And if you do it on your production server and the build process fails, your application is not available anymore. So it's not a good thing.
16:04
More questions? The config file you talked about, what kind of format does it support? The configuration file
16:20
you can supply? Oh, yes. What kind of format does it support? It's a YAML file. Maybe I have an example somewhere. It's very small, but I don't know
17:21
how to increase the font size. The configuration file for our application is, yeah, so I can read it. You can set your
17:46
libraries, your volumes, your entry point, your ports, I think. If you go to the
18:07
documentation, we explain that on this page. So you can put
18:24
dependencies: just plain dependencies, or you can say that something is only a runtime dependency. For example, we use GDAL, and, I don't know why, but we do not have a binding as a C extension; it just uses ctypes or something like that. So a dependency can be
18:41
a single runtime dependency, a list of runtime dependencies, or build-time dependencies. More questions? Do we have enough time? No?
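Putting these answers together, such a YAML config might look like the following sketch. The key names and package names here are guesses reconstructed from the talk, not the exact Grocker schema; check the Grocker documentation on Read the Docs for the real format:

```yaml
# Illustrative only -- key names are assumptions, not the real schema.
entrypoint: ipython
volumes:
  - /data
ports:
  - 8080
dependencies:
  - libzbar0                  # plain runtime dependency
  - libgdal1: libgdal-dev     # runtime dependency with its build-time dependency
```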
19:03
Okay, thanks again.