
Grocker, a Python build chain for Docker


Formal Metadata

Title
Grocker, a Python build chain for Docker
Part Number
89
Number of Parts
169
Author
Fabien Bochu
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
Fabien Bochu - Grocker, a Python build chain for Docker

Grocker is a Docker build chain for Python. It transforms your Python package into a self-contained Docker image which can be easily deployed in a Docker infrastructure. Grocker also adds a Docker entry point to easily start your application.

-----

At Polyconseil, we build Paris' electric car sharing service: Autolib'. This system is based on many services developed using web technologies, Django and our own libraries to handle business logic. Packaging is already a difficult problem; deploying large Python projects is even more difficult. When deploying on a live and user-centric system like Autolib', you cannot rely on pip and external PyPI servers which might become unavailable and are beyond your control.

In the beginning we used classic Debian packaging: it was a maintenance hell. It took hours to build our packages and update their metadata to match our Python packages. So we switched to Docker. Docker allows us to have a unique item that is deployed in production systems: code updates are now atomic and deterministic! But before deploying the Docker image, you need to build it. That's where Grocker comes in. Grocker is a Docker build chain for Python. It will transform your Python package into a self-contained Docker image which can be easily deployed in a Docker infrastructure. Grocker also adds a Docker entry point to easily start your application.
Transcript: English (auto-generated)
It's a pleasure to welcome Fabien Bochu with the talk "Grocker, a Docker build chain for Python applications". So, thank you.

I'm here to talk about Grocker. It's a build chain for building Docker images from Python packages. But first, let's get some context. I work at Polyconseil. Polyconseil is the company behind the Autolib' information system, and Autolib' is an electric car sharing service based in Paris. In fact, there are five electric car sharing services in the world, but Autolib' is the first and the largest one. So if you have heard of us, it is probably through Autolib'. The Autolib' information system is composed of several applications, and when I say applications, I do not include backends like the database, the Redis server, and things like that. Those applications are mostly Django applications, a bunch of open source libraries, and our own libraries holding the business logic. Our problem was to deploy those applications in production. I work on deployment topics from time to time.
And that's why I'm here to introduce Grocker. Why did we build Grocker? Before using Grocker to deploy our applications in production, we used Debian packaging. And Debian packaging was hell in 2015 for Python applications: you have to edit your Debian package metadata by hand to repeat the versions that are already declared in your Python package. In the worst case, it took 48 hours to package an application and all its dependencies. And we aim to deploy our applications once a week. If it takes two days to package the application, we have only three days left to validate the application, fix the bugs, and then revalidate the application with the bug fixes. And our validation process is very long: it takes about a day, because we have physical devices, like cars and charge points, and the devices have to be manipulated by humans.

So we moved to Docker. Docker allows us to have atomic updates: when you want to update an application with Docker, you just have to pull an image and run it. You cannot end up with a half-installed application. The image is built on another machine, so if the build process fails, your production is fine. Grocker also allows us to put multiple applications on the same server; before, we had only one application per server, and most of the time the server did nothing.

Yes, there are other tools that do the same thing as Grocker, but in a different way. For example, OpenShift's source-to-image does approximately the same thing, but it starts from sources and not from a Python package. In fact, when we started writing Grocker, we did not know that source-to-image existed.
Grocker comes from the many approaches we tried. The first one was to use pip install on the production server. In fact, we never used this approach because it is too dangerous: you have to build your C extensions in place on the production server, and that may fail. The next approach was to improve our Debian packaging tools, and like I said before, it was hell, so we gave up and switched to the Docker approach. The Debian approach also has another issue: your application's libraries are linked with your infrastructure programs' libraries. So if you want to update one of those libraries, you have to be sure that both your application and the infrastructure programs work with the new version. That is very difficult, because you have to make two changes at the same time, and that is not a good thing.

The next approach we tried was to use a Dockerfile. But with a Dockerfile, you have to install your build dependencies, install your application, and remove the build dependencies in one single layer. Otherwise, the image size will be very big, much bigger than expected, because Docker does not delete files: it just marks them as deleted. It's just a mask.

And the last approach we tried was the slug approach; in fact, it's a Heroku-like approach. Heroku is a very popular PaaS, a platform as a service, for those who don't know it. In the slug approach, the build phase and the run phase are split. In the build phase, you start with a base image which includes your runtime dependencies and your build-time dependencies. You instantiate one instance of this image, and you build your application on it. You save the result of the build process, and you delete the instance. And then, when you run your application, you start a new instance of the image, put the result of the build process on it, and your application runs in this new instance. This approach has a problem: you have one big base image for all your applications, so you cannot have an application with special requirements, and you end up with a very big and fat image.

In fact, Grocker mixes the two last approaches: we use separate build and run phases. So how does it work? When you build an image with Grocker, we start by pulling a base image from the Docker Hub, which is a plain Debian Jessie. On this image, we add your runtime dependencies to create the root image. We then use this image as the base for the compiler image, which is the root image extended with build-time dependencies and a compile script. This image is run to compile the wheels for your application and its dependencies. They are stored in a data volume, and then we use a web server to expose them. When the web server is running, we create the runner image, which is the final product of the Grocker build chain, from the root image, and we install the wheels that were pre-compiled by the compiler image. In fact, it is more complex than this, because we have one root image per runtime dependency set, but it does not matter: this is the basic principle of how Grocker works.
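The pipeline just described (root image, compiler image, wheel volume, runner image) can be sketched as the sequence of docker commands each stage would run. Everything below, including the `build_chain` helper and the image names, is a hypothetical illustration of the flow under stated assumptions, not Grocker's actual implementation:

```python
# Hypothetical sketch of the Grocker-style pipeline described above.
# The helper and all image names are illustrative; this only shows the
# shape of the build chain, not Grocker's real code.

def build_chain(app, version, runtime_deps, build_deps):
    """Return the (simplified) docker commands for one build, in order."""
    steps = []
    # 1. Root image: a plain Debian Jessie base plus *runtime* dependencies.
    steps.append(["docker", "build", "-t", "root-img",
                  "--build-arg", "DEPS=" + " ".join(runtime_deps), "."])
    # 2. Compiler image: the root image extended with *build-time*
    #    dependencies and a compile script.
    steps.append(["docker", "build", "-t", "compiler-img",
                  "--build-arg", "BASE=root-img",
                  "--build-arg", "DEPS=" + " ".join(build_deps), "."])
    # 3. Run the compiler image to compile wheels for the app and its
    #    dependencies into a data volume (exposed over HTTP in the real chain).
    steps.append(["docker", "run", "-v", "wheels:/wheels", "compiler-img",
                  "pip", "wheel", "--wheel-dir", "/wheels",
                  "%s==%s" % (app, version)])
    # 4. Runner image: the root image plus the pre-compiled wheels; this is
    #    the final product that gets deployed.
    steps.append(["docker", "build", "-t", "%s:%s" % (app, version),
                  "--build-arg", "BASE=root-img", "."])
    return steps

steps = build_chain("myapp", "1.2.3", ["libzbar0"], ["gcc", "libzbar-dev"])
for cmd in steps:
    print(" ".join(cmd))
```

The key design point is that compilation happens only in step 3, so the runner image never needs a compiler, which keeps build-time dependencies out of the deployed image.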
The compiler image is there to compile the wheels, and especially the C extensions, and to link them against the libraries that will be installed in your runner image; it also avoids the Docker layer problem. Grocker works fine, we use it every day to deploy our applications in production, but it currently has some limitations. One of them is that the root image is 200 megabytes for our test application, and it is a tiny test application: it uses ZBar, a library which allows you to decode QR codes. For a more cumbersome application, you get a root image of 600 megabytes. That is a very big root image. Maybe it is because we use Debian Jessie, whose base image is 125 megabytes, compared to a lighter distribution like Alpine Linux, whose image is only five megabytes. But Alpine Linux does not have ZBar, and we use ZBar in our production applications. Another problem with the current implementation is that it can only build packaged applications: applications have to be packaged and published on a PyPI server. It can be a private PyPI server, but it has to be a PyPI server. As for the first limitation, we are currently working on it; we hope to have Alpine Linux support in the near future.

So how do you use it? It's very simple. You just have to install Grocker, with pip install grocker, and then type on your command line: grocker build, your application name, two equals signs, and your application version. After a long time, you will have a Docker image of your application that you can run with Docker. In fact, it is a little more complex than that: your application has to be packaged, it must use Python 3, or at least be compatible with Python 3, and it must not need extra runtime dependencies. So IPython is a good example. Otherwise, there is a config file that allows you to set build dependencies, runtime dependencies and other stuff like that, or you can use config flags. If you read the example carefully, I used one of them: I set the entry point to ipython. So as you can see, it is IPython that runs once you start the Docker image. By default, Grocker uses an entry point named grocker-runner, but IPython does not provide one.

So, Grocker was open sourced yesterday. You can find the sources in the Polyconseil organization on GitHub, the package is on PyPI, and the documentation is on Read the Docs. If you want to contribute, you are welcome. Thanks for listening. Do you have questions?
Any questions? Okay.
Why is the size of your image such a big issue?

Can you repeat?

Why is the size of your image such a big issue? I mean, we also have Docker images, but they are way bigger than 600 megabytes.

Because we want to use a PaaS, so we do not know where the image will be pulled. If one of your PaaS nodes dies for some reason, you have to pull your image again, and if it is one gigabyte, that is a very long process. If you have a smaller image, the recovery will be faster.

Hi, thanks for the talk. You said that pip was dangerous. Can you give us more details on that approach?

Oh, yes. When you build C extensions, you do not know if you have the right build-time dependencies installed. If you are not very meticulous about what you install on the server, you can miss a library or something like that. And if you build on your production server and the build process fails, your application is not available anymore. So it's not a good thing.
More questions?

The config file you talked about, what kind of format does it support? The configuration file you can supply?

Oh, yes. What kind of format does it support? It's a YAML file. Maybe I have an example somewhere. It's very small, but I don't know how to increase the font size. The config file for this application is, yeah, I can read it. You can set your libraries, your volumes, your entry points, your ports, I think. If you go to the documentation, we explain all of that on a dedicated page. You can declare plain dependencies, or say that something is just a runtime dependency. For example, we use GDAL, which does not have a binding as a C extension; it just uses ctypes or something like that. So a dependency can be a single runtime dependency, a list of runtime dependencies, or build-time dependencies. More questions? Do we have enough time? No?
Okay, thanks again.
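From the Q&A above, Grocker's config file is YAML and can declare runtime and build-time dependencies, an entry point, volumes and ports. A minimal sketch of such a file might look as follows; the key names here are illustrative assumptions, not Grocker's documented schema, so check the project's documentation on Read the Docs for the real format:

```yaml
# Hypothetical Grocker-style config sketch; key names are illustrative.
runtime_dependencies:
  - libzbar0        # needed at run time (QR code decoding, per the talk)
  - libgdal1        # ctypes-based binding, so runtime-only
build_dependencies:
  - gcc
  - libzbar-dev     # headers needed only while compiling C extensions
entrypoint: ipython # overrides the default entry point
volumes:
  - /var/data
ports:
  - 8080
```

Splitting dependencies into runtime and build-time sets is what lets the build chain keep compilers and headers out of the final runner image.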