
Optimizing Docker builds for Python applications


Formal Metadata

Title: Optimizing Docker builds for Python applications
Number of Parts: 118
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract: Do you deploy Python applications in Docker? Then this session is for you! We will start by reviewing a simple Dockerfile to package a Python application and move to more complex examples which speed up the build process and reduce the size of the resulting Docker image, for both development and production builds.
Transcript: English (auto-generated)
I had a great time, too, and by the way, this is my first EuroPython. So I'm going to talk about Docker and Python. Docker is a popular way to package and run applications; however, when you're packaging Python applications in
Docker, there are some caveats, so I'm going to share with you my lessons learned when I was trying to optimise Docker builds. Now, I hope you will find something useful, something valuable, something that you can take away and apply in your environment. My name is
Dmitry and I'm a system engineer at Cisco. I create Python applications for internal use, and I focus mostly on network automation. You can find slides here if you would like to follow along. Before we proceed, let's do a quick show of hands. Who has been using
Docker already? Wow, okay. Keep your hand raised if you're using Python and Docker together. Okay, about the same amount. That's awesome. In this talk, I'm not going to talk to you about why Docker, what the benefits are, and whether you should use it.
If you haven't already, you should check it out. I started using it around two years ago, and it completely changed the way I deploy applications. So, to make sure that everyone is on the same page, let's start with some Docker terminology, okay?
So first, a container. It's a lightweight way to package your application with its dependencies; different containers have some isolation, they have separate user spaces, but they share the kernel of the host. The next one is the Docker image. A Docker image is a template
to create Docker containers. It's built using a Dockerfile, and it consists of read-only layers; we are going to talk about layers later. We can upload an image to a registry and share it with others. Now, a Dockerfile is a set of instructions to build an image.
You start with the base image you're going to inherit from, and then every instruction does something and creates a new layer, which is cached for future builds. In this case, I have a silly example where I inherit from the Debian image, I copy a file from my host system to the image, then I run some command inside of the image, and then I say that, okay, this is my default command when I start a container from it.
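Written out, that silly example would look roughly like this (a minimal sketch; the copied file and the commands are placeholders, not from the slide):

```dockerfile
FROM debian:stretch

# COPY a file from the host into the image: one new layer.
COPY myfile.txt /opt/myfile.txt

# RUN a command inside the image: another layer.
RUN apt-get update && apt-get install -y curl

# The default command executed when a container starts from this image.
CMD ["cat", "/opt/myfile.txt"]
```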
So, what is a Docker container? It's a container created from a Docker image: we add a writable layer on top, we allocate resources, and when we start the container, we execute the ENTRYPOINT and CMD commands. Okay, and the last
one is the registry. It's a place where we store and share tagged images. Here is a very simple diagram, just to summarise: we have a Dockerfile; we use the docker build command to build an image from it; we can change the tag using docker tag; we can
push and pull the image to and from the registry; and when we want to run a container, we do docker run on the image tag, ENTRYPOINT and CMD are applied, and here we go, we have a container. Now, in this talk, we are going to focus mostly on the left part, on
the build process. Okay, so, Python and Docker. On the left-hand side, you can see my sample project. I have my_project, which is a directory, a Python
module, and then I have main.py, which I run when I want to run this application, and it invokes some function from my_project. The details of the Python code itself are not really important here; the structure is more important. Now, another thing that I have
is the requirements.txt file, which contains the Python dependencies. In this case, I have only two: requests and cryptography. And you will understand why I chose something like cryptography for this talk. Now, in the middle, you can see a sample Dockerfile, where I inherit from the Python 3.7 image. I create a directory called app, and then
I copy everything from the host system, from the current directory, to the image, I install my dependencies, and then I define that I'm going to run python main.py.
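As a sketch, the project layout and this naive Dockerfile look roughly like the following (the names my_project and main.py are from the talk; everything else is assumed):

```
.
├── my_project/        # the Python module
│   └── __init__.py
├── main.py            # entry point, imports from my_project
└── requirements.txt   # requests, cryptography
```

```dockerfile
FROM python:3.7

# Create the /app directory and copy everything from the build context.
WORKDIR /app
COPY . .

# Install the Python dependencies.
RUN pip install -r requirements.txt

CMD ["python", "main.py"]
```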
Now, it's all good, it's very simple. However, the size of this image is around one gigabyte, and in this talk, we are going to see how we can improve on that. So, before we go further, let's define our optimization objectives, what we are going to optimize. There are two things: image size, but also build time, and that can be the initial
build time as well as subsequent build time. Now, let's also define priorities, and those are the ones that I defined for myself for my projects. During development, I would like to have fast builds, okay? I care less about image size during development
because whenever I change my code, I like to see results much faster. But, for production, I prefer small image size. In your case, the priorities could be different. Okay, so the first and the most important one is selecting the base image. So, here is
the comparison table. I have python:3.7, which corresponds to python:3.7-stretch, where the base image is Debian stretch. Its size is around 900 megs. We have slim stretch, which is a much smaller version, but it still uses Debian as the base image.
It's 150 megs; you can see it's already a six-fold difference. We also have Alpine. Alpine Linux is a very popular base image, especially in the container world, because of its small size. You can see it's almost half the size of slim stretch.
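To summarise the comparison in one place (approximate sizes as quoted in this talk):

    Base image                Libc     Approx. size
    python:3.7 (stretch)      glibc    ~900 MB
    python:3.7-slim-stretch   glibc    ~150 MB
    python:3.7-alpine         musl     roughly half of slim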
What are the differences between these base images in terms of Python applications? Well, Debian uses glibc, which allows it to support manylinux wheels. Now, when we are talking about manylinux wheels, we have to talk briefly
about Python native extensions. We usually have to compile native extensions to make them work with Python, and some libraries that some of you may
know, like cryptography, lxml, and some others, use native extensions. However, there is this manylinux wheel format, where we don't have to compile anything; it's precompiled for us, and we just download the wheel and extract it. However, Alpine
uses musl, and it doesn't support manylinux wheels, so the consequence is that native extensions must be compiled, and if you have ever tried installing something like lxml on Alpine, it takes around 15 minutes just for that one dependency. Now, if you
want to know more about manylinux wheels and Python native extensions, there was an amazing talk this year at PyCon US, The Black Magic of Python Wheels. I strongly recommend watching it. Also, what I noticed is that, on Alpine, some well-known packages take much less space. For example, for Git, I think there was a three-fold difference
in size. So, in general, the footprint of Alpine images is much smaller. So, here is my recommendation. When you care about the build time, I would select slim stretch
as the base. Whenever you care about image size, I would recommend selecting Alpine. The main reason is that, with slim stretch, you can use manylinux wheels. Okay, so let's do that. I changed my previous Dockerfile, and now I'm using
slim stretch. You can see that the size went from one gig to around 200 megs. With Alpine, we have that cryptography dependency, and it needs to be compiled, because there is no manylinux wheel, well, no prebuilt wheel of cryptography for Alpine. So, in this case, I have to install tools like GCC, but also some packages with the headers, like OpenSSL, and when I do all of that, you can see that the size is 300 megs, which is actually bigger than slim stretch.
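A sketch of what that Alpine Dockerfile might look like (the exact package list is my assumption; these are typical build requirements for cryptography):

```dockerfile
FROM python:3.7-alpine

WORKDIR /app
COPY . .

# Compiler and headers needed to build native extensions against musl
# (package names are assumptions, typical for cryptography).
RUN apk add --update gcc musl-dev libffi-dev openssl-dev
RUN pip install -r requirements.txt

CMD ["python", "main.py"]
```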
So, you may wonder: I just told you to use Alpine if you care about image size, right? So, what's going on here? Well, let's first define the problem. The problem is that the build dependencies which contribute to the image size here are needed only for compilation, not at runtime. This is the main issue. So, the general solution
is to include only the files necessary at runtime in your image. So, how do we achieve that? Well, first, let's take a look at copying the source code into the image. The recommendation here is to use more specific COPY statements instead
of a broad COPY . ., and you can also use a .dockerignore file to exclude some of the files when you are doing that copy. So, here is an example of the .dockerignore file I use; I just copy it from project to project. Things like .pyc files and a few others, I always ignore, to make sure they never end up in my image. Okay?
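For illustration, a .dockerignore along those lines (the .pyc entries are from the talk; the others are typical guesses):

```
__pycache__/
*.pyc
.git/
.venv/
venv/
```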
So, let's apply this technique: instead of having the broad COPY . . statement, convert it to more specific COPY statements. In this case, I copy my Python module, my_project, and main.py, and you can see that the size decreased. The reason is that I had a venv directory on my host, and it previously got copied into the image; now, it's not there.
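In Dockerfile terms, the change is simply (paths follow the sample project):

```dockerfile
# Before: COPY . .
# After: copy only what the application actually needs.
COPY my_project ./my_project
COPY main.py requirements.txt ./
```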
And the same with Alpine: we also save around 20 megs by doing that. Okay. Now, this next one is very important: removing unnecessary files.
And it's not as easy as it may sound. So, let's try using that Alpine Dockerfile. It's exactly the same; however, at the very end I added an additional RUN instruction, where I try to delete GCC, OpenSSL, and some other
packages, because I don't need them at runtime. And if you do that, you can see that the size of the image hasn't changed at all. So, what's going on here? Well, to answer this question, we have to understand how Docker layers work. Every instruction creates a layer, and
a new layer can never make the image smaller than the previous layers; it can only add to the size. Those layers are cached and can be reused for subsequent builds, and layers themselves introduce some overhead, but the first two points are the most important ones. So, again, a new layer cannot reduce the size left behind by previous layers.
So, what's the consequence of this? What is the takeaway here? The first is combining multiple RUN statements into a single one, so that they form the same layer. If you need to delete files, you have to make sure that you
delete them in the same layer where they were added, because if you do it later, it has no effect whatsoever on the image size. If you want to benefit from caching, you have to arrange your statements in order from the least changing to the most changing. Usually, the order will be system-level
dependencies and tools, then Python dependencies, and then the source code. Another tip would be not to save anything to the package managers' caches. For example, with pip, you can use --no-cache-dir so it doesn't save downloaded packages. With apk, you can
use the --no-cache option as well. So let's try to apply these principles to our problem. Now, in this case, for slim stretch, the only thing I added was --no-cache-dir, which saved a few more megs. And in the case of Alpine, I defined my build dependencies and my runtime dependencies, and then I combined all of my RUN statements into a single one: first I install my build dependencies, then I install my Python dependencies, then I delete the build dependencies, and then I install the runtime dependencies. All of that is in a single RUN statement.
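A sketch of that single combined RUN statement on Alpine (package names are assumptions; apk's --virtual flag groups the build packages under one label so they can be deleted together):

```dockerfile
FROM python:3.7-alpine

WORKDIR /app
COPY requirements.txt .

# Build deps, pip install, deleting the build deps, and the runtime deps,
# all in ONE layer, so the compilers never persist in the image.
# Package names are assumptions (typical for cryptography on Alpine).
RUN apk add --no-cache --virtual .build-deps \
        gcc musl-dev libffi-dev openssl-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps \
    && apk add --no-cache libffi openssl

COPY my_project ./my_project
COPY main.py .

CMD ["python", "main.py"]
```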
And the result: you can see it's already three times smaller, around 100 megs. Now, from this point on, I will no longer consider slim stretch, because we can
already see that the Alpine image is much smaller, so we are going to continue optimising that. But slim stretch is already good; I personally use slim stretch for my local development builds, where I don't care about those 20 or 30 extra megs. But for my production image size,
sometimes I do, so we will keep decreasing that size. So, here is an optional thing you can do: delete the .pyc files and tests from your dependencies. If you do, the Dockerfile becomes
even more complex: you have to find those .pyc files and test files under /usr/local and delete them. In this case, you save an additional 10 megs; sometimes you get around 30 megs from it. It really depends. Okay.
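A sketch of that cleanup (to actually shrink the image, it has to run in the same layer where the files appeared, for example appended to the install RUN; the paths are my assumption for a python:3.7 image):

```dockerfile
# Remove bytecode and bundled test suites from the installed dependencies.
RUN find /usr/local/lib/python3.7 -name '*.pyc' -delete \
    && find /usr/local/lib/python3.7 -depth -type d -name tests \
        -exec rm -rf '{}' +
```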
What are the disadvantages of the approach I just showed? Well, the Dockerfile becomes really complex. You always have to remember to install the build dependencies, then install everything else that
you need, and then delete everything that you don't need, all in a single statement. The consequence is not only complexity, but also that you can no longer benefit from caching. You can't cache your build dependencies in this case; you will always have to rebuild the
container. Okay. So, Docker multi-stage builds. The idea behind multi-stage builds is that you build an intermediary image where you have all of your build dependencies, and you install your application. Then
you copy the result, for example a binary if it's Golang, or whatever the artifact of your programming language is, to a fresh image, and then you label it as your final image. So, you have these two
separate images, if you will: in one image you do the whole build process, and in the second one you actually package your application for future use. So, why would you want to do that? The resulting image size
is smaller, because you have no build dependencies in it. It can also be faster, because you can now cache all of those build dependencies; you no longer have to delete them anywhere. However, Python is an interpreted language, and the question is: are multi-stage builds
relevant to Python apps? My answer is somewhat. The main thing is that even though Python is interpreted, you still may have those dependencies which use native Python extensions. Not only that, but you may also have
some other tooling that you need as part of your build. For example, I am a big fan of a tool called Poetry, which allows you to manage Python dependencies. If you think about it, you only need that tool to install
your dependencies from the lock file, but you don't really need the tool to run your app. So, all of that can go into the build stage, and then you copy only the result to your final image. Okay. So, here is the idea, and thanks to Hynek for sharing it
on Twitter. In order to simplify copying from one stage to another, the easiest solution would be to use virtual environments. So, the idea is
that you have your application code, you create the virtual environment in the same folder, you install all the dependencies that you need, and then between the build stage and your final stage, you just copy the whole project directory, including
your source code and your virtual environment. And it works out pretty well. So, let's take a look. This is the example of a Python Docker multi-stage build. It may seem a little bit complex, but it really
isn't, compared to our previous examples. On the left-hand side, I have my builder stage, where I still have my build dependencies. I install them, I create a virtual environment, and I also upgrade pip in this case. I copy my requirements.txt, and then I install my dependencies, and in this case, I also delete the .pyc files and tests, but you don't really have to do that step. The result from the left-hand side is that in /app/.venv we have our virtual environment, and in /app/my_project we have our module. So, in the second stage, I inherit from Python 3.7 Alpine again, I install my runtime dependencies only, and then I copy the /app directory, and that's pretty much it. You no longer have to care about how you delete those build dependencies. The size is a little bit bigger, because you use a virtual environment, but I think that's fine.
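A sketch of the whole multi-stage Dockerfile as described (the /app/.venv and /app/my_project paths are from the talk; package names and the rest are assumptions):

```dockerfile
# --- Stage 1: builder, with all of the build dependencies ---
FROM python:3.7-alpine AS builder

# Compilers and headers (package names are assumptions).
RUN apk add --no-cache gcc musl-dev libffi-dev openssl-dev

WORKDIR /app
# Create the virtual environment next to the code and upgrade pip in it.
RUN python -m venv .venv && .venv/bin/pip install --upgrade pip

COPY requirements.txt .
# Install dependencies into the venv; the cleanup step is optional.
RUN .venv/bin/pip install --no-cache-dir -r requirements.txt \
    && find /app/.venv -name '*.pyc' -delete

COPY my_project ./my_project
COPY main.py .

# --- Stage 2: final image, runtime dependencies only ---
FROM python:3.7-alpine

# Runtime shared libraries (assumed; depends on your dependencies).
RUN apk add --no-cache libffi openssl

WORKDIR /app
# Copy the whole project directory, virtual environment included.
COPY --from=builder /app /app

CMD ["/app/.venv/bin/python", "main.py"]
```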
So, one additional thing that you get in this case: your build dependencies are now cached. Depending on whether you change your Python dependencies often or not, you can cache up to line 14 of the Dockerfile on the slide, or maybe even further, maybe even the whole build stage; it really depends, because you don't have to delete anything. In case you don't change your Python dependencies, you cache the whole layer; in case you change them, but your system-level build dependencies are still the same, you can cache up until line 12.
So that's pretty nice, because previously we couldn't do any caching at all. Okay. Now that we have that, you can also create a custom image with the common build dependencies across your multiple projects. For example, as I told you, I like using Poetry; in some cases I need curl to download something; sometimes I need Git; and I may also need a bunch of build dependencies. So I just build a custom image with all of that, and I store it in a registry. Then the multi-stage Dockerfile is simplified even further: the builder stage just inherits from the custom image, and everything else is the same. Okay.
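Sketched out (the registry and image names are placeholders):

```dockerfile
# Shared builder image: built once, pushed to your registry.
FROM python:3.7-alpine
RUN apk add --no-cache gcc musl-dev libffi-dev openssl-dev git curl \
    && pip install --no-cache-dir poetry

# Every project's builder stage then just starts with:
# FROM my-registry.example.com/python-builder:3.7 AS builder
```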
A couple of quick suggestions here. What I found for my local dev, where I use slim stretch: sometimes bind-mounting your source code instead of copying it really pays off, especially if you have web apps with some reload capability. That's pretty nice: you just change the code, and you don't have to rebuild the container. It really depends, but it may be useful for you.
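For example, something along these lines (my-dev-image is a placeholder):

```sh
# Mount the current source directory over /app instead of baking it in;
# code changes are then picked up without rebuilding the image.
docker run -v "$(pwd)":/app my-dev-image
```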
Now, another tip is adding the environment variables: PYTHONUNBUFFERED, so everything is printed to stdout without buffering, and PYTHONDONTWRITEBYTECODE, if you don't want Python to generate .pyc files, which I think are not really needed in your Docker image. Okay.
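In the Dockerfile, that's simply:

```dockerfile
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1
```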
Now, this is my full example; I'm not going to go into the details. You can download the slides later. I'm using Poetry there, so it's a little bit more complicated to build; if you're interested, you can check it out in the slides. So, let's do a summary. First and foremost, you have to select
your base image carefully: Alpine for smaller image size, slim stretch when you need faster builds. You have to take layer caching into account: combine multiple statements into one; if you want to delete something, you have to make sure you delete it in the same statement where
you added it. You have to order statements from the least to the most changing to benefit from caching. And the last one: Docker multi-stage builds can help you avoid some complex removal procedures and benefit from caching.
And, if you go down this path, I recommend using a Python virtual environment. It's really nice in this case, even though in general I'm not really in favour of using virtual environments in Docker containers. And that's all I have for you today. Thank you very much. Thank you, Dmitry. We have a few minutes.
Four minutes for questions. One, two, three. Okay. Hi. Thank you for this great talk. I have a very simple question. Have you evaluated
using other base images, like Clear Linux or minideb? Because they both have glibc and may be much smaller than slim stretch. Thank you for the question. I haven't. Now that you mention it, I probably should check them out.
Yeah, you should check it out. Thank you. If you write unit tests, how do you run them? Okay. That's an amazing question. It really depends. In my case, I built a development Docker container for that, and I run that. So it's not the same as my production container. So, yeah, I just have a separate
container to run that. And I include my development dependencies there, and I run them there. Thanks. Thank you. Any more questions? Then I'll go ahead and ask
one question. Do you think, if in the build stage, instead of installing into a virtualenv and copying it in the second stage, you would actually build the wheels and then use those wheels to install in the second stage, do you think that would lower the size a lot? Or have you tried using this technique instead of building the virtualenv? I haven't tried it, even though it was one of the suggestions and one of the things that I wanted to explore. I don't have the data to confirm it, but I don't really see any benefit from doing that; it is just my
personal opinion, because, as we saw here, by adding a virtual environment I only added about five megs to the image, which was acceptable for my case. Okay. Thank you for your talk. And just out of curiosity, how does your local development
environment look? What kind of tools do you use, and do you use Docker when you are developing? Well, my local development machine is my Mac. Sometimes I use Docker, sometimes I don't; it really
depends on how complex the application is. If there are a lot of things outside of Python, for example if it is a web app with a database, a front end and stuff like that, then I do use Docker to make sure that everything is working, and I like to see and touch the result. If it is only Python and nothing else, then I usually don't use Docker locally. But most of the time I do. And adding on to that,
I do use tools like Poetry to manage dependencies. And I think that is pretty much it. We have time for one more question. Also concerning testing: you said you have another environment for local development that you also use for testing, but what about integration tests? Do you run them in that same
container? Because I would be a bit afraid of running them in a completely different container than production. So, there are different approaches to this. I actually do it in a separate container, but that's just me; according to my requirements,
that's okay. However, I do understand your concern; it may make sense, if you have integration tests, to run them on your production container. Thank you. Unfortunately, we don't have time for any more questions. You can find Dmitry around the conference. Thank you.
Thank you.