Cloud Native Python in Kubernetes

Video in TIB AV-Portal: Cloud Native Python in Kubernetes

Formal Metadata

Cloud Native Python in Kubernetes
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Cloud Native Python in Kubernetes [EuroPython 2017 - Talk - 2017-07-13 - PyCharm Room] [Rimini, Italy] Server-side applications are more and more likely to need to run in dynamic cloud environments where they can automatically scale as required. One rightfully popular approach is to run the application as a Docker container inside a Kubernetes cluster, giving you a lot of operational benefits thanks to the Kubernetes folks. For the most part it is rather easy to make your Python application work inside a Docker container. But there are a number of common patterns one can follow to save time by delegating more things to the runtime environment. Furthermore you can start adding a few simple non-intrusive features to your application which will help improve the application life-cycle in the cluster, ensuring smooth hand-over when migrating the container to different nodes or scaling it up or down. This talk will quickly cover the basics of Kubernetes, then start from a simple program and discuss the steps to take to make it behave well in this environment: starting with the basic steps you can rely on the runtime for, covering logging, and going all the way to supporting the service life-cycle, health checking and monitoring in a Kubernetes environment. You will see that building a cloud-native application is not very hard and is something you can gradually introduce.
Good afternoon. I'm Floris. I've been a Python developer for quite a while now [inaudible]. I've been involved in Python for quite a while and I work on open source as well [inaudible]. I very recently changed jobs and am now a site reliability engineer at Google. While Kubernetes originally came out of Google, this is not actually a Google project or a Google talk, sorry. This is largely based on my experience at my previous job, where I was looking at microservices in Python that we ran on Kubernetes.

A little note about the title: Kubernetes is part of the Cloud Native Computing Foundation (CNCF), and that's where the title came from, applied to Python.

OK, so this is the content that I'll be covering. I'll start with a really, really brief introduction to Kubernetes, probably the shortest introduction I can give, and hopefully, if you're not familiar with it, that should be enough to follow the rest of the slides. Then I'll introduce a little example, just a traditional echo service: you send it a message and you get the message back. We will take that example throughout the rest of the talk and start modifying little bits.
So, an introduction to Kubernetes. This is Kubernetes in one slide, which is quite a tricky thing to do, I guess.

The idea of Kubernetes is that it's a cluster orchestrator, really. You give it a bunch of machines and it creates a cluster out of them. Then, when you want to run your application, you just say "run my application somewhere in this cluster" and Kubernetes itself will decide where the right place in the cluster is to run it. The reason we want to architect a system like this, especially with multiple microservices and multiple instances of every service, is that it lets you build really resilient applications. If one of the machines in the cluster is unhealthy, you can just take it down, fix it or replace it, and your application just keeps running. If something crashes, requests will just be routed somewhere else. You create a really resilient, always-up kind of application. That's the end goal of what we're trying to do when we run in Kubernetes.

So, the core concepts of Kubernetes. You want to run your application, which in our case is just going to be a Python application. Kubernetes runs containers, essentially, so you need to containerize your application. Right now that basically means Docker; in the future there may also be others, like rkt. So you create a container out of your application; I'll skip over that part today. Kubernetes runs this inside of a pod. A pod is the smallest unit that Kubernetes will run for you; essentially it's just another wrapper around your container. You don't really have to worry about why they decided to create pods. The idea is that you can potentially put multiple containers inside one pod and treat them as one unit; why you would do that doesn't really matter for today.

So you tell Kubernetes "please run this pod for me", which is your application, and Kubernetes will just find a server in its cluster and start to run it for you. However, the problem with pods is that they're ephemeral. If the pod gets killed for some reason, say an administrator takes the machine down or something like that, then your pod is gone. That's a far cry from our resilient application.

The next concept that Kubernetes introduces is the replica set. A replica set continuously looks at what's running inside Kubernetes. In your replica set you say "I want to run this many instances of my application", and whenever the replica set sees that your application isn't running that many times, it will try to make sure that it is. If there are not enough instances of your application it will create more; if there are too many it will kill a few. That means that when a machine gets taken down, the replica set sees this and creates a new pod instead. So this starts to make your application always be there, especially if you request multiple instances.

The problem then is, of course, that these pods just run on a random machine somewhere in the cluster. You don't know anything about them; you don't know how to contact them. That's where the concept of a service comes in. A service is essentially some sort of fixed IP address inside the cluster; that's the easiest way to think of it. If you need to contact your application you contact it via the service, and the service essentially load balances between all the instances of the pod. If you send traffic to the service, it will go to one of the instances of the application running somewhere in the cluster. These are really the basic things about Kubernetes.
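The replica set behaviour described above (create pods when there are too few, kill some when there are too many) can be illustrated with a toy reconciliation function. This is only a sketch of the idea in plain Python, not how Kubernetes is implemented; the function name `reconcile` and the pod names are made up for illustration.

```python
import itertools


def reconcile(desired, running, new_names):
    """One reconciliation pass: return the pods that should exist afterwards.

    desired   -- requested number of replicas
    running   -- names of the pods that are currently alive
    new_names -- iterator yielding names for newly created pods
    """
    pods = list(running)
    while len(pods) < desired:       # not enough instances: create more
        pods.append(next(new_names))
    del pods[desired:]               # too many instances: kill the surplus
    return pods


names = (f"echo-{n}" for n in itertools.count(1))

pods = reconcile(3, [], names)       # empty cluster: three pods get created
pods.remove("echo-2")                # a machine is taken down, its pod is gone
pods = reconcile(3, pods, names)     # the next pass creates a replacement
print(pods)                          # ['echo-1', 'echo-3', 'echo-4']
```

The point of the sketch is that nobody restarts a specific pod; the system only compares the desired count with what is running and corrects the difference.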
That's enough to follow along. I'm going to skip everything else. There's a lot more, there are lots of other layers, and they don't actually recommend that you use replica sets directly at the moment. There's lots more going on, but that's the core concept that allows you to create resilient applications, and it should be enough for today, hopefully.

So this is my little example application that I'll work with throughout the talk. It's a basic echo service on the network, and I have implemented it using ZeroMQ, because any time you think of creating a TCP socket to send and receive data, you should really use ZeroMQ instead. It takes care of a lot of the nitty-gritty networking details, and you get whole messages atomically delivered to your application without having to worry about everything else. This is fairly standard. I should point out that all the code I show is slideware: I use globals and I don't show all the imports. You shouldn't write applications like this.

It's a standard application, basically. The main function creates and binds the socket that I want to listen on, and then uses a poller, which, if you don't know ZeroMQ programming, is basically like select. Essentially it's asking your operating system "can I sleep until the next message is available?", and when the next message is available it comes back as an event. I then pass the service socket to the event handler for that event, and the handler receives the message and sends it back. So this is basically an infinite loop which just keeps receiving and sending messages as they arrive.

There are a few little helper functions to make it fit on the slide. The create-and-bind function is really nothing, just some plumbing: it creates the socket, binds it so that we can actually receive connections, and registers the socket with the poller (again using globals, which is not very good). The handler is where anything actually happens. It receives the message, does a little bit of ZeroMQ bookkeeping (that's the way ZeroMQ handles the peer address), logs the message and then sends it back to whoever sent it to us.

So that's a very simple application, and the first thing to notice is that this is actually sufficient. We can just take this application and add the usual if __name__ == "__main__" check, et cetera.
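The slide code is not part of this transcript, so here is a minimal sketch of the same structure (create and bind, a poller, an event handler, an infinite loop). Note the substitution: it uses plain TCP sockets and the standard library `selectors` module instead of ZeroMQ so that it runs without extra dependencies, which also means it echoes raw bytes rather than whole messages. All function names are illustrative, and the poller is passed around explicitly rather than kept in a global.

```python
import selectors
import socket
import threading


def create_and_bind(poller, host="127.0.0.1", port=0):
    """Create the listening socket, bind it and register it with the poller."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))        # port 0 lets the OS pick a free port
    server.listen()
    poller.register(server, selectors.EVENT_READ, accept)
    return server


def accept(poller, server):
    """Event handler for the listening socket: start watching a new client."""
    conn, _peer = server.accept()
    poller.register(conn, selectors.EVENT_READ, echo)


def echo(poller, conn):
    """Event handler for a client socket: send back whatever arrived."""
    data = conn.recv(4096)
    if data:
        print("echoing", data)
        conn.sendall(data)
    else:                            # empty read: the client closed
        poller.unregister(conn)
        conn.close()


def serve(poller, max_events=None):
    """Sleep until a socket is readable, then dispatch to its handler.

    Runs forever unless max_events is given, which is handy for a demo.
    """
    handled = 0
    while max_events is None or handled < max_events:
        for key, _mask in poller.select():
            key.data(poller, key.fileobj)
            handled += 1


def demo():
    """Start the service in a thread, send one message, return the reply."""
    poller = selectors.DefaultSelector()
    server = create_and_bind(poller)
    worker = threading.Thread(target=serve, args=(poller, 2), daemon=True)
    worker.start()
    with socket.create_connection(server.getsockname(), timeout=2) as client:
        client.sendall(b"hello")
        reply = client.recv(4096)
    worker.join(timeout=2)
    server.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

As in the talk, there is deliberately no exception handling and no concurrency beyond the single event loop; the thread exists only so the demo can act as its own client.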
We can just containerize that and run it as is in a Kubernetes cluster, and it will just work.

So the first thing to rely on when you're in Kubernetes is the fact that you are running in this environment: you don't have to put complexity in your application. It allows you to have a very simple logical flow, which is what we just saw, with a simple, straightforward internal architecture. That's true for larger business logic as well; you can strip out a lot of the boilerplate.

The first thing you may notice is that I didn't write any exception handling. That wasn't just to make it fit on a slide; it's actually generally true. You don't really have to worry about exceptions, because your application runs in multiple instances. If you do get something unexpected (I don't know, maybe binding the socket could fail because something is wrong on that machine), I don't really mind. My process will die and Kubernetes will just make sure that a new one gets created somewhere else instead, and it will probably work. So you don't have to worry about that.

You can even go as far as doing that when you're receiving data. When you're receiving a request from some other service, and these services are completely internal, so you or your team are the author of the service, then you can essentially treat an invalid request as a bug and just crash again. If you're doing this for external applications, so if you're actually receiving user requests, then that is probably a bit too brittle. In that case you probably do want to catch exceptions for request validation, because crashing has a cost that varies depending on the application.
In this case it is not very much, but there is some overhead in starting up your application again.

The other thing that happens when you just crash, and that you should take into account, is that if you have network connections receiving messages from clients, those network connections have buffers in them. So there will be requests queued up already in your local process. With ZeroMQ that's really obvious, because there is an explicit queue on every socket, but even if you were using raw TCP there will always be internal kernel buffers. The kernel may even have accepted new requests already and just queued them up, even though your application has no idea. There might even be data still on the wire, just coming over. If you just crash, you lose that data.
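The kernel buffering mentioned above can be observed with a few lines of plain TCP code. This is a small demonstration under the assumption of a loopback connection on a typical operating system, not something from the talk's slides: the client connects and sends a request before the server application has even called `accept()`.

```python
import socket

# Server side: bind and listen, but do not call accept() yet.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
server.listen(5)

# Client side: the kernel completes the TCP handshake on the server's behalf,
# so connecting and sending both succeed although nothing was accepted yet.
client = socket.create_connection(server.getsockname(), timeout=2)
client.sendall(b"request-1")

# If the server process crashed at this point, "request-1" would be lost
# and the client would only find out through a timeout or a reset.

# Only now does the "application" look at the connection and find the data.
conn, _peer = server.accept()
conn.settimeout(2)
queued = conn.recv(1024)
print(queued)

for sock in (conn, client, server):
    sock.close()
```

This is why "crash and let the cluster restart me" still needs a client-side timeout and retry, or an explicit decision that losing the occasional request is acceptable.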
You lose this request, and whoever created that request will be waiting, time out and retry, so that's not very good. If you want to take that into account, it brings you to the matter of how you organize your messaging. If you really don't want to suffer from that, you can start playing with message brokers like RabbitMQ and other systems like Kafka, but that comes with trade-offs and expenses. What I would say here is basically: be aware of it. When you crash you may lose requests, so make sure that that's OK in the system you design.

This slide shows how you would actually create your application. This is the recommended way to run it: you create a deployment. A deployment is essentially just a wrapper around these replica sets, and the reason they have it is that it makes updating applications easier. From our point of view, the two important things are these. The line that says replicas: 3 means we request three replicas, so Kubernetes will always ensure that three of them are running. And the really important thing is the very last line, restartPolicy: Always. That line basically says "if I crash, just start me again please".

So that's the first thing. The second thing you may have noticed is that the script had no concurrency. It was a very simple example, but this is generally true: you can keep your internal code really simple, because the idea is to scale via the process model. If you've ever heard of the twelve-factor app methodology, this is the idea that you scale horizontally by creating more instances of your application. As we just saw, you increase the number of replicas, Kubernetes just creates more pods, and they will handle the traffic.
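The deployment slide itself is not in the transcript. As a rough sketch of what such a manifest contains, the following builds one as a Python dictionary and prints it as JSON, which Kubernetes accepts as well as YAML. The application name, labels, image and port are hypothetical, and `apps/v1` is the API version in current Kubernetes releases; the talk dates from 2017 and would have used an earlier beta version.

```python
import json

APP = "echo"  # hypothetical application name, also used as the label value

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        # Kubernetes keeps this many pods running at all times.
        "replicas": 3,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {
                "containers": [
                    {
                        "name": APP,
                        # Placeholder image reference, not a real registry.
                        "image": "registry.example.com/echo:latest",
                        "ports": [{"containerPort": 5555}],
                    }
                ],
                # "If I crash, just start me again please."
                "restartPolicy": "Always",
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

Scaling horizontally then means changing the single `replicas` number rather than adding threads or worker pools to the application.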
That means that internally we can have really easy debugging and so on, because our control flow just gets really simple, and you don't have to worry about any of the other stuff.

Basically, the service is the load balancer in this case. This is the service definition that we have. One gotcha to look out for with the load balancer that the service creates, which balances the traffic between the pods, is when you have a protocol that uses long-standing connections. Again, the ZeroMQ example shows this very well, because ZeroMQ creates a long-standing connection and tries to reuse that connection for lots of requests and responses. So our service will accept lots of echo requests from the client on one connection. This means that essentially we don't get real load balancing: one client will only be connected to one of the pods, one of our application instances. So that's not very load balanced. On the other hand, if you use HTTP, then typically a new TCP connection gets created for each request, and that would just be load balanced automatically.

The trick you can use here is that Kubernetes actually allows you to see the layer below services. It has an object called Endpoints, and that allows you to see which endpoints, basically IP addresses, are part of your service. Using that information, we could write our application to query the Kubernetes API for those endpoints.
You can then connect to all of these endpoints yourself. The downside is that you have to write this code, and you need to be constantly aware that this can change: if an endpoint disappears, you need to disconnect from that endpoint too. So you need to be aware of that when you have protocols that use long-standing connections.

Next up is logging. You may have seen my very first example and cringed at the print statement. By default, Docker takes the standard output of your container as log data, and Kubernetes will in turn take the log data from the containers and make it available to you. Generally the idea is that at operations time you hook that up to a log aggregator of some sort, like Elasticsearch or similar. But using simple print statements is not very nice. In general you want to be able to control log levels from the command line, again the twelve-factor kind of thing. So you really want to use logging libraries.

Logging libraries are sort of a commodity. There are quite a few variations, and they all try to wrap a horrible amount of global state into a nice API. That global state is always quite horrible, so they're all terrible in one way or another. I quite like Logbook. I like the way it tries to handle the global state, but it's not inherently better than the standard library logging. One nice thing about Logbook is that you can use normal curly-brace formatting, the new-style formatting, instead of the percent formatting from standard library logging. But the main thing to notice here is that once you start using a logging library, you can hook it up differently.
Instead of printing to standard out, which is probably the first thing you would do if you don't have all the infrastructure, you can hook it up to send the log records directly to the aggregator. This allows you, if you have tracebacks or something, to send them as a single big block.

So the next thing you really want to do once you start using logging libraries is to wrap your main application in exception logging. All logging libraries will support some variation of this. The idea is that by doing this you make sure that any unhandled exception will be captured by the logging library and sent as one single log record back to your log aggregator.

Next is a concept that Kubernetes calls health probes. The central idea is this. When your application starts up, Kubernetes has started the container, and as soon as that container is running from a process point of view, it's considered available. So the service object that was created for your application will start sending traffic to you. But there is a finite amount of time in which the application is running and you haven't opened your socket yet, so you're not listening for connections. In this case it's obviously very small, but if you have a lot of setup to do before you start accepting connections from clients, it might be bigger. The problem is that if Kubernetes sends traffic to your application at that point, you're basically saying "connection refused", and that's not a nice kind of service. You don't want to do that, so Kubernetes introduces readiness probes. The idea is that after starting your application, Kubernetes will wait until the probe succeeds.
Only then does it start sending traffic to your application. This is how that is configured in the Kubernetes deployment description.

There are several different probes you can use. In our case it's just a simple echo service, so all we really care about from the probe's point of view is that the socket is open and listening. So this TCP socket probe essentially says "as soon as the socket accepts connections, send the traffic". Kubernetes literally just tries to connect to the socket, and as soon as that works it closes the connection and says "yes, OK, I'll send the traffic". That means there is still a tiny delay in which we might have bound the socket but are not yet processing messages. But because we know that the buffering and queuing in our sockets take care of this, that's perfectly fine. There will be a small amount of time in which some requests get queued up, and we start serving them very shortly afterwards.
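What the TCP socket probe checks can be sketched in a few lines: try to open a connection, close it again, and report whether that worked. This is an illustration of the idea only, not Kubernetes' own implementation; `tcp_probe` is a made-up helper name.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # Connect and immediately close again: no data is exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                  # refused, unreachable or timed out
        return False


# An application that has bound its port but is not listening yet ...
app_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
app_socket.bind(("127.0.0.1", 0))
host, port = app_socket.getsockname()
not_ready = tcp_probe(host, port)    # the connection is refused

# ... becomes ready as soon as it starts listening, even though it has not
# accepted or processed anything: the kernel queues the connection for it.
app_socket.listen()
ready = tcp_probe(host, port)

app_socket.close()
print(not_ready, ready)
```

The second half shows the tiny delay mentioned in the talk: the probe succeeds once the socket is listening, before the application has processed a single message.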
If your services are actually using HTTP as transport, Kubernetes has built-in support for that, and there is this convention of using a /healthz route. This is really nice in a way, because you can make this probe completely in line with your normal request processing, if you write it as such. Kubernetes basically only looks for the 200 OK on that route, so as soon as it returns 200 OK it will start sending traffic. And because you can do this completely in line, you have quite high assurance that yes, my application is fully running and can serve traffic.

The same concept as readiness also applies during the pod's lifetime. When you're running an application at large scale with lots of requests, sometimes things will just go wrong and one of the many instances will start misbehaving. This might be because of external things, like another container on that box. Even though they should be isolated, they're not always as isolated as you would imagine, and things go terribly wrong. Or someone decided to run a backup on that machine and things go really slowly. Things happen, right?

So the idea of liveness probes is that you don't want to be sending requests to an application that is slow to respond, or completely stuck, or something like that. This is done in essentially a very similar way; there is just a livenessProbe setting. This shows the third type of probe you can use: there is TCP socket, HTTP GET, and here I decided to use exec. Exec is the most [inaudible], but it's also the most flexible. The TCP one that we used for readiness was very suitable there.
It is not very useful for a liveness probe, though. What we really want is the same as what you get with the HTTP GET: we want to know, in line, at this instant, whether things are working correctly. The trick to do that is this exec command. You give it the command line that you want to execute inside your container. So you have to provide an extra binary inside the container which will tell whether you're healthy or not healthy. Ideally you want to aim for this inline checking, so that's exactly what I do: I have an extra script in my application.

It's a standard, simple script. Again, we have a very simple application. It literally just creates a socket and connects to the public endpoint. In this case, because I'm running in the same container, it's actually localhost, but we still make a TCP connection, so it still gives a very good idea of the public behaviour of our application. You send a message and wait for something like 5 ms, and if you haven't got the message back in that time, then we fail. Kubernetes will then take your pod out of the service, so it basically stops sending traffic to it and sends it to the other ones instead.

You may notice I'm not actually even receiving my response; I just check that there is a response. In this case I think that is sufficient. It depends on the protocol you have. For other messaging you sometimes might want to add slightly more, and often it's nice to build in something like the /healthz route of the HTTP convention, just something that lets you know it is working [inaudible].

Next up is termination, and again this is very similar to what we thought about at startup time.
Just as we wouldn't want to refuse any connections, at termination time we don't want to drop any connections that exist, or any requests that are currently queued up in our process. If we don't do anything, our application will just receive the termination signal and die. Everything that was queued up will be lost and all the clients will be waiting, so that's not very good.

Instead we should handle the termination signal, which in Kubernetes is done like it has always been done on Unix systems. There are a few of them, but you basically get SIGTERM. The signal handler that we create is just a very simple internal socket pair; think of it as a pipe. All we want to do is send a single byte to the main loop. The main loop then knows it has to shut down, and it can shut down while still handling the connections.

I'm using the same signal handler for both SIGTERM and SIGINT. SIGINT is the signal you get when you press Ctrl-C when you're running in a terminal; in Python that normally gets automatically translated to a KeyboardInterrupt exception. Because you want your application to behave in the same way when you're running in production and when you're running it in a test, it's usually best practice to just bind both signal handlers.

The modifications to my main function are maybe a little bit more messy now, but essentially it's not that much. Instead of creating and binding just one socket, I create two: I also have the termination socket, which I add to the poller. When I now receive the single byte (I don't even care what that byte is, I just know I received a message on my termination socket), I first unbind.
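The signal handling described here, together with a drain step, can be sketched as follows. As with the other examples this uses plain TCP sockets and `selectors` rather than ZeroMQ, so closing the listening socket stands in for ZeroMQ's unbind; the names `request_shutdown`, `install_signal_handlers` and `serve` are illustrative, and the 5 second drain timeout is simply the value mentioned in the talk.

```python
import selectors
import signal
import socket

# Internal "pipe": the handler writes a byte, the main loop polls the other end.
term_recv, term_send = socket.socketpair()


def request_shutdown(signum, frame):
    """Signal handler: wake up the main loop; the byte's value is irrelevant."""
    term_send.send(b"\0")


def install_signal_handlers():
    # Same handler for SIGTERM (sent by Kubernetes) and SIGINT (Ctrl-C).
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def serve(listener, drain_timeout=5.0):
    """Echo until shutdown is requested, then drain until the sockets are quiet."""
    poller = selectors.DefaultSelector()
    poller.register(listener, selectors.EVENT_READ)
    poller.register(term_recv, selectors.EVENT_READ)
    timeout = None                   # normal operation: block until an event
    while True:
        events = poller.select(timeout)
        if timeout is not None and not events:
            return                   # draining and nothing left to process
        stop = False
        for key, _mask in events:
            sock = key.fileobj
            if sock is term_recv:
                stop = True          # finish this batch of events first
            elif sock is listener:
                conn, _peer = listener.accept()
                poller.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)
                else:                # peer closed the connection
                    poller.unregister(sock)
                    sock.close()
        if stop:
            term_recv.recv(1)
            poller.unregister(term_recv)
            poller.unregister(listener)
            listener.close()         # stop taking new connections
            timeout = drain_timeout  # keep serving what is already queued
```

A real entry point would call `install_signal_handlers()` once from the main thread before `serve(...)`. Requests that arrived before the shutdown signal are still answered; the loop only returns once nothing has happened for `drain_timeout` seconds.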
This means I will stop receiving new connections. Then I basically keep processing while there are still events queued. I have also set a timeout on the poll of 5 seconds, which may be fairly high. The idea here is that some requests might already be on the wire, and I want to give them a chance to be processed. Once there are no longer any messages, the while loop will finish and return.

The last thing that I would like to add is monitoring. Prometheus is also a Cloud Native Computing Foundation project, actually, which is why I mention it. You always want to know what's going on, and Prometheus offers you this option of doing white-box monitoring: you add counters and other metrics inside your application, and it does the metrics collection. Prometheus works in a pull-based fashion: you have a central instance that goes around to all the services, does an HTTP request and gets back your metrics. It's a little bit like SNMP, which I'm guessing a few people in the room know, but at least we have learned that monitoring data is probably even more important than production traffic, so we actually use a reliable transport instead of UDP. Prometheus is quite a big project, but the idea here is really just to show how easy it is to get started with it; you can just add variables. Last year there was a talk from Hynek at EuroPython which goes into a lot more detail about metrics.

So this is the recap. Basically, none of this is actually strictly required to be able to run on Kubernetes; you can add it gradually as you need it. Keep your architecture simple, think about when you lose requests, and you don't want to go blind, so you always want to have some logging and monitoring.

Thank you very much. I think I'm about out of time, so unfortunately there is probably no time for questions, but you can find me outside [inaudible].