End-to-end monitoring with the Prometheus Operator

Video in TIB AV-Portal: End-to-end monitoring with the Prometheus Operator

Formal Metadata

Title: End-to-end monitoring with the Prometheus Operator
License: CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Kubernetes is a powerful system to build and operate a modern cloud-native infrastructure. Monitoring with Prometheus ensures that Kubernetes stays healthy. Prometheus is a stateful application, so operating it in a cloud-native environment can be a challenging task. The Prometheus Operator makes running highly available Prometheus clusters, and even an entire end-to-end monitoring pipeline, easily manageable. Max will explain the functionality of the Prometheus Operator and describe a desirable end-to-end monitoring stack, including alerts and dashboards.
Keywords: Monitoring

Right: end-to-end monitoring with the Prometheus Operator, as introduced. I'll cover Prometheus as a monitoring system and dive a little bit into how to monitor Kubernetes clusters. See this as an introduction to monitoring and to how you can do it in a very agile, very dynamic infrastructure.
I don't think I'll need the entire hour, since I tend to talk fast, so please feel free to ask questions during the talk; that way we can go into detail on the things you're actually interested in. If you have questions afterwards, stick around a little: I've got a talk right after this one about Kubernetes itself, so if you're intrigued and want to learn more about Kubernetes, it's just in a different room; check your schedule. Other than that, I'm reachable over social media and, of course, by e-mail, so feel free to reach out.
OK, so why is somebody from CoreOS standing here giving this talk? CoreOS is a company based in San Francisco, with offices in New York and Berlin, and the idea is to secure, simplify and automate container infrastructure. That's a bunch of buzzwords, not very descriptive and not very helpful, so let's go a little bit into what CoreOS actually does. We have two enterprise products, Tectonic and Quay; I don't think they're in the scope of this conference, so we'll skip all the closed-source stuff from now on. Then we've got Container Linux, a Linux distribution, which is entirely open source, as are all these products: rkt, a container engine; flannel, an overlay network used in many Kubernetes clusters; and etcd, which is pretty much the brain of Kubernetes nowadays. In addition to our own products, we're also heavily invested in Prometheus and Kubernetes: we're involved in upstream development and build our products on top of them.
OK, moving on. Since we are heavily invested in Prometheus itself, people who are part of the Prometheus team have joined us as well, so we also work on the database of the open-source upstream project. OK, so we want to talk about monitoring today. First of all, let's question the idea of monitoring: why in the world would you need it? Especially at the beginning, if you're a startup, you definitely don't care about monitoring at all, until somebody tweets that your service is down. The number one thing about monitoring is the alerting part: you want to be woken up at night if your service is down, and even better, right before your service goes down rather than while it is down. Then there's a second dimension to monitoring: long-term trend analysis, for example for capacity planning. If you're trying to survive Black Friday in America, you want to buy more servers and run them beforehand instead of running out of capacity. Just to get a little gauge of the audience: who here is maintaining a production service at the moment? And who is taking care of the monitoring setup at their company? OK, cool. And who's using Prometheus?
Prometheus is an open-source monitoring tool, and it started off at SoundCloud. Most of the people who started Prometheus had previously worked at Google, and it is heavily inspired by Borgmon. You don't really need to know what Borgmon is, but you might know Borg, the internal container orchestrator at Google; you can compare it to Kubernetes, which I'll go into in a bit. So it was built at SoundCloud, and the idea was: people who left Google were missing the tools they had there, they
looked for them in the open and couldn't find them, so they developed them, and that's how Prometheus started. A bit more technically: Prometheus believes in the Church of Pull. There are two ideas, pull-based and push-based monitoring, and I don't want to start a huge discussion here. Prometheus is pull-based, which means your monitoring system goes to your services and checks whether they are alright, rather than your services sending health checks to your monitoring system. Next, a monitoring system can be seen as just a database, a database built for querying, monitoring and analysis. What Prometheus offers is a multidimensional data model built on key-value pairs, called labels in Prometheus, so you can analyze the data you scrape very nicely. In addition, Prometheus is all about metrics: it's not about logging, and it's not about tracing. I'll go into detail on why it's not about logging; tracing it simply doesn't have the functionality for, so you cannot trace your packets through your infrastructure with Prometheus. And last, which might be important for a monitoring system: Prometheus is not about magic. I'm not going to talk about artificial intelligence, and I'm not going to talk about blockchains today. There's no magic involved; it's configuration that you have to apply the right way.
OK, some basics about how Prometheus works. As I already told you, it's a pull-based monitoring system, so you have a few targets, whatever you want to scrape. I just came from PromCon actually, which was the last couple of days, and people were telling how they monitor everything in their apartment and get alerts whenever they need to air out their apartment due to humidity. So you can pretty much monitor anything; here I'll focus on server infrastructure. Targets could be applications, operating systems, whatever. They expose a /metrics endpoint, which is just an HTTP endpoint, and that endpoint is then scraped by Prometheus at a regular interval. The interval is configurable: the global default is one minute, and the example configuration that ships with Prometheus uses 15 seconds. So Prometheus comes along and scrapes the data.
What does such a /metrics endpoint look like? You have a little comment, then the metric name (that's the standard exposition format for Prometheus), then labels, which make up the multidimensional data model, and at the end the value itself. This is currently being standardized as the OpenMetrics standard, which you can have a look at; I think it's still in progress. It's a very easy exposition format: you can implement it yourself, or you can use the client libraries that Prometheus offers. When a request comes in, you increment your counter, and when the next request comes in, you increment it again: nothing but incrementing, no magic. So why is it not about logging? As I said earlier, with logging you record every single request, whereas Prometheus just gets snapshots of that data and doesn't know about every single request. So it's really just about metrics.
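To make the exposition format concrete, here is a minimal Python sketch of a labelled counter that renders itself in the Prometheus text format, plus a tiny parser for such output. It is an illustration only, not the official client library; the metric name http_requests_total and the labels path and status are assumptions, and the sketch neither escapes label values nor handles the optional per-sample timestamps.

```python
import re


class Counter:
    """A labelled counter that only ever goes up, kept in memory."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.values = {}  # sorted tuple of (label, value) pairs -> count

    def inc(self, **labels):
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0) + 1

    def expose(self):
        """Render the counter as a /metrics response body in the text format."""
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} counter"]
        for key, value in sorted(self.values.items()):
            # Assumes label values contain no quotes, backslashes or newlines.
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_str}}} {value}")
        return "\n".join(lines) + "\n"


SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse(text):
    """Parse sample lines into (name, labels, value); comment lines are skipped."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        name, label_str, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(label_str or "")), float(value)))
    return samples


requests = Counter("http_requests_total", "Total HTTP requests.")
requests.inc(path="/api", status="200")
requests.inc(path="/api", status="200")
requests.inc(path="/api", status="500")
print(requests.expose(), end="")
```

The counter only increments; it is the scraper's job to turn successive snapshots of these values into rates.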
Prometheus scrapes that data and keeps it in its database. Now you want to do something with the data, and that's why there's PromQL, the Prometheus query language: you point it at the Prometheus server, run queries against it, and get your data back. You might be asking why another query language, why not simply use SQL. PromQL is all about answering monitoring questions, and once you've written your queries in PromQL, you really don't want to go back to SQL. So what does that look like?
Say you want to see the current percentage of HTTP requests that are errors, across all instances of your API service. One query for that: sum by path, where the path is just the endpoint of the API, over the rate of HTTP requests filtered by error status over the last 5 minutes, divided by the same thing without the status filter. That's really easy, and that's how we get the data: now we have the current error percentage per path of our API.
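The query described above could be written in PromQL roughly as the QUERY string below; the metric name http_requests_total, the labels path and status, and the use of status 500 as "error" are assumptions. To show what it computes, this Python sketch evaluates the same ratio over raw counter samples from a 5-minute window. It is a simplification: PromQL's rate() also extrapolates to the edges of the range window, which this sketch does not do.

```python
# Roughly the PromQL query from the talk; metric and label names are assumptions.
QUERY = (
    'sum by (path) (rate(http_requests_total{status="500"}[5m]))'
    " / sum by (path) (rate(http_requests_total[5m]))"
)


def simple_rate(samples):
    """Per-second increase of a counter from (timestamp_s, value) samples.

    A value lower than the previous one is treated as a counter reset
    (restart from zero). Unlike PromQL's rate(), there is no extrapolation.
    """
    if len(samples) < 2:
        return 0.0
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur if cur < prev else cur - prev
    return increase / duration


def error_ratio_by_path(series):
    """series maps (path, status) to the samples from the last 5 minutes.

    As with the PromQL division, only paths that have an error series appear.
    """
    errors, totals = {}, {}
    for (path, status), samples in series.items():
        rate = simple_rate(samples)
        totals[path] = totals.get(path, 0.0) + rate
        if status == "500":
            errors[path] = errors.get(path, 0.0) + rate
    return {path: err / totals[path] for path, err in errors.items() if totals[path] > 0}


# Hypothetical counter samples (timestamp in seconds, cumulative count).
window = {
    ("/api", "200"): [(0, 0), (150, 450), (300, 900)],
    ("/api", "500"): [(0, 0), (150, 50), (300, 100)],
    ("/health", "200"): [(0, 0), (300, 300)],
}
print(error_ratio_by_path(window))  # a ratio of about 0.1 for /api
```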
Numbers are great, but we want dashboards, fancy big screens in our monitoring rooms. Prometheus comes with a web UI where you can draw nice graphs, and of course it also
integrates with Grafana and other graphing tools, so you can have a million screens on your wall showing fancy graphs. Having those fancy graphs is nice, but you don't want to set an alarm and have everyone get up during the night to look at the graphs and check that everything's going right. You want Prometheus to tell you automatically when something is not going the way it should. That's the idea of alerting definitions: you tell Prometheus what a bad state of your system looks like, and then
Prometheus goes over its data at a regular, configurable evaluation interval, evaluates the alerting definitions, checks whether any of them return true on the data, and then sends out alerts.
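A minimal sketch of that evaluation cycle, assuming alerting definitions are plain Python predicates over a snapshot of the latest values rather than PromQL expressions. The names AlertRule, evaluate and run are hypothetical; real Prometheus alerting rules also support a "for" duration (a condition must hold for some time before the alert fires), which is left out here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

Labels = Dict[str, str]


@dataclass
class AlertRule:
    name: str
    # Stand-in for a PromQL expression: returns the label sets that are "true".
    expr: Callable[[Dict[str, float]], List[Labels]]


def evaluate(rules, snapshot):
    """One evaluation cycle: one alert per rule and matching label set."""
    alerts = []
    for rule in rules:
        for labels in rule.expr(snapshot):
            alerts.append({"alertname": rule.name, **labels})
    return alerts


def run(rules, fetch_snapshot, send, interval_s, cycles):
    """Re-evaluate all rules every interval_s seconds and forward firing alerts."""
    for _ in range(cycles):
        firing = evaluate(rules, fetch_snapshot())
        if firing:
            send(firing)
        time.sleep(interval_s)


# Hypothetical snapshot: the latest value of the "up" metric per instance.
instance_down = AlertRule(
    "InstanceDown",
    lambda up: [{"instance": inst} for inst, value in sorted(up.items()) if value == 0],
)
print(evaluate([instance_down], {"node-a:9100": 1.0, "node-b:9100": 0.0}))
```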
This one blew my mind at the beginning: with Prometheus you can do a quick linear prediction of whether your disks are going to run full in the next 4 hours. You just run a linear prediction over the filesystem's free space for the last hour to see what happens in the next 4 hours, and if the result is below 0, you probably want to wake somebody up to provision more storage.
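In PromQL such an alert is typically written along the lines of predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0; the metric name depends on the node exporter version, so treat it as an assumption. predict_linear fits a simple least-squares line through the samples in the range and extrapolates it; the Python sketch below shows that calculation on hypothetical data.

```python
def predict_linear(samples, seconds_ahead, now):
    """Least-squares line through (timestamp_s, value) samples, evaluated at now + seconds_ahead."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (now + seconds_ahead)


# Hypothetical free-space values sampled every 10 minutes over the last hour,
# shrinking by one unit per second (6000 at t=0, 2400 at t=3600).
free_space = [(t, 6000 - t) for t in range(0, 3601, 600)]
predicted = predict_linear(free_space, 4 * 3600, now=3600)
print(predicted, "-> page someone" if predicted < 0 else "-> fine")
```

Here the fitted line has slope -1 per second, so four hours after the last sample the predicted free space is -12000, which would trigger the alert.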
OK, so you give the alerting definitions to Prometheus, it goes over its data every now and then and sends out alerts, but it actually doesn't
send them straight to PagerDuty, e-mail or Slack; there's something in between, the Alertmanager, which is another component. I'll go into detail on why it's there. Who's using Prometheus, and who's using the Alertmanager? OK, hopefully more after this. The idea behind the Alertmanager: running Prometheus in high availability just means running two Prometheus servers next to each other, scraping the same targets and analyzing the same data. Thereby both send out the same alerts, but you don't want to be alerted twice about everything, so the Alertmanager deduplicates them. In addition, it groups alerts: for example, if your storage cluster goes down and you have an alert for every disk in that storage cluster, you don't want to get an alert for every disk, so it groups them and you get just one alert for your storage cluster. And last, if your storage cluster goes down, you don't want to wake up the frontend engineers, only the storage people. So you tell the Alertmanager what your company structure looks like, and it knows where to route which alert.
From the Alertmanager, the alert is then sent on to PagerDuty or whatever your integration looks like; the ecosystem around Prometheus is huge. So that's pretty much everything here: we've covered the basic monitoring idea now.
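A minimal Python sketch of the three Alertmanager jobs just described: deduplicating the identical alerts sent by two HA Prometheus servers, grouping alerts that share certain labels, and routing each group to a receiver by label. It loosely mirrors the group_by setting and routing tree in Alertmanager's configuration, but the functions, labels and receiver names here are hypothetical, not Alertmanager's implementation.

```python
from collections import defaultdict


def dedupe(alerts):
    """Drop identical alerts, e.g. the copies sent by two HA Prometheus servers."""
    seen, unique = set(), []
    for alert in alerts:
        key = tuple(sorted(alert.items()))
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def group(alerts, group_by):
    """Bundle alerts that share the group_by label values into one notification."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(label, "") for label in group_by)].append(alert)
    return dict(groups)


def route(alert_group, routes, default_receiver):
    """Return the receiver of the first route whose label matchers all match."""
    labels = alert_group[0]
    for matchers, receiver in routes:
        if all(labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return default_receiver


# Hypothetical alerts: one per failed disk, received from both Prometheus replicas.
disk_alerts = [
    {"alertname": "DiskFailed", "cluster": "storage-1", "team": "storage", "disk": d}
    for d in ("sda", "sdb", "sdc")
]
incoming = disk_alerts + [dict(a) for a in disk_alerts]
routes = [({"team": "frontend"}, "frontend-pager"), ({"team": "storage"}, "storage-pager")]

for key, alerts in group(dedupe(incoming), ["alertname", "cluster"]).items():
    print(key, len(alerts), "alerts ->", route(alerts, routes, "default-email"))
```

Six incoming alerts become one notification about three disks, delivered only to the storage team's receiver.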
I think that's it from me for the basics. Are there any questions so far? Pretty basic so far, right? So let's divide the monitoring space into two different things from now on: application monitoring and cluster monitoring. Application monitoring is your business logic, and cluster monitoring is pretty much everything underneath, your infrastructure. So let's dive into cluster monitoring.
I don't want to tell you anything about infrastructure I have no clue about, so I'm going to talk about Kubernetes, which is our core product at CoreOS and which I'm pretty familiar with. So how would we monitor Kubernetes? First of all, what is Kubernetes?
Who here has ever heard of Kubernetes? Oh, pretty much everyone, OK, cool. Who is using Kubernetes? Cool. And who is using it in production? Hands going up every time. Kubernetes is a platform for running containerized applications; come to my second talk if you want to learn a lot more. It was started at Google, which has a lot of experience running containers: a lot of the technologies that enable us to run Linux containers today were contributed by Google to the Linux kernel. Google open-sourced Kubernetes in 2014, the community picked it up and developed it in the open together, and in 2015 it was released as 1.0 and donated to the Cloud Native Computing Foundation (CNCF). So it's not part of Google anymore but part of the CNCF, although Google is still heavily invested in it. So what does it look like?
You have a master node with a bunch of components, and, surprise surprise, they all expose metrics: each of them exposes a /metrics endpoint. That is really nice: once you're up and running, you just point your Prometheus at them, and that's it. The same goes for the worker nodes: all the Kubernetes components expose metrics by default, so you're good to go there as well. So we've covered cluster monitoring here. Next would
be application monitoring. The idea is that you
have a bunch of applications, you probably replicate them as well, and you group them behind a Kubernetes Service. Where do we go from here? Prometheus doesn't know where your applications live, so it has service discovery integrations, for example for Kubernetes: it asks the Kubernetes API server where all the applications live, then scrapes those applications, and you're good to go.
Service discovery in Prometheus is very versatile: you can use it in any environment, you can set your targets statically, you can use DNS, and there are many other service discovery integrations.
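A hedged sketch of what Kubernetes service discovery boils down to: turning the pods the API server reports into scrape targets. The pod records are simplified, hypothetical dictionaries rather than real API objects; __address__ is the internal label Prometheus uses for a target's host:port, while the namespace and pod label names and the app filter are chosen for illustration. A static configuration would simply be a fixed list of the same kind of targets.

```python
def targets_from_pods(pods, port, app=None):
    """Build scrape targets for running pods, optionally filtered by an app label."""
    targets = []
    for pod in pods:
        if pod.get("phase") != "Running" or not pod.get("ip"):
            continue  # Pending or terminated pods have nothing to scrape.
        if app is not None and pod.get("labels", {}).get("app") != app:
            continue
        targets.append({
            "__address__": f"{pod['ip']}:{port}",
            "namespace": pod["namespace"],
            "pod": pod["name"],
        })
    return targets


# Hypothetical snapshot of what the API server might report.
pods = [
    {"name": "api-1", "namespace": "shop", "ip": "10.0.0.11", "phase": "Running", "labels": {"app": "api"}},
    {"name": "api-2", "namespace": "shop", "ip": "10.0.0.12", "phase": "Running", "labels": {"app": "api"}},
    {"name": "api-3", "namespace": "shop", "ip": None, "phase": "Pending", "labels": {"app": "api"}},
    {"name": "db-1", "namespace": "shop", "ip": "10.0.0.20", "phase": "Running", "labels": {"app": "db"}},
]
for target in targets_from_pods(pods, 8080, app="api"):
    print(target["__address__"], target["pod"])
```

Because the list is rebuilt from the API server on every refresh, scaling a deployment up or down changes the targets without touching the Prometheus configuration.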
So I've made this seem pretty easy,
but if you sat me in front of a computer and asked me to set this up, I would still need a couple of days to make it really sturdy, and especially with monitoring you want this done very well. We're all running these things, especially Kubernetes, so why don't we share our ideas and learnings and tackle this together, as we're all running the same thing
underneath? So here I want to introduce the Prometheus Operator; sorry for the somewhat long introduction. The idea
behind an operator (a term coined by CoreOS) is simple:
there's really nothing special to it. We have a bunch of application-specific operational knowledge: for example, someone here in the room may really know how to operate a MySQL database, and we very much know how to run Prometheus in a Kubernetes environment. Instead of writing a bunch of blog posts all over the place, why don't we put that knowledge into code and give you that code as an operator? You deploy the operator inside your cluster, you just talk to the operator, and it makes your life a lot easier.
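The core of an operator is a control loop: read the desired state from a custom resource, compare it with what is actually running, and create, update or delete things until they match. Below is a minimal Python sketch of one such reconcile pass, assuming a plain dict stands in for the Kubernetes API and a simplified Prometheus resource carries only replicas and retention. The function names and the prometheus-<name> naming scheme are illustrative assumptions, not the Prometheus Operator's actual code.

```python
def desired_workload(resource):
    """Translate a simplified Prometheus resource into the workload we want running."""
    spec = resource.get("spec", {})
    return {
        "name": f"prometheus-{resource['metadata']['name']}",
        "replicas": spec.get("replicas", 1),
        "retention": spec.get("retention"),
    }


def reconcile(resources, cluster):
    """One pass of the control loop; mutates cluster and returns the actions taken."""
    actions = []
    wanted = {w["name"]: w for w in map(desired_workload, resources)}
    for name, workload in wanted.items():
        if name not in cluster:
            actions.append(("create", name))
        elif cluster[name] != workload:
            actions.append(("update", name))
        cluster[name] = workload
    for name in sorted(set(cluster) - set(wanted)):
        del cluster[name]
        actions.append(("delete", name))
    return actions


cluster = {}  # Stand-in for what is actually running.
k8s = {"metadata": {"name": "k8s"}, "spec": {"replicas": 2}}
print(reconcile([k8s], cluster))  # [('create', 'prometheus-k8s')]
print(reconcile([k8s], cluster))  # [] because nothing changed
k8s["spec"]["replicas"] = 3
print(reconcile([k8s], cluster))  # [('update', 'prometheus-k8s')]
print(reconcile([], cluster))     # [('delete', 'prometheus-k8s')]
```

Running the pass repeatedly is safe: when the desired and actual state already agree, it does nothing, which is what lets an operator react to any change simply by reconciling again.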
We have done that for Prometheus with the Prometheus Operator. It automatically manages and upgrades your Prometheus and Alertmanager, and you can natively configure Prometheus and Alertmanager via the Prometheus Operator. You might be familiar with Deployment YAMLs, which are just configuration files: instead of writing Deployment YAMLs, you write Prometheus YAMLs, and the Prometheus Operator understands those and automatically deploys everything that's needed in your cluster. Then we went one step further, and by "we" I don't mean CoreOS but the community as a whole; it would not have been possible without it. We have a single command, just a script, that brings up the entire cluster monitoring stack on a vanilla Kubernetes cluster. You run that script, and it sets up Alertmanager and Prometheus, sets up alerting rules (because you probably all want to be alerted when your API server goes down), and sets up dashboards for you. That's what I'll present in a little demo. Are there any questions so far? I rushed way too fast through this; I don't know how I'm going to entertain you for an entire hour.
I've already got a minikube cluster running. I hope you can all see this; is the size OK? So here I'm watching all my containers, all my pods, and you can see all the pods running there. What I do now is run the script from the Prometheus Operator project, just to show you: it's a single command, just a script, and it deploys all the components you need. First of all, it starts the Prometheus Operator, which is the operational knowledge we all gathered and put into code. Once that is up and running, it starts Prometheus itself, an Alertmanager, Grafana, and kube-state-metrics, which is what really analyzes the Kubernetes cluster: kube-state-metrics talks to the API server and exposes the state of the cluster as metrics.
What else? The node exporter, to see what's actually going on on your machines. It's kind of difficult to get a /metrics HTTP endpoint into the Linux kernel, and I think such a patch would probably not get merged, so instead we use a little tool, the node exporter: you run it on your node, and it exposes the machine's metrics there.
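The node exporter itself is a Go program with many collectors. As a much smaller, hedged illustration of the same idea, this Python sketch reads the load averages from /proc/loadavg (Linux only) and can serve them on a /metrics endpoint. node_load1, node_load5 and node_load15 are the names the node exporter uses for these values and 9100 is its default port; MetricsHandler, loadavg_metrics and serve are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def loadavg_metrics(loadavg_text):
    """Render the 1/5/15-minute load averages from /proc/loadavg as gauges."""
    one, five, fifteen = loadavg_text.split()[:3]
    lines = []
    for name, value in (("node_load1", one), ("node_load5", five), ("node_load15", fifteen)):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with open("/proc/loadavg") as f:  # Linux-specific source of the values.
            body = loadavg_metrics(f.read()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Block forever, answering scrapes on http://<host>:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(loadavg_metrics("0.52 0.58 0.59 1/389 12345"), end="")
```

Calling serve() (on a free port, if a real node exporter already uses 9100) and adding the host as a scrape target would let Prometheus collect these gauges; in practice you would run the real node exporter, which covers far more than load averages.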
I'd like to wait for it to get created, but in the meantime let's have a look at the web UI. Here you have
Prometheus. As an open-source project it is not known for amazing frontends, but for amazing technology underneath. As you can see here, we have the targets we're scraping: for example the Alertmanager itself, the API server, kube-state-metrics, which I talked about, the kubelet, the node exporters, and Prometheus itself. So your cluster monitoring is done, and from here you can just extend it for your application monitoring. Let's go
to the fancy dashboards for all that data: we bring up Grafana for you and configure everything, and you just have to buy the monitors and put them on the wall. And last, as I said, Prometheus doesn't
send out alerts itself; that's actually done by the Alertmanager,
and here is the Alertmanager UI, up and running as well. If you don't like the setup, you can
tear it all down the same way, with a script as well. Let's see what else I've got
here. So, a quick recap:
Prometheus is all about pull-based monitoring; it believes in the Church of Pull. It has a multidimensional data model, so you can query your data very nicely. It's about metrics, not about logging and not about tracing, and especially, there's no magic: you don't want magic to happen when your monitoring matters most.
It scrapes its targets every 15 seconds, you give it alerting definitions, and it sends the resulting alerts out to the Alertmanager.
We went one step further: we took all our knowledge, put it into code and gave it to the open-source community as an operator, and the open-source community went even one step further and automated the entire cluster monitoring. Where can you go from there? First of all, prometheus.io, the main project page, is probably very interesting for you, and then the Prometheus Operator repository: everything I showed today is completely open source and free for you to use. And if you really want to get involved,
of course we're hiring, in San Francisco and Berlin, and we also offer internships for students. Right now we're hiring Prometheus upstream developers and automation engineers, so we're very much looking for new employees. Feel free to ask me or check out the CoreOS careers page. And that's it.
I hope you liked it, and I hope you're all setting up Prometheus now. Any questions? Yes, the one over there.
Oh right, I'm supposed to repeat every question, sorry about that. The question is what the Prometheus Operator does and how I start a new Prometheus with it. Maybe I'll just show you some code. The idea behind Kubernetes is that it's a big framework which you can extend as well, and that's done via third-party resources (TPRs). In the end, those are just config files that you deploy inside your cluster, and the Prometheus Operator listens for changes to them: whenever such a config file is created or changed, we create a Prometheus for you. Let's have a look at what the script actually does.
I go here, into the cluster monitoring directory and its manifests, and if I go into prometheus, for example, there's the Prometheus YAML.
That's just a description: it's a third-party resource of the kind Prometheus, and that's the only thing you create to get a new Prometheus cluster. The Prometheus Operator picks it up, so you don't actually
have to configure storage, which Alertmanagers to talk to, which targets to scrape and so on; instead, you just give us this, and we know how best to operate Prometheus inside a Kubernetes cluster. Is it a Deployment? No, it's not a Deployment.
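For reference, a minimal sketch of such a Prometheus resource, built as a Python dict and printed as JSON (kubectl accepts JSON manifests as well as YAML). It assumes the current API group and version monitoring.coreos.com/v1, where these resources are CustomResourceDefinitions; at the time of the talk they were third-party resources. The field names replicas, retention, resources, serviceMonitorSelector and alerting.alertmanagers follow the operator's Prometheus spec; the resource name, namespace, team label, memory request and the alertmanager-main service are hypothetical.

```python
import json


def prometheus_resource(name, namespace, replicas, team, retention="24h"):
    """Build a Prometheus custom resource that selects ServiceMonitors by team label."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "retention": retention,
            "resources": {"requests": {"memory": "400Mi"}},
            # Only ServiceMonitors carrying this label are picked up.
            "serviceMonitorSelector": {"matchLabels": {"team": team}},
            # Which Alertmanager service this Prometheus sends its alerts to.
            "alerting": {
                "alertmanagers": [
                    {"namespace": namespace, "name": "alertmanager-main", "port": "web"}
                ]
            },
        },
    }


print(json.dumps(prometheus_resource("frontend", "monitoring", 2, "frontend"), indent=2))
```

Saving the output to a file and running kubectl apply -f on it would hand the resource to the operator, which then creates and configures the Prometheus pods.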
What we do under the hood is pick up that configuration file and create the underlying StatefulSet for you. So it's a wrapper around that, and it does a bit more: it also handles the configuration around it. Any more Prometheus-related questions? Yes, please. Not at all: the question is whether the Prometheus Operator is just for cluster monitoring. We do offer you a full set of cluster monitoring, but it's also for application monitoring, and in the Prometheus Operator repository you also have a full example of that.
There's kube-prometheus with its manifests, and there's the examples folder, where you find a full example with an application: we deploy a little application for you, and then you just configure Prometheus. Instead of configuring the entire environment yourself, your operations team sets up Prometheus and then gives your development teams a way to monitor their own stuff: they just have to create a TPR called ServiceMonitor and configure it for their pods, and that's all they do. We pick up that ServiceMonitor, and the Prometheus Operator configures Prometheus accordingly. So it's a bunch of automation, and there's nothing more to it. Sorry for not taking the entire hour, but the talk after this one will dive a lot deeper. The next question is how I make sure my Deployment gets scraped, and that is done via ServiceMonitors; maybe I can even show you.
First of all, let's look at this; can you read that? We registered three third-party resources: Alertmanager, Prometheus, and ServiceMonitor. Alertmanager lets you create new Alertmanagers by describing them in the cluster, Prometheus describes Prometheus instances, and a ServiceMonitor just says "I want these services monitored". If we look at the ServiceMonitors across all namespaces, you see that, as I showed earlier, Prometheus is scraping everything in the cluster: we created a ServiceMonitor for the Alertmanager cluster, a ServiceMonitor for the kube API server, and so on. That's all you have to do: create a ServiceMonitor for your frontend application, for example. I can show you one of them if you want, for example the kube-state-metrics one. It's just a config file, and you can see that it selects, via labels, which services to scrape; in Kubernetes everything works with labels. So you pretty much only need the labels on your services to match the selector, and the Prometheus Operator takes it from there. And of course, when you scale up your applications, Prometheus is automatically configured to scrape all of those instances. As for the rest of the configuration: Prometheus has a pretty huge configuration file, which on Kubernetes normally lives in ConfigMaps, and Prometheus can be used with all kinds of cluster orchestrators. Here we want to give you a specialized subset, so you don't have all the options you have in a normal Prometheus configuration.
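A hedged sketch of the two-level label matching just described: a Prometheus resource selects ServiceMonitors by their labels, and each ServiceMonitor selects Services by theirs. The ServiceMonitor fields used here (selector.matchLabels, endpoints with port and interval) follow the operator's resource; matchExpressions and namespace selection are ignored, and all names and labels are hypothetical.

```python
def service_monitor(name, team, app, port):
    """A ServiceMonitor that scrapes the named port of Services labelled app=<app>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "labels": {"team": team}},
        "spec": {
            "selector": {"matchLabels": {"app": app}},
            "endpoints": [{"port": port, "interval": "30s"}],
        },
    }


def matches(selector, labels):
    """matchLabels semantics: every key/value pair in the selector must be present."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def scraped_services(prometheus_selector, monitors, services):
    """(monitor, service) pairs that would end up in the generated scrape configuration."""
    pairs = []
    for monitor in monitors:
        if not matches(prometheus_selector, monitor["metadata"].get("labels", {})):
            continue
        for service in services:
            if matches(monitor["spec"]["selector"], service["labels"]):
                pairs.append((monitor["metadata"]["name"], service["name"]))
    return pairs


monitors = [
    service_monitor("frontend", team="frontend", app="shop-frontend", port="web"),
    service_monitor("billing", team="billing", app="billing", port="metrics"),
]
services = [
    {"name": "shop-frontend", "labels": {"app": "shop-frontend", "tier": "web"}},
    {"name": "billing", "labels": {"app": "billing"}},
]
print(scraped_services({"matchLabels": {"team": "frontend"}}, monitors, services))
```

This is why a development team only has to add a ServiceMonitor with the right labels: the Prometheus that selects its team picks it up, and the ServiceMonitor in turn picks up every Service (and so every scaled-out instance) carrying the matching app label.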
Instead, we offer the options we think you actually need. I can do the same thing here for the Prometheus objects across all namespaces, and here, for example, is the one for cluster monitoring. So this is your Prometheus configuration, the only thing you work with. For example, here you configure which Alertmanager Prometheus sends alerts to, how many replicas you want, which resources you want, and which ServiceMonitors you want to pick up: you might have a couple of thousand applications, and you don't want one Prometheus to scrape all of them, so you tell Prometheus to only scrape those that are important for you. Data retention? I think you can also configure it in here; all the standard flags should be supported, and if anything is missing, it's an open-source project, so please feel free to raise issues; we have a very vivid community around it. Yes, please. Do I recommend a solution for logging? The Cloud Native Computing Foundation has the entire set of projects you need for your infrastructure. I can't make any good suggestions here, but check out the CNCF and you'll find all its projects; the best thing is that those projects are interconnected. Anything else? Then I'll close here; just come down here if you want to ask further questions, and I can answer them then as well.