Advanced, free, open-source application performance monitoring for your Python web apps - TIB AV-Portal

Formal Metadata

Title

Advanced, free, open-source application performance monitoring for your Python web apps

Series Title

DjangoCon Europe 2019

Number of Parts

32

Author

License

CC Attribution 3.0 Unported:
You may use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided that you credit the author/rights holder in the manner they have specified.

Identifiers

10.5446/45419 (DOI)

Publisher

DjangoCon Europe

Release Year

Language

Content Metadata

Subject Area

Genre

Abstract

In this talk and technical demo we’re going to walk through a minimal example Django web app, simulate traffic of varied intensity, and monitor the entire system as well as peek deeper in the app. It is 3am. Your phone rings with a special tone - the systems propping up your employer’s or client’s website are down. You open your eyes and groan, fumbling for your laptop. You’re used to the system’s worst problems, you have an idea what it might be. It’s slow going while you manually check if every component is working. You go back to bed in an hour, having written up a few of the most important details and brought the website back up. Hooray! However, at 6am you get another call. You sigh and fix it. Then another at 7am, and at 11am. A few days later the outages repeat. What is going on?! There are types of problems that can’t be solved purely through experience and knowledge of a system. You need deeper monitoring data than “% CPU” and “RAM free” to resolve them. Worse still, there are multiple components involved with failures in one masking a true underlying cause in another, like a slow loading page masking a slow database, which in turn makes you wonder what queries it’s really answering. In this talk we’re going to have a look at how you can monitor small to medium projects to really understand deeper problems at a glance. The talk will pick specific solutions from various vendors as well as open source for different parts of the problem.

Transcript: English (automatically generated)

00:02

Hello, hello, welcome to my talk on application performance monitoring. So today we'll do a little bit of a dive into kind of the history and context of APM and why we want it, why it matters to us at least,

00:25

and why it maybe matters to other people. And we're going to look at some alternatives for actually using APM and integrating it into your applications. So my name is Emmanuel, Emmanuel Tollef.

00:42

I'm a community engineer for Elastic, the makers of Elasticsearch, Kibana, and so on, monitoring tooling that you have likely heard of, and also the open source search engine. So I do work for a vendor. So Elastic does make a whole APM solution as well as

01:04

various monitoring solutions. But I have tried to touch on other projects and other things and kind of the wider context of this. But I will later show you some stuff in Kibana and in APM. However, rest assured, all of it is completely free to use, including

01:21

in the commercial context. So I'm not here to sell you anything. So let's get started with kind of what APM is. How many of you have an idea of what I'm talking about when I say application performance monitoring?

01:41

OK, that's reasonable. There's still some who don't. But that's the point of these talks, right? So APM, very quickly, then APM at its core is search. It's a solution to a cognitive problem. When you want to know what your system is really

02:02

doing specifically at the edge, what your application is really doing, so not simply monitoring the percentage of CPU or the percentage of RAM or the disk I/O that's being used, but really getting in there and looking perhaps at individual lines of code, bottlenecks in your application, what is slow, what is breaking, errors, et cetera.

02:23

So that's application performance monitoring. And a short story about why it matters to me and how I got interested in this whole area: like probably many of you, I have also been on call, for nearly five years.

02:41

That was kind of pretty hardcore. It was the beginning of my career in computing, actually. So straight out of uni, I became a freelancer. I joined a small hipster collective of freelancers, this agency. And soon enough, I was in charge somehow

03:00

of determining the operational strategy for clients, for a couple of smaller clients and later for some bigger ones. And that's where my initial enthusiasm started to grow out of bounds and to hit a snag, because what happened was that we would

03:23

build all of these features. And then in the middle of the night, I would get a page, and I would have to get up and fix the website that had fallen down. Unfortunately, I was on a 24-7 contract, and I was the only person on call for a few years on that project.

03:44

But at some point, you do this. I was like, what, at 22, 23? So you do this, and you don't really mind it. It's for the project, for the client. But then a few years ago, what happened

04:00

was I got woken up at 3, and then again at 6, and then again at 7, same day, and then at 11, and then I spent the next week trying to desperately make this website work. And what was happening was we were getting accessed, I think, legitimately from China, but it was just not a use case that we had predicted,

04:24

and it got really out of hand. And that's where I got really interested in, how do I prevent this from happening to me or other people ever again? And really understanding these setups.

04:45

Is that okay? Can you hear me? Can you hear me? I'm just gonna hold this then. So yeah, so that's where I kinda got really interested

05:03

in how to solve this problem. And so as far as I could tell, I mean, at least in my career, the evolution was I went from looking at log files and htop for the first couple of years, when I was helping out small companies,

05:20

to then trying to collect server information, so federated data collection, and then finally APM.

05:40

So we moved past the servers, we moved off the servers, off the virtual machines, and into the code itself, which was kinda mind-blowing for me. So this is a little slide from applicationperformance.com, and I came in towards the end of that, so we're thereabouts here, when you start seeing the emergence of more complex software-as-a-service providers,

06:01

and we'll talk about what they can offer in a second. But we're now obviously past this stage, so we're looking at what's the next step. So at the time, at the end of that slide, towards the bottom right,

06:22

competition was very difficult in the APM space, it was very labor-intensive to build new solutions, including open source ones, of which there were extremely few, and what happened instead was that competitors went for a narrow focus, and so this is actually a Copenhagen company.

06:43

So this is Opbeat. How many of you have heard of Opbeat? Ah, wow, geez, okay, right, I'm not gonna spend a lot of time here then. So it specialized in JS and Python, including Django, and later on, in 2017, 2018, it got acquired by Elastic, the company I now work for,

07:05

and Elastic APM kind of fits within the wider infrastructure of the Elastic stack. So Kibana, the pink thing, is used to visualize the data that you put into Elasticsearch, and everything else is used to put data into Elasticsearch.

07:23

Beats collect metrics, so numbers with labels. Logstash puts logs in there, so text, and APM puts application-level metrics, like the particular lines of code and transactions that might be slow. So, let's see if the wifi works.

07:44

So the landscape has evolved significantly since then. Yes, excellent. And this is an excellent website, it's called openapm.io, and I love it. It really showcases how crazy things have gotten. So there's the Elastic stuff,

08:02

so if you click that, it doesn't really interoperate with much else, so it's just the Elastic Stack. You can also use Grafana to visualize it, and you can use certain things to do alerting on it, but it's pretty much designed to work with all the pieces together,

08:21

and people sometimes add Elastic Beats to it, which allows you to get other metrics, like your node metrics, so how your actual servers are performing underneath your containers and your application code. However, since we'll talk about it later, I wanna talk a little bit about a few other solutions

08:42

that you might use. So another possible one is Prometheus, and that's an open source metrics store and time series database, and the whole toolkit around that. It's very cool. It's based on Google Borgmon, or it was based on Google Borgmon.

09:02

Now it's a very mature project. They released version two. So what you do with Prometheus is you have, as with all of these things, an exporter, so you have an agent, something that hooks deeply within your application and extracts all the useful information out of there,

09:22

like what transactions are being processed. Then you have a collector, so where the agents send all that information, and that's the Prometheus server. And then you have storage. In this case, uniquely, it is also the Prometheus server. As I mentioned, it's a very capable time series database.

09:42

And to visualize it, so yeah, Prometheus itself can do visualizations. They're sort of individual and kind of basic, so what people really tend to use with Prometheus is Grafana. And Grafana is a very good tool. It's very flexible, it allows you to embed visualizations

10:02

and it actually plays well with Elasticsearch as well. And so this is typically what happens here. So here you have, again, we're talking about monitoring your servers, so the operating system level metrics, orchestrator metrics, container metrics,

10:20

as well as your application metrics. All right. So Prometheus, so what sort of, because I guess the point of this is to, I mean, not really compare, because I can't really do that since I work for a company that makes this,

10:41

but to talk about the different possibilities here of what you can do, and maybe you walk away with an idea of what you can use to enhance your current stack, or when starting a new project. So Prometheus is a great alternative. You can, so the main thing with it

11:01

that really makes it stand out, I guess, is its pull-based data collection model. So with Prometheus, you don't run this risk of overloading your monitoring infrastructure, because you configure at the monitoring infrastructure how often it should pull,

11:21

and you configure, you monitor that for when it's overloaded. So there is very little chance it'll fall over. It's pretty stable. It's got a very powerful query language, and it sort of does all of this stuff in a single tool, so it collects metrics, and does the vis, and supports the querying. It's all in Prometheus server, which is pretty cool.
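To make the pull model concrete, here is a minimal standard-library sketch of the application side: the app only keeps counters in memory and renders them as plain text when the monitoring server asks for them. The metric names and the helper functions are made up for illustration, and the output only loosely follows the Prometheus text format; a real setup would use the official client library rather than this.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical in-memory counters; the names are illustrative only.
COUNTERS = {"http_requests_total": 0, "orders_created_total": 0}


def increment(name, amount=1):
    """Bump a counter; this would be called from application code."""
    COUNTERS[name] += amount


def render_metrics(counters):
    """Render counters as 'name value' lines, each preceded by a TYPE comment."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves current counter values; the monitoring server decides how often to fetch."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The point of the design is that the application never pushes anything: if the monitoring side is busy, it simply scrapes less often, and the app is unaffected.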

11:44

Let's see, what's the time, right. I have a video about this, actually, but I'm not gonna risk it, because there's also time, and so on. But I do have some extracts from there

12:00

to show you roughly what that looks like. And here is where we get into a more interesting topic. So if you're interested in using it, totally watch that video. It's an incredibly clean, clear talk. So this is how you generally use Prometheus. You define metrics: they have counters, histograms,

12:23

gauges, like various types of objects. So in Python, you would define what you want to monitor, and this could be anything. So this could be application-level metrics, which we'll see in a second, or it could be business metrics. So it could be how many sales are we making,

12:41

as well as how slow the website is. It's like how happy are our users, how many units of stock have we shifted, how slow is the website specifically for our administrators, and so on. So it's extremely flexible and powerful in what you can define.

13:01

But the other side of that coin is that you have to define that. So here, and what's happening here is we're using the definitions from the previous slide, so we're using the counters, the gauges, et cetera, to actually monitor something in our application with Prometheus, or to send it to Prometheus.
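A rough standard-library sketch of that kind of instrumentation follows. TimingSummary, REQUEST_TIME, ANALYZE_TIME and handle_request are hypothetical names and this is not the code from the slide; the Prometheus Python client ships its own metric classes, whose time() helper can likewise be used as a decorator or as a context manager.

```python
import time
from contextlib import ContextDecorator


class TimingSummary:
    """Hypothetical stand-in for a metrics-library summary: count and total seconds."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total_seconds = 0.0

    def time(self):
        """Return a timer usable as a decorator or in a with block."""
        return _Timer(self)


class _Timer(ContextDecorator):
    # Note: one timer instance is reused per decorated function, so this
    # sketch is not safe for recursive or concurrent calls.
    def __init__(self, summary):
        self._summary = summary

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self._summary.count += 1
        self._summary.total_seconds += time.perf_counter() - self._start
        return False  # never swallow exceptions


REQUEST_TIME = TimingSummary("request_processing_seconds")
ANALYZE_TIME = TimingSummary("analyze_seconds")


@REQUEST_TIME.time()  # times the whole handler
def handle_request(items):
    with ANALYZE_TIME.time():  # times only this block
        total = sum(items)
    return total
```

Calling handle_request once records one observation in each summary, with the outer time always at least as large as the inner one.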

13:25

So up there, we can see the entire route essentially being monitored. That's the decorator, like @REQUEST_TIME.time(). And here, we see a particular piece of the code being monitored, and it's the context manager, so a with block on the analyze time metric. And so that stuff is going to record, obviously, how long that piece takes

13:41

and send it to Prometheus. So that's pretty cool, because at the same time, we could enhance that information in any way we want. Like we can tie essentially how long a particular piece of code takes to how much money we're making. So that's excellent, right? It's perfect. The thing about it is that

14:05

it's metrics that you have to define explicitly. And the other thing about it is that it's only metrics. So metrics in the sense of you have some number and you put some label to it. It's only that. So you don't get logs with it. You can't do log monitoring in Prometheus,

14:21

and you can't correlate with log monitoring. And so with this, you will need to do quite a lot of work to achieve the same level of sophistication and ease of setup as you would get from a commercial software-as-a-service provider.

14:45

So that's a bit of an issue, though people have used this very successfully. And there are even community projects that help you get over that initial step. So for example, there is a django-prometheus module that you can use

15:00

that does a bunch of this work for you. It provides you with middleware, and it instruments your app. However, how well these are supported, how often they are maintained and updated, and how well they fit your specific application, so how much customization you'll have to do, varies quite a lot,

15:21

because that in particular is just a community project. So it depends on whoever's got time; it's got like two or three hundred commits. This is how you actually install it, to give you a quick idea. So you just put django_prometheus, if you're using that, in INSTALLED_APPS, and you pop the middleware in there,

15:41

and this is gonna be very similar to what I'll show you later. And you also add its URLs, because it's pull-based. What happens is it exposes an endpoint called /metrics, so that's what this does. And from then on, the Prometheus server that you configure separately can pull from here.
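As a sketch of what those settings might look like with the django-prometheus community package: only the django_prometheus entries come from that project, while the other apps and middleware are placeholders standing in for whatever a real project already has.

```python
# settings.py fragment (sketch). The non-django_prometheus entries are
# placeholder examples, not requirements of the package.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "django_prometheus",
]

MIDDLEWARE = [
    # Wraps everything else, so it goes first...
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ...and its counterpart goes last.
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]
```

In urls.py the project would additionally include django_prometheus.urls in its urlpatterns, which is what exposes the /metrics endpoint for the Prometheus server to scrape.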

16:01

So it's really easy, really simple to install. It's basically settings and URLs, couple of lines, and you're done. And that's what the built-in dashboard looks like. And this is what Grafana looks like with it. So it's pretty reasonable.

16:21

You know, you got your responses, response codes, and kind of breakdown of requests, and there is more, like there's the 95th percentile, 99th percentile, and so on, that you can do with this. So that's pretty good. Again, the only thing with it is that it doesn't do logging, and you may need

16:43

to do more customization with it, I think, at least. So here we get on to Elastic APM. So far, what we had from this slide is you have the software-as-a-service providers

17:01

down at the bottom right, where you literally just install something, you put in an agent into your app, and then it automagically sends all the information, and everything is cool. You go and log into the dashboards, and you understand everything about your application, or that's the idea, at least. Then we have Prometheus, which is very powerful,

17:21

very flexible, but you might need to do some more setup to fit it to your specific app. So that's where Elastic APM, for me, came in, and actually, one of the reasons I joined the company was because I was excited about this project, because what happened is they bought Opbeat,

17:41

and then with rather a lot of flair, they open-sourced almost all of the code of this commercial SaaS provider that had been in development for five years. What they didn't open-source, but is still free to use, is the user interface, which you will see shortly.

18:00

However, they did nonetheless provide, and for the first time, I guess, or at least as far as I'm aware, for the first time, provide an entirely free solution that's commercial-grade, has been developed for several years, and you can just use it. So the advantages of this is that

18:21

you have more consistent support, and you also get the logs in the same place, and I guess its pace of development when it comes to specific integrations is probably greater, like it has .NET and Ruby and lots of stuff that might otherwise fall behind a little bit, or just not continue development at the same pace.

18:48

So let's have a look at some actual integrating. So this is the app that we're instrumenting. It's a relatively simple app. It has products, it has some orders,

19:00

and it has some customers. We're gonna look at the products page. Don't you love it when you lose your, lovely.

19:58

This is what happens when you, all right, nevermind.

20:26

Let's go with, so that's what APM looks like, and actually, so this is, because we're talking about APM, I'll start from here, but then I'll show you what the rest of the whole thing looks like,

20:43

so you kinda get an impression of what the entire monitoring solution actually is. All right, so this is not what I originally intended to show you, but unfortunately, I have put aside the link somewhere

21:03

to the actual demo, and I have closed the tab with it, so we're going to look at this one. So what this is, as you're probably familiar with this, this is a similar screen to what SaaS vendors will offer you. You have your transaction, you have the full details

21:21

about your transaction, like the particular request, the response, the operating system information, as well as being able to add custom tags. So here you can do stuff like customer tier, this is a very high value customer, or a user, you can have things like the specific email address or user ID for the user,

21:45

and then the interesting part, of course, is the actual tracing. So here in this case, this is an Elasticsearch request, so we can see the body of the search as well as the specific stack trace. So we can see all the way through the libraries,

22:03

and I do have a Django demo, but this is a Flask one. You can see the entire stack down through the libraries, down through Gunicorn, and you can see exactly which part is taking how much time. It also supports Celery, so it supports asynchronous workers

22:24

and they will just turn up as soon as you instrument it, and it supports SQL as well. So it will break out your SQL queries in the same way that we have this Elasticsearch bit here, it will break out the SQL too. In fact, let's have a quick look and let's see.

22:44

This is not Python, but if I can show you that, oh yeah, okay, so that one is a bit more interesting. So you can see what's going on here, and I assume a lot of you would have seen

23:00

similar stuff in your applications. So the thing that I like about this, besides that it's free, is that it combines performance monitoring with all the rest of the stack. One day on time, okay.

23:20

So we start from something like this. So this is just your regular unstructured logging. So that's basically Filebeat, which is a log shipper application, so that's sitting on every server or everywhere where it needs to sit, and it's just tailing specific files.

23:41

It has a lot of defaults, so it's pretty easy to configure, like you don't have to do a lot of configuration to tell it get these files and these files and these files. It tends to find Apache and Nginx and all of your standard applications. So this is searchable as well,

24:02

and that's Elasticsearch, that's pretty good at that. So then from unstructured logging, we can go to more structured logging, I think, or we could, if it was the demo I'd prepared originally, at least.

24:30

So as you can see, it's pretty easy to set up a visualization in Kibana.

24:40

Discover, bless you.

25:21

I guess this is what happens when you lose your planned demo.

25:49

Okay, so the thing is, this is the official, the kind of core Elasticsearch demo, a Kibana demo that's used by the entire company of over 1,000 people.

26:04

So I think what's happened is that something has actually happened to it. It's not just me, this is always full of data and it processes about, well, it processes thousands of requests per second. So there is definitely a problem here,

26:24

but very possibly one that I will not be able to fix on my own, on stage. So let's have a, let's just try that again.

26:58

So you generally simply select the Filebeat index

27:02

and then you tell it what field it should use in order to kind of separate the records in the index and to be able to get structured logs out of the unstructured information. And then ideally when it would discover,

27:20

oh, wait a second, it could be simply the, aha, right, okay. Yeah, so this should definitely be full of data, so something has gone wrong with our official demo, so there we go. Why not on stage, you know, that's totally normal.

27:42

Right, so that's the structured logging view, woo. Right, cheers. Thanks for the support. Okay, so the probably, yeah, something has happened

28:03

to our demo, I just got a message, excellent. Very, very, very timely. Right, so nevermind. So the cool thing about this is that it allows you to do a whole bunch of interesting things

28:20

with this information. So for example, we can see what we are running. So what the container name is, and you'll see in a second what this little table button is doing.

28:43

So this is pretty cool. So I would have really liked to have this when I was just looking at logs and, you know, trawling through log files. What this does is it takes one field out of the entire record and then it shows you the same field in the summary view.

29:00

So as the records are collapsed, you can kind of go through and see. So you know, this is the, the first thing is the name of the container. So this is the actual web application and you know, this is an NGINX instance. And these here are different pods. So in this case, this is a Kubernetes setup. Not what I was intending.

29:21

But so this is super cool because you know, you can do things like log levels. So you can easily scroll through and find all the warnings or errors. You can filter this. Additionally, you can sort by pulling out specific fields. So purely as like a log navigation tool,

29:41

it's very powerful. And yeah, it's pretty cool. So you get unstructured logging in logs. You get structured logging in here in Discover. And you get application metrics in APM.
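To illustrate the step from unstructured to structured logs, the sketch below parses plain log lines into records with named fields, which is what makes filtering by level or container possible. The line format, field names and sample lines are invented for this example; Filebeat and Elasticsearch do this with their own modules and pipelines.

```python
import re

# Assumed line format: "<timestamp> <LEVEL> <container>: <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<container>[\w-]+):\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the assumed format."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def filter_by_level(records, levels):
    """Keep only records whose level is in the given set."""
    return [record for record in records if record["level"] in levels]


# Made-up sample lines.
raw_lines = [
    "2019-04-10T03:00:01Z INFO webapp: GET /products 200",
    "2019-04-10T03:00:02Z WARNING nginx: upstream response is slow",
    "2019-04-10T03:00:03Z ERROR webapp: database connection timed out",
    "not a structured line",
]

records = [rec for rec in map(parse_line, raw_lines) if rec is not None]
problems = filter_by_level(records, {"WARNING", "ERROR"})
```

Once every record has a level and a container field, pulling one field into a summary column or filtering down to warnings and errors becomes a simple lookup instead of a text search.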

30:03

You can do a lot of other things with this as well. So in Visualize, you can define visualizations. And there should be about 300 here. So you can kind of build your own dashboards.

30:21

So here is the bit where you can combine business metrics with your application performance metrics. So for example, here is where you could plot like sales on the one axis and 95th percentile performance under one second

30:40

or however long it takes on the other axis. So which is pretty important, right? Because obviously that's how we relate what we do to the rest of the business that we are operating in.
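For reference, a 95th percentile like the one mentioned here can be computed from raw response times with the nearest-rank method. This is a small sketch with made-up sample data, not a description of how Kibana or Elasticsearch compute percentiles internally.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up response times in milliseconds; one request is a slow outlier.
response_times_ms = [120, 80, 95, 2400, 110, 105, 90, 130, 100, 115]
p95 = percentile(response_times_ms, 95)
under_one_second = p95 < 1000
```

With only ten samples the single slow request dominates the 95th percentile, which is exactly why percentiles are more telling than an average when relating performance to something like sales.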

31:03

So, so far you've seen Elastic APM. You've seen the rest of Kibana. So you've seen logging kind of combined with that. And you've also seen Prometheus

31:22

and how to integrate that. So for Elastic APM, the way you integrate it is pretty similar to Prometheus. In the interest of time I'm not going to try that demo just because, yeah, too much.
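As a stand-in for the skipped demo, the Django side of the integration might look roughly like the settings fragment below. The service name and server URL are placeholders, and the surrounding apps and middleware are just a minimal assumed project; check the agent's documentation for the options that apply to your version.

```python
# settings.py (fragment): a sketch of wiring the Elastic APM agent into Django.
# Install the agent first with: pip install elastic-apm

INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "elasticapm.contrib.django",  # the agent's Django integration
]

MIDDLEWARE = [
    # Placed first so it can time the whole request/response cycle.
    "elasticapm.contrib.django.middleware.TracingMiddleware",
    "django.middleware.common.CommonMiddleware",
]

ELASTIC_APM = {
    "SERVICE_NAME": "example-shop",         # placeholder name for this app
    "SERVER_URL": "http://localhost:8200",  # assumed local APM server address
}
```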

31:40

But it's basically extremely similar to Prometheus. What you need to do is you run Elasticsearch, you run Kibana, you run the APM server, which pretty much play the role that Prometheus and Grafana play in the Prometheus case. So you run Elasticsearch, you run Kibana

32:02

and you run the APM server, and then you instrument your apps with the APM client or the APM agent, which is simply pip install elastic-apm. And extremely similar to Prometheus, you simply shove it into INSTALLED_APPS

32:23

and you install some middleware. And that's about it. There is one extra thing, which I guess you can absolutely do with Prometheus as well. You just need to pull in more components and configure them more. The extra thing that this does is that

32:40

it also allows you to, in the same manner, instrument RUM, so real user monitoring. So this is, what I showed you so far is all on the server side, but there is also monitoring you can do on the client side. So you kind of know when the JavaScript has loaded and at what points it has loaded,

33:00

how long it took, how long certain parts of the page took to reload, and how long it took for the user to get anywhere useful. So this is included, as it is with most software-as-a-service vendors. So additionally, the other feature that's worth mentioning,

33:24

that I would have liked to show you, but there we go, is distributed tracing. So if you have microservice architecture or you simply have an architecture that has enough components that they start calling each other in some manner,

33:40

APM supports this, actually. I wonder whether I still can show you a picture because it's worth a thousand words. It's like really basic mode here.

34:02

Yes, different colors are different services. So the blue is a Ruby service calling a Python service in green, which calls a Java service in purple, which calls a Go service in red.
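The talk does not spell out how a call chain like this gets stitched together. One common mechanism, assumed here, is that each service forwards a trace identifier in a request header, so that whatever collects the data can group spans from different services. The sketch below uses the W3C Trace Context `traceparent` header layout (`version-traceid-parentid-flags`); the `handle` function and the in-memory `collected` list are hypothetical stand-ins, not the Elastic APM agent's API.

```python
import secrets
from collections import defaultdict


def new_traceparent():
    """Start a new trace: 32 hex chars of trace id, 16 hex chars of span id."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def parse_traceparent(header):
    """Return (trace_id, span_id) from a traceparent header value."""
    _version, trace_id, span_id, _flags = header.split("-")
    return trace_id, span_id


def child_traceparent(header):
    """Keep the caller's trace id but mint a fresh span id for this service."""
    trace_id, _parent = parse_traceparent(header)
    return f"00-{trace_id}-{secrets.token_hex(8)}-01"


collected = []  # stand-in for whatever server receives the spans


def handle(service, incoming, downstream):
    """Record a span for `service`, then call the next service in the chain."""
    header = child_traceparent(incoming) if incoming else new_traceparent()
    trace_id, span_id = parse_traceparent(header)
    parent_id = parse_traceparent(incoming)[1] if incoming else None
    collected.append(
        {"service": service, "trace_id": trace_id,
         "span_id": span_id, "parent_id": parent_id}
    )
    if downstream:
        handle(downstream[0], header, downstream[1:])


# The chain from the slide: Ruby calls Python, which calls Java, which calls Go.
handle("ruby", None, ["python", "java", "go"])

# Grouping by trace id reconstructs the chain without any per-service wiring.
traces = defaultdict(list)
for span in collected:
    traces[span["trace_id"]].append(span["service"])
```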

34:21

And this is supported automatically. So there's a complicated bit of inference that the APM server does. When you instrument all four apps with the Elastic APM agent, and you just instruct them to send information to one server, it will infer

34:41

that they're connected together. That's a pretty cool piece, I think. And distributed tracing, I should say, is pretty standard nowadays. You would expect it from software as a service vendors. But hey, now you can get it for free. Now, I do keep saying for free, and it is,

35:02

but of course it's only for free in the sense that now you have to put in the effort to maintain it, and it's like the old problem of who monitors the monitoring and who watches the watchers. So as most, I guess most products, companies, if you're like a startup

35:21

or a small to medium sized project, you don't really wanna deal necessarily with the configuration and hosting of this stuff. So what you do is you go and get it somewhere, and there are multiple providers. Elastic sells this, but for Prometheus, there are multiple cloud providers as well.

35:42

So you can totally do this. And I expect, not sure, but I expect that the pricing of this will remain low because the actual underlying code is free to use, and the majority of it is open source. So that puts some pressure on my company

36:01

not to push up the prices too much. Whereas obviously, in a proprietary environment, that pressure doesn't exist. You just get charged for the value that you get out of it. So yeah, so I think it's a pretty cool project.

36:24

It's a pretty advanced one, and I do really like that they went out there and they published something that was of this sort of commercial grade where you install it and stuff just happens automatically.

36:42

But I do still feel that there is a fair bit missing from this space, and we've been talking about logging, monitoring, and observability, and this sort of logging and metrics in particular

37:02

now for several years. And in terms of actually innovative thought in this space, I haven't seen it progress that much, although there are some exciting new projects that may progress this, so we'll see how that turns out.

37:21

But more importantly, what I would like to see a bit more of is more collaboration between different open source projects. So Elastic imports Prometheus data, and you can use Grafana with Elasticsearch.

37:40

But ultimately, I've seen the kinds of stacks people have in the wild, and they're pulled together from all sorts of components. You have Fluentd and Kafka for posting your messages, like it's just all sorts of stuff. And open source is supposed to solve this, right? It's supposed to not be this pain in the ass

38:03

where if you want some component, like Elastic APM, you now suddenly need to migrate to the whole Elastic stack, but you're currently using Prometheus. So I would like to see more collaboration between these things, and that can really only happen

38:22

if there is pressure for it, and if people actually collaborate on metrics. If you care about metrics, if you care about monitoring, and maybe you don't, maybe you just want something to use that's easy and relatively low cost, who knows? But if you do, please write to me,

38:41

and also please tell me what your setups are, because the more data there is on what the setups are, and I do plan to republish this information when I get enough of it, then the more obvious it's going to be to potential open source maintainers

39:02

that there is demand for this thing. And I think that there is demand for this, because even in a small-sized project, you may start with one stack, but soon enough, even in a medium-sized project that's a couple of years old, you move to a jumble of different technologies, and initiatives like OpenMetrics,

39:21

which came out of Prometheus, kind of give you this base level of cooperation, and there's also the Elastic Common Schema, which is also an open thing. But this is not quite there yet. I think we could do a lot more on open source collaboration.
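To give a feel for what a shared base level can mean in practice, here is a small sketch that renders a metric in the plain-text style of the Prometheus exposition format, which OpenMetrics grew out of. The metric name, label and value are invented for illustration, and this is not a complete OpenMetrics implementation.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family as text: HELP and TYPE lines, then one line per sample.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        if label_text:
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical gauge reporting the latest request latency of one service.
text = render_metric(
    "request_latency_seconds",
    "gauge",
    "Latest request latency.",
    [({"service": "web"}, 0.25)],
)
```

Any tool that can read this text layout can scrape the same endpoint, which is the kind of cooperation between stacks the talk is asking for.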

39:41

So thank you very much for listening to me, and I'll just take some questions, if we have time. Awesome. We do have a little time for questions. Again, we can ask questions on DjangoCon QA on Twitter,

40:06

or you can do DjangoCon on IRC. It is always awesome. The last two talks are open source in the wild. It's so cool, so ask some cool questions about it. Go for it.

40:21

Okay, hey, thank you for the nice talk, and for your perseverance against adversity. I was wondering if the APM server had an API that could be queried from the outside. For example, just random example, a Raspberry Pi with one of those green,

40:41

yellow, and red lights, and if my server load, for example, goes over five, the light on my desk turns orange because it's getting information from that APM server. Yeah, totally, yeah. So that's exactly how it works. I mean, what you saw there, Kibana is reading off an API.
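The desk-light idea from the question could be sketched as below. How the load figure is fetched is deliberately left as a callable you supply, since the talk does not name a specific endpoint or query; `fetch_load` and `set_light` are hypothetical hooks, and the second threshold (red above ten) is an assumption added for illustration.

```python
def light_for_load(load, warn_above=5.0, critical_above=10.0):
    """Map a server load figure to a light colour."""
    if load > critical_above:
        return "red"
    if load > warn_above:
        return "orange"
    return "green"


def poll_once(fetch_load, set_light):
    """Fetch the current load, switch the light, and return the chosen colour.

    fetch_load: callable returning a number, e.g. a query against your metrics store.
    set_light:  callable taking a colour name, e.g. code driving the Pi's GPIO pins.
    """
    colour = light_for_load(fetch_load())
    set_light(colour)
    return colour


# Example with stand-ins for the real data source and the real light.
seen = []
result = poll_once(lambda: 6.0, seen.append)
```

On a real device you would call `poll_once` on a timer and pass in functions that talk to your monitoring backend and to the hardware.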

41:03

Like, the APM server is entirely open source and an entirely separate component. So yeah, you can totally do that. Thank you. The internet may defy your demos, but the internet has questions as well. Can you compare the resource requirements

41:21

for APM and the ELK stack behind it to other options like Prometheus and Sentry? ELK has somewhat of a reputation of being resource hungry. So how big a setup do you need to have to be able to log a small or medium-sized site? Yeah, sure, yeah, so that's true.

41:41

APM is a kind of very different project from the core Elasticsearch. I'm probably not gonna really go into saying, well, this is faster than the other because I do work for one of the companies, and I, you know, I mean, I'm a web developer, but I don't want to necessarily say it's faster because it also depends on the situation, right,

42:01

and like what load you put it in and what the particulars are. However, the APM team takes the benchmarking pretty seriously. I mean, that's one of the good things about it being an open source component that has the backing of a commercial company, right, because they can spend weeks benchmarking in very specific situations, and they do.

42:23

So as far as I know, I know they're very proud of like how small the percentage overhead is. So as far as I'm aware, it's in a good class of percentage overhead. So it should be pretty fast, and it's very different from the Elasticsearch

42:43

piece of software, which, yeah. You can, like, if you're a small project, what I would probably do personally, right, and this, you know, it involves giving money to Elastic, so take it with a grain of salt, obviously, but I would probably go and just go on Elastic Cloud

43:02

and get the, like, cheapest tier that we offer, and that offers APM, and it's all fully set up, so you just send it stuff. You just instrument and send. So that's what I'd do personally.