
Autoinstrumentation Adventures: enhancing Python apps with OpenTelemetry


Formal Metadata

Title
Autoinstrumentation Adventures: enhancing Python apps with OpenTelemetry
Number of Parts
131
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
Hey there, fellow Python enthusiasts! Are you ready to dive into the exciting world of application observability without getting your hands too dirty with complex instrumentation? If that sounds like a journey you'd be interested in, then you're in for a treat! Observability is that magical window into the inner workings of our applications, allowing us to understand what's happening under the hood, troubleshoot issues, and ensure everything is running smoothly. However, achieving this level of insight can sometimes feel like a daunting task. That's where OpenTelemetry comes into play, simplifying the entire process and making it accessible to everyone, not just the observability wizards. In our session, we'll start with the basics: what OpenTelemetry is and the problems it aims to solve (and those it doesn't). We'll demystify the concept of instrumentation—the process of embedding observability into your applications—and show you how OpenTelemetry makes this not only possible but painless. The heart of our talk will be focused on autoinstrumentation, a magical feature of OpenTelemetry that automates the task of adding observability to your Python projects. Imagine being able to get detailed insights into your application's performance and behavior without having to manually instrument every nook and cranny. Sounds like a dream, right? And because we believe in learning by doing, we'll walk you through a small but mighty demo. You'll see firsthand how effortlessly you can implement OpenTelemetry in your own Python applications, turning the daunting into the doable.
Transcript: English (auto-generated)
I'm Israel. I come from Spain, from Granada, a pretty nice and beautiful city in the south of Spain. I am a software engineer at Red Hat, working on the Distributed Tracing team. I am also part of different community programs, like Google Developer Expert.
I used to be an AWS Community Builder too. Actually, I don't know how I haven't been fired, because I do a lot of community work, and I don't know how I manage to finish my sprint every three weeks, right, or something.
So today we are going to talk about problems. Let's imagine that this is you. I hope that you are as happy as this guy. One day you receive a call; it's your boss saying, hey, everything is failing.
Can you take a look? Well, this is where the fun begins, right? Okay, we are not going to see too much blue during the talk. So, if you are lucky enough, right,
what you do is go to a dashboard, and, well, you can see that there are too many apps: most of them failing, some of them failing but not too much, and for others you have no information. Also, you don't know how they are related to each other, right?
So this is a dashboard where you are getting some alerts, right? And those alerts are generated from metrics. So again, you see that thing, you start crying, and you say, okay, let's see what we can do, okay? So you are lucky enough to have some metrics,
and, well, what are metrics actually? Metrics are what we used to call monitoring, right, the regular kind, but now with a fancier name. And, well, you can do things like run a Prometheus server in your infra, right, if you are running Kubernetes or whatever,
and you expose, in each of your applications, an endpoint that serves metrics. And what are these metrics? A metric is just saying, hey, I am using this amount of memory, or I am failing this number of requests, or serving this number of requests properly, something like that, right?
You can get information not just about your application itself, but also about the runtime and the platform where it's running, like the operating system and other things, right? So you get all that information and send it to Prometheus; well, actually, Prometheus will scrape it, right? From that, you can generate alerts,
integrate them directly with things like Slack, or have automated calls, something like that, right? And, well, you know when something is not going well. Also, you can literally go in, right, and see nice dashboards, like the ones your manager loves to see, because managers love dashboards.
Okay, so you know that something is failing, and maybe you are also lucky enough to have an infra for logs. In a similar way, right, you can send logs from your application, and also from the infra and the different things you are using, like databases or other stuff, to a central location,
and with that, you can do some kind of analysis and also generate another dashboard, right? For example, which errors are happening most frequently in your system, and things like that, right? If you are not lucky enough, you have to go application by application, asking for the logs, right?
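As an illustration of the kind of analysis mentioned above (which errors happen most often across the system), here is a small stdlib-only sketch over centrally collected JSON-lines logs. The field names `service`, `level` and `msg` and the helper `top_errors` are assumptions. Real log pipelines define their own schema.

```python
import json
from collections import Counter


def top_errors(log_lines, n=3):
    """Most frequent ERROR records, keyed by (service, message), from JSON-lines logs."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured JSON
        if isinstance(record, dict) and record.get("level") == "ERROR":
            counts[(record.get("service", "unknown"), record.get("msg", ""))] += 1
    return counts.most_common(n)


logs = [
    '{"service": "app1", "level": "ERROR", "msg": "permission denied"}',
    '{"service": "app2", "level": "INFO", "msg": "ok"}',
    '{"service": "app1", "level": "ERROR", "msg": "permission denied"}',
    '{"service": "app3", "level": "ERROR", "msg": "timeout"}',
    "plain text line from a legacy app",
]
print(top_errors(logs))
```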
And you get something like this, because very likely you were not the one who coded the thing, or maybe you were, but one year ago or something, right? Actually, the only thing you can read in your mind while you are trying to read the logs is this. Okay. So, anxiety enters the room.
So, well, how can we avoid this hell, right, of not knowing what's happening? Well, the problem is that we are monitoring our applications the wrong way, right? So this is the panel of a modern aircraft, one from Airbus.
I don't remember exactly which one. Well, you can see that they have a lot of dashboards, and we don't know exactly what they mean, right, but very likely, if something starts going red, we are in danger. And the thing is that instead of doing observability like this, right, we are doing it the old way, right?
So let's imagine this is our application today, right, but we are doing observability as if our application looked like this, okay? We are still just checking logs and maybe metrics, right? So how can we solve that? Well, we are going to talk about distributed tracing, which I love to explain
as the three dogs problem. How is this dog doing? Tell me something, please, be interactive. Sitting, right, or something, right? How about this one? Standing, right?
How about this one, apart from sad? Yeah, relaxed. So this is the real problem. And this is what usually happens when you are working with services, right, where you have a big application made of different services, and in this case, there were just three.
Let's imagine something like Amazon or Google, right? But if you are just selling, I don't know, World of Warcraft items in a single-page application, you don't need this, right? Well, with tracing, as we are going to see now, what we can see is the complete,
full transaction, right, across our applications. With logs and metrics, you see how the different services are behaving individually, right? But you don't have the context. You don't know the relationship between the different parts of the application. Okay, so the thing is that
distributed tracing is actually just saying, okay, we are going to generate some structured logs, right, where, for the different operations we perform, we record where each operation starts and where it ends, right? Also, we are going to be able to
create dependencies between the operations, right? So we can see exactly how much time we are spending on the different parts of our operations. For instance, let's imagine that this is a trace, right, of a request to an endpoint on our website or something. This would be the total duration of the request,
but then we can analyze, right, what our request touched, the different microservices, right? In case there is a bottleneck or something, we can detect it pretty fast, or if there is a failure, we can see exactly where the failure is
and what parameters we were using when we hit it. So this is, for instance, Jaeger, the Jaeger UI, with one trace loaded. There you can see exactly, right, how much time you are spending on the different operations, and in case something fails, you are going to see it there pretty easily, okay?
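To make the span idea concrete, the sketch below models a finished trace as a list of spans, each with a start, an end and a parent. It then finds the bottleneck the way you would by eye in the Jaeger UI: the span with the most "self time", meaning its duration minus the time spent in its children. This is a toy data model, not the OpenTelemetry or Jaeger one. The field names and the three-service example are hypothetical, and the self-time calculation assumes a span's children run sequentially rather than concurrently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    service: str
    name: str
    start_ms: int
    end_ms: int
    error: bool = False

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: List[Span]) -> int:
    """Duration not covered by direct children (assumes children do not overlap)."""
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def bottleneck(spans: List[Span]) -> Span:
    """The span whose own work (excluding children) took longest."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


# Hypothetical trace: app1 calls app2, which calls app3.
trace = [
    Span("a1", None, "app1", "GET /", 0, 500),
    Span("a2", "a1", "app2", "GET /", 20, 480),
    Span("a3", "a2", "app3", "GET /", 40, 440, error=True),
]
slowest = bottleneck(trace)
failed = [s.service for s in trace if s.error]
print(slowest.service, self_time_ms(slowest, trace), failed)
```

Here app1 and app2 spend only 40 ms and 60 ms in their own code, so the 400 ms spent inside app3 stands out immediately.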
I was afraid of this being sparkling water. Okay, so there are even companies that are removing the other signals, the other information, and becoming
tracing-first companies. Why? Because you can add metadata to those spans, to those logs, so you have the logs right there and don't need separate logs, and you can also extract some metrics from them, right? For instance, you can have RED metrics, which are requests (rate),
errors, and duration metrics. So you know exactly how much time each operation is taking, and things like that. This means you can generate alerts based on your traces. You know this operation should take, I don't know, maybe 300 milliseconds. If at some point it takes a second and a half, maybe something is going pretty wrong, right? Okay, so the thing is that actually,
what we are talking about is observability, which is not just sending a bunch of data somewhere so we can have a dashboard to give to our manager, because, yeah, as we said, managers love dashboards. It's when we can correlate and see
exactly what's happening in our application. I love how Wikipedia describes observability: a system is observable when you can infer its internal state from its outputs, right? So this is what we are doing here.
We are not reading the code to know exactly what's happening, but we can see how well or how badly our system is performing, right? Even which part is having problems. So, well, with metrics we can see that there is an issue; with logs, what the issue is, right? Maybe you are writing to the wrong place,
or, I don't know, there is a problem with permissions or something else. And, well, with traces we see where the issue is, right? These are the three pillars of observability. People usually talk a lot about this. There is also an article written by Yuri, I don't remember his surname,
that talks about TEMPLE, the six pillars of observability, right? Now people are talking a lot about profiling, but, well, that is for another talk. I actually have it here, right? This article is pretty interesting
and I highly recommend reading it. So, well, the thing is that each system has its own APIs, its own way of sending data. For instance, if I am using Google Cloud, Google Cloud has its own thing for doing observability. If you go to Amazon, they have their own thing. And you need to,
if you switch from one cloud to another, right, you need to change that. If you are using something like Datadog, Dynatrace, or Splunk, right, they have their own agents, and maybe you will need to move from one to another because, well, you don't want to pay too much to one of them, right, and another came with a better offer or something like that.
So you are stuck; you have the vendor lock-in problem. So, well, this is where one project came in: OpenTelemetry. OpenTelemetry came from merging OpenCensus, which was mostly about metrics, and OpenTracing, which was, well, about tracing, right?
And it's a set of SDKs, libraries, and binaries that help you with the collection of your telemetry data. And this is important: just the collection, okay? No visualization, nothing else. If you want to visualize, store, or whatever, you need other tools. And, well, the full OpenTelemetry project
is more or less this (there are some other things, right?), with the idea of having receivers where you can receive data, right? For instance, if you have something written with the old Jaeger tracing SDKs, you can consume the data from there. If you have something using the OpenTelemetry SDKs, you can consume from there.
If you have Prometheus endpoints, you can consume from there. Then you convert all the data to a common format and process it: for instance, you can batch the data, remove things that you don't want to have there, whatever. And then you export the data. To where? Well, to wherever you want. If you have an endpoint using the OTLP format,
you can send it there, but there are also other exporters that you can use for specific technologies, okay? So, for instance, you have the OpenTelemetry Collector, because you can also do this part in the clients. But with the OpenTelemetry Collector, you can keep sending telemetry data
in your old format. And from there, you have a single point to configure how you treat your telemetry data. From there, if you want, you can enable or disable the different components that you use to process your telemetry data. For instance, if I have one Jaeger instance and also something using OTLP,
I can send the telemetry data to both. If at some point I remove the Jaeger instance and install Grafana Tempo, I can just make that configuration change there, without changing all my applications, right? So it would be something like this, right? I can have this and do batching and transformation
and all the other stuff, okay? There are a lot of companies working on this, like Datadog, Google, Red Hat, Timescale. Well, this is something that companies are putting a lot of effort into, because before, due to vendor lock-in, right, it was pretty expensive to move from one to another. Companies are even basing their agents
on the OpenTelemetry Collector now. For instance, Grafana deprecated their agent like three or four months ago, I don't remember exactly, and now they have created something called Grafana Alloy, which is actually built on top of the OpenTelemetry Collector, right? So, well, after this introduction,
you're actually ready, because usually, when I talk about this and start talking about instrumentation, people say, well, but you didn't mention what observability is. Okay, so now we have the full story. The OpenTelemetry libraries, SDKs, and so on, right: this is their current state for different languages.
We are going to focus especially on Python because this is a Python conference, but if you see me, for instance, at a .NET conference, I will say exactly the same thing, just pointing to the .NET row. So, currently, logs support is still experimental.
But for metrics and traces, which we are going to focus on here, it is pretty stable. So again, what we are going to see is a demo application. It's something pretty stupid. Later I will show you the full code, but it's just a Flask application where we receive a request,
and we forward the request to a second application. That application is exactly the same, and the third application just replies to the others, right? And we do some sleeps and some other things, right? Just to simulate some work.
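The three demo services only show up as one trace if each hop forwards the trace context with its request. OpenTelemetry's default propagation format for this is W3C Trace Context, which travels in a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. The sketch below does by hand what an instrumented HTTP client and server do automatically. The helper names are hypothetical, and for simplicity it only accepts version `00` headers.

```python
import re
import secrets
from typing import Optional, Tuple

# version 00: 32 hex trace id, 16 hex parent (span) id, 2 hex trace flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: Optional[str]) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_id, flags), or None if the header is missing or invalid."""
    match = _TRACEPARENT.fullmatch(header or "")
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero ids are invalid
    return trace_id, parent_id, flags


def outgoing_traceparent(incoming: Optional[str]) -> str:
    """Header for a downstream call: keep the trace id, use a fresh span id for this hop."""
    parsed = parse_traceparent(incoming)
    if parsed is None:
        trace_id, flags = secrets.token_hex(16), "01"  # start a new, sampled trace
    else:
        trace_id, _, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# app1 receives a request without context, then calls app2, which calls app3.
to_app2 = outgoing_traceparent(None)
to_app3 = outgoing_traceparent(to_app2)
print(to_app2)
print(to_app3)
```

Both headers share the same trace id, which is what lets a backend like Jaeger stitch the three services into a single trace.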
So what we are going to do here is send the telemetry data from our applications to an OpenTelemetry Collector, and from there export the data to Jaeger to see some traces. There are different ways of doing this. For instance, you could use manual instrumentation. I don't need to tell you
that this is very likely not the best idea in the world. Why? Because this was our application, and now it is this mess. Actually, you can remove some of this data and inject it using environment variables, but I thought this would be more dramatic.
So, well, if you need to, right, you can instrument something manually. But what usually happens is that when people want to start using instrumentation, they have two requirements. One is now, and the second one is free. So can you imagine if tomorrow you go to your company
and somebody says, let's add instrumentation, we want instrumentation for tomorrow? How many things would you need to modify? How many endpoints? How many applications in different languages? And very likely other applications that you don't maintain, like, I don't know, databases and things like that. So that is very likely not an option, right?
So we have something called automatic instrumentation. It's amazing because it's automatic and you don't have to know anything. It is a set of libraries. I think on Saturday, during the sprints, there is one about contributing to OpenTelemetry, so very smart people are going to be there,
and you can ask them exactly how things work. What those libraries actually do is use monkey patching techniques. When you call one of the methods that are instrumented automatically, you are actually calling the one from the OpenTelemetry library.
It will generate some stuff, like traces and other things, right? And finally it will call the real code. You can even create your own library. For instance, somebody from a company that I think is called Traceloop created something called OpenLLMetry, which is just a set of libraries that you can use to instrument your LLM applications.
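The monkey-patching mechanism described above can be sketched in a few lines: replace a library function with a wrapper that records something, then calls the original. This is only an illustration of the technique, not how any particular OpenTelemetry instrumentation package is written. `fake_client` is a hypothetical stand-in library, and `CALLS` stands in for span export.

```python
import functools
import time
import types

CALLS = []  # stands in for exported spans: (function name, duration in seconds)


def instrument(module, attr_name):
    """Swap module.attr_name for a timing wrapper that still calls the original."""
    original = getattr(module, attr_name)
    if getattr(original, "_instrumented", False):
        return  # already patched; avoid wrapping twice

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)  # finally, call the real code
        finally:
            CALLS.append((attr_name, time.perf_counter() - start))

    wrapper._instrumented = True
    setattr(module, attr_name, wrapper)


# A hypothetical third-party HTTP client library.
fake_client = types.ModuleType("fake_client")
fake_client.get = lambda url: f"200 OK from {url}"

instrument(fake_client, "get")
instrument(fake_client, "get")  # second call is a no-op
print(fake_client.get("http://app2/"), CALLS[0][0])
```

One consequence of this design is that code which grabbed a reference to the function before patching (for example, an earlier `get = fake_client.get`) keeps calling the unpatched version. That is one reason instrumentation is usually set up at process start, before application code runs.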
Not, I don't know, I don't think you can do it for training, but at least when you are calling the generate and complete methods, something like that, right? And how do you do it? Well, you have to install the instrumentation libraries for the libraries you are using. For instance, if you are using Flask, you need to install the Flask instrumentation library
in your system. And you have a pretty nice CLI that you can use to start your application. So instead of starting your application with flask run, you do it this way, right?
What this does is say, hey, this is the name of my service. I want to export my traces using OTLP and my metrics using OTLP to this address, okay? Well, in an insecure way, just because this is a demo, okay, but don't do that in production or at home.
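Here is a hedged sketch of that startup command. The `opentelemetry-instrument` CLI (installed with `opentelemetry-distro`, with per-library packages such as the Flask instrumentation typically added via `opentelemetry-bootstrap -a install`) reads its configuration from the standard `OTEL_*` environment variables. The snippet only builds the command and environment. The service name `app1` and the collector address `http://localhost:4317` (the default OTLP/gRPC port) are assumptions, and the insecure setting is for a local demo only, as the talk warns.

```python
import os


def instrumented_command(service_name, otlp_endpoint, app_cmd):
    """Return (argv, env) to start app_cmd under opentelemetry-instrument."""
    env = dict(os.environ)
    env.update({
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_ENDPOINT": otlp_endpoint,
        "OTEL_EXPORTER_OTLP_INSECURE": "true",  # local demo only, never in production
    })
    return ["opentelemetry-instrument", *app_cmd], env


argv, env = instrumented_command("app1", "http://localhost:4317", ["flask", "run"])
print(" ".join(argv))
# To actually launch it (requires the packages above to be installed):
# import subprocess; subprocess.run(argv, env=env, check=True)
```

Keeping the configuration in environment variables is what lets the same unmodified application be pointed at a different collector per environment.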
But, well, if you are using Kubernetes, you can actually use something much better, because you have to do almost nothing. And we know that doing almost nothing is always pretty nice. There is something called the OpenTelemetry Operator that allows you to autoinstrument your applications, okay?
So let's see it in action. Thanks for the look to me. Okay, I will check later. So our application is going to be,
maybe it's too small, give me a second. As I said, right, it's just: hey, send this request, right? We do some simulated work. The second application is going to be exactly the same, okay? And the third one just sleeps a little bit and replies.
Okay, so in the case of Jaeger, right, it's just creating a Jaeger instance. I am going to show you what Jaeger is exactly later, I mean the UI. And we can create one OpenTelemetry Collector. Later we are going to look a bit more at what this configuration, which seems a little bit crappy,
looks like and what it's doing. Actually, it's just saying, hey, I want to receive trace data through OTLP. We are not going to do any processing, and we are going to export via OTLP to Jaeger. We are going to see later what this magic is.
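The collector configuration described here wires receivers, processors and exporters into one pipeline per signal. In the real collector this is a YAML file with `receivers`, `processors`, `exporters` and `service.pipelines` sections. As a toy model of those roles, and of why swapping Jaeger for another backend only touches the collector, consider the sketch below. Every function in it is hypothetical, and the filtering processor is an extra that the demo does not use.

```python
def run_pipeline(receivers, processors, exporters):
    """Toy collector pipeline: gather from every receiver, process in order, fan out to every exporter."""
    batch = [item for receive in receivers for item in receive()]
    for process in processors:
        batch = process(batch)
    for export in exporters:
        export(list(batch))  # each exporter gets its own copy


# Hypothetical OTLP receiver returning spans sent by the apps.
def otlp_receiver():
    return [
        {"service": "app1", "name": "GET /"},
        {"service": "app1", "name": "GET /healthz"},
        {"service": "app2", "name": "GET /"},
    ]


# Hypothetical processor that drops health-check noise.
def drop_health_checks(batch):
    return [s for s in batch if s["name"] != "GET /healthz"]


jaeger, tempo = [], []
# Adding or replacing a backend means editing this list, not the applications.
run_pipeline([otlp_receiver], [drop_health_checks], [jaeger.extend, tempo.extend])
print(len(jaeger), len(tempo))
```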
For metrics, something similar: we receive via OTLP and export via Prometheus. So I have my application running here, okay? These are the OpenTelemetry Collector logs. Okay, and here I am running curl, okay, against my application.
So you can see that it is saying bye, okay? So if I go to the Jaeger UI, I will find that I have zero services, so nothing is showing traces. If I refresh it,
you will find that, well, actually now we have one, but this is because Jaeger itself is instrumented, which is pretty nice. Okay, so now we are going to create something that the OpenTelemetry Operator provides,
an Instrumentation object, okay? And here we are going to say, hey, export my traces to this endpoint, okay? And this is the thing that you always copy and paste from other places until you understand what you are doing. But yeah, so we are going to do it.
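For reference, an `Instrumentation` resource for the OpenTelemetry Operator looks roughly like the dict below, serialized to JSON, which `kubectl apply -f -` accepts as readily as YAML. The resource name and collector address are hypothetical. The operator's documentation notes that Python autoinstrumentation exports over OTLP/HTTP by default, hence port 4318 rather than 4317. Workloads opt in with the `instrumentation.opentelemetry.io/inject-python` annotation on the pod template. Injection happens when pods are created, which is why the running applications need a restart.

```python
import json


def instrumentation_resource(name, endpoint):
    """Build an OpenTelemetry Operator Instrumentation custom resource as a dict."""
    return {
        "apiVersion": "opentelemetry.io/v1alpha1",
        "kind": "Instrumentation",
        "metadata": {"name": name},
        "spec": {
            "exporter": {"endpoint": endpoint},
            "propagators": ["tracecontext", "baggage"],
            "sampler": {"type": "parentbased_traceidratio", "argument": "1"},
        },
    }


# Goes under spec.template.metadata.annotations of each Deployment to instrument.
PYTHON_INJECTION = {"instrumentation.opentelemetry.io/inject-python": "true"}

manifest = json.dumps(
    instrumentation_resource("demo-instrumentation", "http://otel-collector:4318"),
    indent=2,
)
print(manifest)  # e.g. python make_instrumentation.py | kubectl apply -f -
```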
So we created it, and nothing happens, okay? But now we have to restart our applications, so it will take a little bit.
Usually, on a production cluster, it doesn't take long, but I am running other things here, so the thing is likely going crazy. Okay, come on, do it.
Please, do something. Okay, now it's there, okay? So very likely this thing broke. Yeah, okay. We have to do this. Okay, we are still receiving the bye, okay?
So when we restart the thing, oh, why did this happen? Yeah, connection refused. Why?
No, it's not DNS. Come on. Stupid Jaeger. Okay, in the meantime I am going to tell you
what happened to me yesterday when I left the conference. I was just coming out of the underground, and one guy tried to rob me. Don't worry, I am fine, I am here. But yeah, he tried to take my bag pretty hard.
And yeah, somebody tried, oh, this is the problem, okay, I found the problem, I found the problem. So I turned around, right, and I hit him pretty hard. I closed my eyes and did the windmill, like this.
Yeah, and somebody helped me afterwards. It's the first time this has happened to me. Not the demo thing, the robbery. Yeah, actually this demo has failed a lot of times.
Yeah, give me a second. It's recreating the thing. It won't take long. Okay, now we are here. Okay, so now if I restart this,
now we are going to see here four services. Okay, so if we go to application one and we check for traces, we are going to find that the application was automatically instrumented.
So now I am going to increase a little bit soon. So now we can see things like what HTTP verb was used, the status code, what kind of request we did, right? So a lot of information like the name of the container and other stuff. This was done automatically.
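Much of OpenTelemetry's Python auto-instrumentation works by having the `opentelemetry-instrument` launcher load instrumentors that wrap functions of supported libraries at startup, so application code stays untouched. The standard-library toy below sketches that idea only: `fake_http` is a hypothetical stand-in for a real HTTP client, and the attribute names are modeled on (older) OpenTelemetry HTTP semantic conventions rather than produced by the real SDK.

```python
import functools
import time
import types

# Spans collected by this toy "SDK"; a real SDK would batch and export them.
FINISHED_SPANS = []

def instrument(module, func_name, span_name):
    """Replace module.func_name with a wrapper that records a span-like dict."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(method, url, *args, **kwargs):
        start = time.perf_counter()
        status = None
        try:
            response = original(method, url, *args, **kwargs)
            status = response["status"]
            return response
        finally:
            # Recorded even if the call raises, with status left as None.
            FINISHED_SPANS.append({
                "name": span_name,
                "attributes": {"http.method": method, "http.url": url,
                               "http.status_code": status},
                "duration_s": time.perf_counter() - start,
            })

    setattr(module, func_name, wrapper)

# Hypothetical HTTP client module standing in for a real library.
fake_http = types.SimpleNamespace(request=lambda method, url: {"status": 200, "body": "ok"})

instrument(fake_http, "request", "HTTP GET")
# Application code is unchanged: it calls the library as usual.
fake_http.request("GET", "http://app2/")
print(FINISHED_SPANS[0]["attributes"])
# {'http.method': 'GET', 'http.url': 'http://app2/', 'http.status_code': 200}
```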
So I didn't add any OpenTelemetry code to my application, which is amazing, because you can do this with different languages and everything is going to be interoperable, and you get a lot of information. Besides generating traces, the SDK also generates some metrics. And if you check, I have something here called span metrics.
What we do is just enable it, okay? We say, hey, when you receive traces, export them to this span metrics connector, and in the metrics pipeline, receive from span metrics. So what we are doing is generating metrics from our traces.
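The Collector's `spanmetrics` connector turns finished spans into request counters and latency histograms. The sketch below is not the connector's code; it is a plain-Python illustration of the same idea, assuming a simplified span shape (`service`, `name`, `status`, `duration_ms`) and made-up bucket bounds. Counts are per bucket, not cumulative as in Prometheus.

```python
from bisect import bisect_left
from collections import defaultdict

# Histogram bucket upper bounds in milliseconds (an assumption, not the connector's defaults).
BUCKETS_MS = [2, 5, 10, 50, 100, 500, 1000]

def span_metrics(spans):
    """Derive call counts and latency histograms from finished spans.

    Returns {(service, name, status): {"calls": int, "buckets": [...]}} where
    the last bucket counts durations above the largest bound.
    """
    metrics = defaultdict(lambda: {"calls": 0, "buckets": [0] * (len(BUCKETS_MS) + 1)})
    for span in spans:
        entry = metrics[(span["service"], span["name"], span["status"])]
        entry["calls"] += 1
        # bisect_left finds the first bound >= duration, i.e. a "less or equal" bucket.
        entry["buckets"][bisect_left(BUCKETS_MS, span["duration_ms"])] += 1
    return dict(metrics)

spans = [
    {"service": "app1", "name": "GET /", "status": "OK", "duration_ms": 4.0},
    {"service": "app1", "name": "GET /", "status": "OK", "duration_ms": 120.0},
    {"service": "app2", "name": "GET /", "status": "ERROR", "duration_ms": 3000.0},
]
for key, value in span_metrics(spans).items():
    print(key, value)
```

Because the metrics are derived from spans, the application does not need separate metric instrumentation for request rate, errors, and latency.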
If we go to the metrics endpoint of our OpenTelemetry Collector, it is going to fail because it was just restarted.
We are going to get there soon, trust me. Or not. Oh, no, no, I am not touching anything. The correct place is this one. Come on. This is the one for the application, yeah.
Now we have it populated here, right, the endpoint. So we have histograms and everything. Just with our traces, we can see exactly what's happening in our application with zero code. For instance, if somebody here used to work with Istio, the service mesh.
Okay, for those of you used to Istio: one of the things is that Istio generates traces itself, but if you want traces of the full transaction, you have to propagate the context, because otherwise you just get traces
of two spans. To propagate the context, the documentation says that you have to propagate all these headers, right?
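Istio's sidecars create spans, but they cannot link an incoming request to the outgoing calls your code makes unless the application copies the trace headers across. A minimal sketch of that manual step, assuming plain header dicts; the list is a subset of the headers named in Istio's tracing documentation, so check the docs for your Istio version.

```python
# A subset of the trace headers listed in Istio's distributed-tracing docs.
TRACE_HEADERS = (
    "x-request-id",
    "traceparent",
    "tracestate",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
)

def headers_to_forward(incoming: dict[str, str]) -> dict[str, str]:
    """Copy trace-context headers from an incoming request for an outgoing call."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}

incoming = {
    "Content-Type": "text/plain",
    "X-Request-Id": "abc123",
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
}
print(headers_to_forward(incoming))
```

With OpenTelemetry auto-instrumentation, the configured propagators extract and inject this context for supported HTTP libraries, which is what removes the need for this code.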
So this means some coding. Something else you can do with this magic thing is tell it, hey, propagate those headers for me, so that is also done automatically, right? Which is pretty nice. There is a lot of work going on here. Also, all the components are switching to this, right? And it's not just something you can use for the cloud. For instance, there is also a lot of work on making edge devices observable with this, right? With a single binary, you can send all the metrics, traces, and everything.
I wasn't sure this would work, okay. Well, we have plenty of time; I was expecting the demo to fail much earlier. So, making your application observable is no longer optional, especially because there are now tools like Tracetest that let you create tests based on your traces. All these tools are not just for when something is failing; they are also for finding where you are spending your resources. Another way of making money is to spend less, okay? This is pretty useful for that. Also, if you switch to OpenTelemetry, you get access to other signals, like profiling.
Right now, if you want profiling, you have to use things like Pyroscope from Grafana, or agents from Datadog and other vendors. But one or two months ago, Elastic started donating its profiling agent to the OpenTelemetry project. So very soon, right, we are going to be able to see that in OpenTelemetry too. You also need to evaluate what doesn't make sense to instrument. What I showed you, right, is that I tried to instrument everything.
Something you can do is exclude libraries that you don't want instrumented, or focus on something that is pretty specific to your application, right? There you can add some manual instrumentation, but you don't need it for what we showed here, right?
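For the Python agent, excluding libraries is done with the `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS` environment variable, a comma-separated list of instrumentation names. The helper below is a hypothetical sketch that builds the command and environment for running an app (the `app.py` is made up) under the `opentelemetry-instrument` launcher with some instrumentations turned off; with the Operator, the same variable can go in the `Instrumentation` resource's `spec.python.env` list.

```python
import os
import shlex

def instrumented_command(app_argv, disabled=()):
    """Build argv and environment for running an app under opentelemetry-instrument.

    `disabled` lists instrumentation names to skip, e.g. ("redis", "sqlite3").
    """
    env = dict(os.environ)  # copy, so the current process is not modified
    if disabled:
        env["OTEL_PYTHON_DISABLED_INSTRUMENTATIONS"] = ",".join(disabled)
    return ["opentelemetry-instrument", *app_argv], env

argv, env = instrumented_command(["python", "app.py"], disabled=("redis", "sqlite3"))
print(shlex.join(argv))                            # opentelemetry-instrument python app.py
print(env["OTEL_PYTHON_DISABLED_INSTRUMENTATIONS"])  # redis,sqlite3
```

To actually launch the app, pass both values to `subprocess.run(argv, env=env)`.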
As I said, it's not just for debugging issues. There is one more thing here, oh yeah, it's here, right? I thought this would not work, so I was not planning to show it. You can even see a graph that was generated
from your metrics and everything. I don't know how to make it bigger, but you can see that there are three points, right? Those are your applications and how they are connected. Imagine that you are an SRE and you don't write the applications yourself. Things keep changing, right,
in how the different endpoints relate to each other, because you are doing CI/CD, having a lot of releases per day and everything. So if something fails, you can see exactly how the system is actually connected from a single view, right? You can see that application one is connected to application two and to application three.
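A dependency graph like this can be derived from the parent/child links between spans emitted by different services. The sketch below assumes a simplified, hypothetical span shape (`span_id`, `parent_id`, `service`); tools such as Jaeger's dependency view compute something similar, though real implementations differ in detail.

```python
def service_edges(spans):
    """Return the set of (caller_service, callee_service) pairs found in spans.

    An edge exists when a span's parent belongs to a different service.
    """
    service_of = {s["span_id"]: s["service"] for s in spans}
    edges = set()
    for span in spans:
        parent_service = service_of.get(span["parent_id"])  # None for root spans
        if parent_service is not None and parent_service != span["service"]:
            edges.add((parent_service, span["service"]))
    return edges

spans = [
    {"span_id": "a", "parent_id": None, "service": "application-1"},
    {"span_id": "b", "parent_id": "a", "service": "application-2"},
    {"span_id": "c", "parent_id": "a", "service": "application-3"},
    {"span_id": "d", "parent_id": "b", "service": "application-2"},  # internal span
]
print(sorted(service_edges(spans)))
# [('application-1', 'application-2'), ('application-1', 'application-3')]
```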
Oh, well, actually it's here, right? So this is also pretty interesting. Also, you have this instrumentation, but you need to store it too.
Storing telemetry data has a cost. When people integrate observability tools, their usual complaint is, hey, this is pretty expensive. Yeah, very likely because you are saving all the telemetry data from the last three months, and very likely you don't need it, right? And this is not just for SREs.
If you watched, for instance, the previous talk, this can also be pretty interesting for developers, because you can easily detect things that you are very likely not able to reproduce locally. You can see points, right, where you can improve.
So thank you. Do you have any questions? Hi. So we are now entering the Q&A session.
We have a lot of time for questions, but since I love live coding, you already get your cookie. Thank you. So ask your questions. Hi. Sorry. Thank you for the talk, and I really applaud you for not only live coding, but live Kubernetes cluster reconfiguration.
That takes a lot of, you know, cojones. And I have a question for you. You went over it quickly in your last slide, but if I instrumented everything automatically like you said, my company would fire me because we would run out of budget. So is there a way to do sampling here,
to reduce the amount of traces actually being sent? Yeah. Something you can do is adjust the sampling, for instance from the OpenTelemetry Collector. Depending on the policy you need, for different signals and everything, right,
you can say things like keep just 10% of the data that is sent, or maybe just the data that has some other property. Something other companies are also doing is putting an OpenTelemetry Collector in between, right, and sending some data to one backend,
and sending the rest of the data somewhere local, right? Data that maybe is not so important, or is important only for them, things like that. With that, you can reduce the cost, right? But sampling especially is what you are very likely going to need. Thank you. Let me just put a question from Discord in between.
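For the "keep 10%" case in the sampling answer, a common approach is to decide from the trace ID itself, so every service that sees a trace makes the same choice and sampled traces stay complete. This sketch is modeled on the approach of the OpenTelemetry Python SDK's `TraceIdRatioBased` sampler (comparing the low 64 bits of the trace ID against a bound); it is an illustration, not the SDK code. In practice you would configure it with `OTEL_TRACES_SAMPLER=parentbased_traceidratio` and `OTEL_TRACES_SAMPLER_ARG=0.1`, or sample in the Collector with the `probabilistic_sampler` or `tail_sampling` processors; the latter covers "just the data that has some other property".

```python
import random

TRACE_ID_LOW_64 = (1 << 64) - 1

def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its trace ID fall below ratio * 2**64.

    The decision depends only on the trace ID, so it is consistent across services.
    """
    bound = round(ratio * (TRACE_ID_LOW_64 + 1))
    return (trace_id & TRACE_ID_LOW_64) < bound

rng = random.Random(42)
ids = [rng.getrandbits(128) for _ in range(100_000)]  # random 128-bit trace IDs
kept = sum(should_sample(t, 0.10) for t in ids)
print(f"kept {kept} of {len(ids)} traces")  # roughly 10% of them
```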
So the question is: the OpenTelemetry Python and OpenTelemetry Python Contrib repositories on GitHub are overwhelmed with issues and PRs. Do you know more about where this is going and how the governance is handled? I am not involved with the Python part. There is a sprint this Saturday
with one of the people mentioned there; very likely you can ask him. But there are a lot of working groups, and you can join them. One of the things is that they are pretty open, right; they allow you to ask things and to be included in the decisions and everything.
For instance, I am part of the SIG call for the Operator, right? We are people from different companies, and we make decisions, we vote, and we even write down how to do things one way or another. Very likely the rest of the SIGs work the same way, right? There are some written rules, yeah.
Cool. So now back to the microphone. You have a question? You, yeah? No, you. Yes, thank you for the presentation. All tracing adds some overhead, but does the Python
instrumentation add a lot of overhead, or just a little? Yeah, well, you do have some overhead, right? But it's something you have to measure: whether it's worth it for you to have that overhead and get all that information, or not have the overhead
but be totally blind about what you are doing, right? I was talking about this with one person who uses OpenTelemetry a lot. Before, they were sending a lot of logs and metrics and everything, right? Now they are just sending traces.
So they are reducing the overhead there, because they are sending just one signal. And they are not sending the metrics; they are calculating them in the backend. So, yeah, they measured it, right? For them it was worth doing it that way. Yes, definitely true. Thank you. More questions? Oh, yeah, here.
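Following the advice in the overhead answer to measure rather than guess, here is a minimal sketch of comparing per-call cost with and without a wrapper. The `traced` decorator and `handler` are hypothetical stand-ins, not the OpenTelemetry SDK; to measure the real thing, run the same load test against your app started normally and under `opentelemetry-instrument`.

```python
import functools
import time
import timeit

def traced(func):
    """Toy instrumentation: record start/end timestamps for every call."""
    records = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        try:
            return func(*args, **kwargs)
        finally:
            records.append((func.__name__, start, time.perf_counter_ns()))

    wrapper.records = records
    return wrapper

def handler(n=200):
    # Stand-in for a request handler doing a little CPU work.
    return sum(i * i for i in range(n))

CALLS = 20_000
baseline = timeit.timeit(handler, number=CALLS) / CALLS
instrumented = timeit.timeit(traced(handler), number=CALLS) / CALLS
print(f"baseline {baseline * 1e6:.2f} us/call, instrumented {instrumented * 1e6:.2f} us/call")
```

The absolute numbers depend on the machine; what matters is the relative difference for your own workload.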
One more question. Thanks again for the presentation. During the presentation you focused largely on server-side applications. I was investigating using this on the client side; I think you mentioned edge locations. Specifically, I'd like to implement this for a CLI application. I'm wondering what your thoughts are, whether you know anybody using this,
or any pitfalls to be aware of. Yeah, well, I have not used it myself, right? I know there are people using this, especially for web applications, right? For instance, Grafana has Grafana Faro, I think that is the name of the project, which wraps
the OpenTelemetry libraries and other stuff in a way that you can consume them. Also, this last week or the previous one, there was a post on the OpenTelemetry blog about how to use OpenTelemetry for mobile applications. So it's somewhat related, right? Maybe you can check it. I'll check that out. Thanks.
I was wondering whether it would be possible to put your code on GitHub so that we could have a more... Anyway, if you go to my GitHub account, right, you will see a lot of workshops or demos. Very likely all of them are the same, but with small changes. Okay. All right. Thanks.
We have plenty of time, so if someone has a question, go ahead. If not, last chance. No. So, again, thank you for this wonderful talk. Thank you.