Distributed Tracing for beginners

Formal Metadata

Title
Distributed Tracing for beginners
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Distributed tracing is a tool that belongs to every developer's tool belt, but what it actually can do remains a mystery to most developers. In this slideless talk, we will introduce you to the world of distributed tracing by developing a cloud native application from scratch and applying all important distributed tracing concepts in practice, at first by hand and then by using existing libraries to automate our work. You will learn not only what distributed tracing is, but how it works, what it can do and what it can’t. By the end of this talk, you will have working knowledge to start using distributed tracing tools with your new projects, as well as with your legacy ones.
Transcript: English (auto-generated)
So, welcome to Distributed Tracing for Beginners. In the next 25 minutes or so, we are creating a new application from scratch and adding some tracing capabilities to it. All right, so you saw me bringing this chair here because this is a live coding session.
So I'm not going to stand here; I'm just going to sit down and do some coding. Yeah, this is pretty much the only slide that I have. This is Visual Studio Code, a code editor that I assume most of you know already, and I have two extensions here. One is the Jaeger extension, which allows me to start and stop Jaeger.
If you don't know what Jaeger is, hold on, we are getting to that in a few minutes. And the second extension that I have here is the Quarkus extension. I'm not going to focus on Quarkus. For our purposes here, Quarkus is a framework or a platform to build cloud-native Java applications. All right, so that's all we need to know for this talk.
And we are creating a waffle-producing resource. We are in Brussels, so why not some waffles? So we are creating a REST interface that accepts a POST request, and we produce a waffle for that.
All right, so let's generate a new project, so generate a Quarkus project. This generator is pretty much the same as the one we have online: Maven as the build tool, some group ID, some artifact ID, a version, and the package name to serve as the base for our application. So I'm just going to add waffle there. And then which REST resource we want to be created by default. Quarkus generates one resource as part of this scaffolding, and we say it's... oh, let's try again. Maven, package, artifact ID, version, package name, resource.
Yeah, so for now we don't want any extensions for our Quarkus application, and we select where it should be generated. All right, so let me close those overview tabs here, and let me place this down here a little bit.
All right, so this is what Quarkus generates for us. We have a path here, /hello, and whenever we run a GET request against /hello, we get hello back. So let's start our project and see if that's what happens. All right, so it started.
And we can then call curl on localhost:8080/hello, and we get hello back. Good. So we start customizing that: waffle, POST, and let's say that our method is called produce. And then here we return 'waffle is ready to be picked up'. Good, so if we call it again here, changing hello to waffle and making it a POST, we get 'waffle is ready to be picked up'. Good. So this is working, and I don't know about you, but I don't actually know how to make a waffle. I hear that it needs to be cooked for some time, and of course, I don't know how long it should be cooked, right?
So as an engineer, what I would do is just produce some waffles with a random cooking time, right? And then I see what the perfect amount of time to cook a waffle is. So let's do that by creating a cooking time variable here, and we wait a random amount of time.
So SecureRandom, because it has to be secure, with 1000 as our upper bound. And then we sleep for some time, so the cooking time, and we just need to catch the InterruptedException.
InterruptedException, and for now we do nothing, right? So if we get interrupted while we cook our waffle, we just return the waffle to whoever interrupted us. Good. All right, so this is pretty much it. So let's go back to our console here.
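A minimal plain-Java sketch of the method described above, assuming the class name `WaffleResource` and leaving out the JAX-RS annotations that the Quarkus project would carry. Restoring the interrupt flag is an addition for illustration; in the talk the catch block does nothing.

```java
import java.security.SecureRandom;

// Plain-Java sketch; the real resource would also carry JAX-RS annotations.
final class WaffleResource {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a cooking time in milliseconds in the range [0, bound).
    static int randomCookingTime(int bound) {
        return RANDOM.nextInt(bound);
    }

    // "Cooks" for the given time and returns the response body.
    static String produce(int cookingTimeMillis) {
        try {
            Thread.sleep(cookingTimeMillis);
        } catch (InterruptedException e) {
            // Addition for this sketch: keep the interrupt flag set,
            // then hand back the waffle as it is.
            Thread.currentThread().interrupt();
        }
        return "waffle is ready to be picked up";
    }
}
```

Calling `WaffleResource.produce(WaffleResource.randomCookingTime(1000))` corresponds to one request with an upper bound of 1000 milliseconds.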
Let's add the time command so that we can see how long it's taking. So 762 milliseconds for one waffle, 778, 090, so this one was really fast. So it seems like our random cooking is actually working. Now, if one of those waffles turns out to be good, how do we tie that waffle to the time that it took? All right, so we don't have this information anymore. It is here in my console, but if I clear my console or close it, this information is lost.
So what I can do is create a new trace for each waffle that I produce. Whenever I produce a waffle, I create a trace that represents that whole production, and then I can add information to it. All right, and that's what we are going to do. Now, to achieve that, we need two types of libraries. The first one is an instrumentation library. An instrumentation library is the tool that we use to say what we want to measure.
So in our case, we want to measure how long this thing here is taking, right? So we are using some sort of API to wrap this code here in a timer. And the second type of library that we need is the actual tracer. It's the thing that knows how to measure time, how to measure things, and how to send data somewhere.
So we need an instrumentation API and we need a tracer, the actual client. Now, sometimes those two components are in one library, but in our case here we have two separate libraries. One is OpenTracing as our instrumentation library, and the second one is Jaeger as the actual tracer.
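The split between the two kinds of library can be sketched with plain Java. All names here (`SimpleTracer`, `InMemoryTracer`, `Kitchen`) are hypothetical stand-ins for this illustration and are not the OpenTracing or Jaeger APIs: the interface plays the role of the instrumentation API, and the implementation plays the role of the tracer.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an instrumentation API: application code only sees this.
interface SimpleTracer {
    void record(String operationName, long durationNanos);
}

// Stand-in for a concrete tracer: it decides what happens to the data.
final class InMemoryTracer implements SimpleTracer {
    final List<String> finished = new ArrayList<>();

    @Override
    public void record(String operationName, long durationNanos) {
        finished.add(operationName);
    }
}

final class Kitchen {
    // Business code depends on the interface, not on the implementation.
    static void cook(SimpleTracer tracer) {
        long start = System.nanoTime();
        // ... the work to be measured would go here ...
        tracer.record("produce", System.nanoTime() - start);
    }
}
```

Because `Kitchen` only knows the interface, swapping `InMemoryTracer` for another implementation would not require touching the business code.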
So in a Java application, we do that by adding a dependency to our project file, in this case a pom file. I can do a trick here and add only the Jaeger client as a dependency, and that transitively pulls in the OpenTracing library with it, right? So I'm going to use this trick. So Jaeger client, version 1.1.0. I have to synchronize my project. And let me start the project in the background while we do the other things that we need to do.
So in our waffle resource, we now have those two libraries in our classpath, and we can start instrumenting our application. But it's probably a bad idea to initialize a tracer for every single HTTP request that you receive, right? So we want to initialize a tracer either lazily or during the bootstrap of our application.
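The lazy variant of "initialize once" can be sketched in plain Java as follows. `Lazy` and `TracerHolder` are hypothetical names, and a `String` stands in for the real tracer object; the counter exists only to show that the factory runs once.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Creates the expensive object on first use and then reuses it.
// Assumes the factory never returns null.
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized T get() {
        if (value == null) {
            value = factory.get();
        }
        return value;
    }
}

final class TracerHolder {
    static final AtomicInteger INITIALIZATIONS = new AtomicInteger();

    // The string is a placeholder for a real tracer instance.
    static final Lazy<String> TRACER = new Lazy<>(() -> {
        INITIALIZATIONS.incrementAndGet();
        return "tracer for service waffle";
    });
}
```

Every request handler would call `TracerHolder.TRACER.get()` and receive the same instance, regardless of how many requests arrive.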
And for that, we create a tracer initializer. So let me create a new file and call it Tracing. And that's public class Tracing. We want only one instance of this class to exist at runtime, for the whole application lifecycle. So this is an application-scoped bean. For those of you who are Java EE, Jakarta EE, or MicroProfile developers, you know what this is. There is not really much magic.
So now what we need is a method that gets called whenever our application bootstraps. For Quarkus, this is a void method, and the name doesn't matter. What we can do is observe a CDI event. So CDI is, again, Java EE, Jakarta EE, and part of the MicroProfile stack. So it observes a startup event. Now, this is an event from Quarkus, so SE, and that's it. So this method here is called on bootstrap.
And within this method, we then need to initialize the actual tracer, and we store it in an instance property. So private Tracer tracer. And this is something from OpenTracing. So this Tracer here is OpenTracing, and OpenTracing is not an implementation. It's only an API. So if you decide to change your tracer later, you can, you know, keep your code mostly unchanged. The only place that you have to change is something like this, where you initialize your tracer.
So we are setting this, oops, tracer, and now we call the Jaeger API to get an actual tracer. Knowing the API, I know that I just have to call fromEnv on the Configuration class and set a service name, in my case here, waffle. So getTracer.
That's it. Now, I expose this property here to CDI. So with that, I'm ready to inject a tracer into my application. So in the waffle resource, I can just go and inject, again with CDI, a tracer. So Tracer is from OpenTracing. And in my code, I can finally measure my business. So we can wrap it in a try statement. And this is a try-with-resources statement. For those of you who are Java developers, you know what this is.
Basically, here I add a closeable object, in this case a Scope, and Scope is also something from OpenTracing. So an OpenTracing Scope. I get that from the tracer, so tracer.buildSpan, and I give this an operation name. So I name this block of code. I would name it after the method, so produce, and I call startActive with true.
Now, startActive with true here means that I want to close my span, that is, stop measuring things, whenever my scope gets closed. And my scope gets closed whenever I step out of the try statement. All right, so quite simple.
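The mechanism behind the try-with-resources pattern can be sketched without any tracing library. `TimedScope` below is a hypothetical class written for this illustration, not the OpenTracing `Scope`; it only shows how closing the resource ends the measurement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scope: starts timing when created, stops when closed.
final class TimedScope implements AutoCloseable {
    static final List<String> FINISHED = new ArrayList<>();
    static long lastDurationNanos = -1;

    private final String operationName;
    private final long startNanos = System.nanoTime();

    TimedScope(String operationName) {
        this.operationName = operationName;
    }

    @Override
    public void close() {
        // Called automatically when the try block is left.
        lastDurationNanos = System.nanoTime() - startNanos;
        FINISHED.add(operationName);
    }
}

final class ScopeDemo {
    static void produce() {
        try (TimedScope scope = new TimedScope("produce")) {
            // the block of code to be measured goes here
        } // scope.close() runs here and ends the measurement
    }
}
```

The measured duration covers exactly the body of the try block, whether it completes normally or throws.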
What I can do now is make another call to our service. Let's remove the time command, and it's there, it's working. All right, so you might be wondering, where is this trace information going? I mean, has it been logged somewhere? Because I don't see it here.
And the thing is, this data has been black-holed. This data has been lost, because we don't have a Jaeger instance running anywhere to receive it. And that just shows you that even though we don't have Jaeger available, it's not breaking our application. So this is one of the main things about tracing, or about monitoring in general.
Those tools should never break your application. You are paying some performance price for them, and even that should be really, really low, right? So typically, the benefits of using them are far greater than the cost that you are paying. Good, so let's now start Jaeger so that we can see data. So we start Jaeger.
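The "never break your application" property can be sketched as a wrapper that swallows delivery failures. `SafeReporter` is a hypothetical class for this illustration and says nothing about how the Jaeger client is actually implemented; it only shows the idea of dropping data instead of propagating an error.

```java
import java.util.function.Consumer;

// Wraps whatever ships span data and never lets its failures escape.
final class SafeReporter {
    private final Consumer<String> sender;
    private int dropped;

    SafeReporter(Consumer<String> sender) {
        this.sender = sender;
    }

    // Returns true if the data was handed over, false if it was dropped.
    boolean report(String spanData) {
        try {
            sender.accept(spanData);
            return true;
        } catch (RuntimeException e) {
            dropped++; // the span is lost, the application carries on
            return false;
        }
    }

    int dropped() {
        return dropped;
    }
}
```

With a sender that always fails, `report` returns false and the caller continues as if nothing happened, which matches the black-holed data seen in the demo.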
Jaeger has started. And I select show Jaeger UI. So here, oops. Show Jaeger UI. Now, this is probably too small for us to see here. I will just get a new one. So this shows one service, jaeger-query. That's Jaeger tracing itself.
I'm just opening it here in a real browser so that we can see better. All right, so this is Jaeger. This is the Jaeger UI. We see one service here. That's Jaeger tracing itself. And that's about to change, right? So we run one more curl call here, or a few more.
And if everything works, we see a new service here, right? The waffle service. Once we hit Find Traces, we see one trace for each HTTP request that we executed. Now, if we expand one of those traces, we see a trace that contains one span. And this span here contains quite a good amount of information. So something about the process, where it is running, and so on. And also a duration, right? The duration here is the key for us, because we are waiting some amount of time for the waffle to be produced. And we can make a correlation between the quality of the waffle and the time that it took for the waffle to be produced.
All right, so we can make a connection there. But that's only a correlation. We don't actually know how long this waffle here spent cooking. So what we can do then is make it very explicit. We can add a new tag to our span. So we get the span from the scope, and then we set a new tag with the cooking time. So cooking time, with the value being the amount of time that we are waiting. Then we run a couple more HTTP requests, and we should see new traces there.
All right, if we expand one of those, we see the cooking time as a tag within our span. Right? So that's good. So those are the very basics of tracing. So, you know, what is a span? What is a trace? What is a tracer? What is an instrumentation API?
And for the next step, you might be wondering if there isn't some way to make it easier for us to consume that. I mean, do I actually have to initialize my tracer? Do I have to take care of that myself? And the answer is, depending on the stack that you're using, you don't actually have to. If you're a Java developer, chances are high that the framework or the platform that you use does that for you already. It is true for Spring Cloud, it is true for Quarkus, it is true for Java EE in general.
So let me demonstrate that. Let me just remove this tracer initializer, oops, so delete, delete, and we clear the dependency from our project.
We have to synchronize it. And then we see that we have some import failures here, because OpenTracing is no longer known to this project. So what we can do now is add a new Quarkus extension to this project. So add extensions, and we type in OpenTracing here.
All right, so we add this extension. And similar to the tracer initializer that we had, we also have to specify a service name. Now, the service name can be specified either by an environment variable or in some property file like this one here: service name, waffle.
All right, so let's debug our project. So basically what we did, compared to the last trace, was to remove our tracer initializer and replace it with a Quarkus extension. And if we run it again here, our code is still working, so our waffles are still being produced. And if we find new traces, we now have a different picture. We have the old produce span that we had before, with the cooking time, but we also have a surprise here, right? We have one span being created at the REST handler layer. And this is a result of us adding the Quarkus extension to the project.
Now, the Quarkus extension brings an instrumentation library for the JAX-RS stack, both server and client. That means whenever I use JAX-RS in my application, that layer is going to be instrumented for me. I don't have to do anything. I can of course disable it if I want, but I don't have to do anything extra for that. I have some metadata about this JAX-RS server request. So I have the HTTP method, the whole Java method that was called, how long it took, the HTTP status code, and so on and so forth, right?
So I get not only the tracer initializer, but also some libraries that instrument this stack for me. Again, this is not Quarkus specific. This exists for pretty much all the libraries out there.
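What a framework instrumentation library does around each request can be sketched as a wrapper around the handler. `TracedHandler` is a hypothetical class for this illustration, a map stands in for a span, and the status codes are simplified assumptions (200 on success, 500 on an exception); the tag names `http.method` and `http.status_code` follow the OpenTracing semantic conventions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical wrapper playing the role of a framework instrumentation library.
final class TracedHandler {
    static final List<Map<String, Object>> SPANS = new ArrayList<>();

    // Runs the handler and records a span-like map with metadata about the call.
    static String handle(String method, String path, Supplier<String> handler) {
        Map<String, Object> span = new LinkedHashMap<>();
        span.put("operation", method + ":" + path);
        span.put("http.method", method);
        long start = System.nanoTime();
        try {
            String body = handler.get();
            span.put("http.status_code", 200);
            return body;
        } catch (RuntimeException e) {
            span.put("http.status_code", 500);
            throw e;
        } finally {
            span.put("duration_nanos", System.nanoTime() - start);
            SPANS.add(span);
        }
    }
}
```

The handler itself contains no tracing code, which is the point made in the talk: the surrounding layer records the span.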
All right, so that's pretty much what I had for the tracing part of distributed tracing, and we should now have a few minutes left for the distributed part of distributed tracing. Now, when preparing waffles, we need some ingredients. We need some eggs, we need water, we need flour, and because we like microservices, we should probably provision the flour from some extra service, right? So we should call a microservice to provision that flour for us. I have a service here for that. So I have a flour service, so target flour, and it is running on port 8081.
All right. So let's open our waffle producer again, and in here we are then making a call to that extra microservice, and we are doing that by creating a new file. So again, if you are a Java developer, you know a lot of the magic going on here. If you're not a Java developer, you should just trust me that it works, right? So there's no magic.
So package, and we specify an interface here, so public interface FlourService, and we register that as a REST client. We say that the path to the flour is at /flour. We import that. This is the JAX-RS Path. Let's scroll down here a little bit. And then we say that we want to get a String back from that service, and let's call it provision on our side.
It doesn't matter how it's called on the other side. This is a POST request, because we are producing something, we are generating something, we are changing state, and it produces, so Produces, a JAX-RS MediaType, the media type of plain text.
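In plain JDK terms, the declarative interface described above boils down to a request like the one built below. This is a sketch using `java.net.http` (Java 11 or later) rather than the REST client extension from the talk; the class name `FlourRequests` is hypothetical, and mapping the produced media type to an `Accept` header is how this sketch expresses "we want plain text back".

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class FlourRequests {
    // Builds (but does not send) the POST request to the flour service.
    static HttpRequest provision(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/flour"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .header("Accept", "text/plain")
                .build();
    }
}
```

With the port mentioned in the talk, the base URL would be `http://localhost:8081`.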
All right, so that's it. That's all we have to do here on our interface. We need to do a couple of other things. The first one is to tell where our service is actually running. So [inaudible] waffle FlourService, MicroProfile REST URL, localhost 8081. Good. And finally, we have to add the Quarkus extension that adds the REST client for us. So REST client, the first option, and add the extension. We have to synchronize it, and while our application starts again, start debug, we just go back to our waffle resource and we inject the REST client, right? Inject, REST client.
For the flour service: FlourService flourService. And before we start cooking, we get the flour. So String flour, and flourService.provision, I think it's called. Yeah. And then, of course, I need to do something with the flour. And if everything is working, we get a waffle back.
Now, it looks like I'm cheating, because you don't actually see the calls going to the flour service, but you can use Jaeger and see that this is indeed the case. So if we open it here, we see a different picture now. We see one trace with five spans, and if we look closely, we have three spans here that were created for us by the stack that we are using, right? So the first one is the REST handler at the server side of the waffle service, then our business span, then our REST client making a call to the flour service, and then a REST handler on the flour side, so the server side of this connection. And we can compare those two sides: we see that the client thinks that it took 188.33 milliseconds, and the server thinks it was 186.45. All right, so we see that there is a mismatch here, there is an overhead here somewhere. And then there is another business span saying it took 186 milliseconds to provision this flour, and we provisioned five cups.
All right, and that's only possible because we do context propagation between services. So whenever we go remote, we serialize our trace context into the payload of the message, and we deserialize it on the other side. So we serialize on the REST client, and we deserialize on the REST server.
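Context propagation can be sketched as a pair of operations on the request headers. Everything here is an assumption made for illustration: the class `TraceContext`, the header name `x-trace-context`, and the `traceId:spanId` wire format are hypothetical and are not the format that Jaeger or any other tracer actually uses.

```java
import java.util.Map;

final class TraceContext {
    // Hypothetical header name, chosen for this sketch only.
    static final String HEADER = "x-trace-context";

    final String traceId;
    final String spanId;

    TraceContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    // Client side: serialize the context into the outgoing headers.
    void inject(Map<String, String> headers) {
        headers.put(HEADER, traceId + ":" + spanId);
    }

    // Server side: rebuild the context from the incoming headers,
    // or return null if the request carries no usable context.
    static TraceContext extract(Map<String, String> headers) {
        String value = headers.get(HEADER);
        if (value == null) {
            return null;
        }
        String[] parts = value.split(":", 2);
        if (parts.length != 2) {
            return null;
        }
        return new TraceContext(parts[0], parts[1]);
    }
}
```

A server that extracts a context would create its own span as a child within the same trace; a server that gets null would start a new trace.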
All right, for us, this is done by the framework integration that we are using, but if you are not using any libraries, you have to do that manually yourself. And if you're a library builder, then there are hooks that you can use to propagate the context, or to consume the propagated context. All right, so I think I'm a couple of minutes over my time here, and I'd just like to do a small recapitulation. So let's recap.
First, we saw that we can use tracing on existing applications or new applications by making use of two types of libraries. The first one is the instrumentation library, the thing that we use to say I want to measure this, right? And the second component is the tracer, the client, and that is the thing that knows how to do things, how to send data, how to measure things. We saw that we can use the instrumentation API to be very explicit about what we want to measure, but we don't have to. We can use framework instrumentation libraries that do most of the work for us. So in this case here, we have five spans, and three of them were created automatically for us. We can get a pretty good picture of what the application is doing by just looking at those spans, right? So we don't actually need those business spans here.
And finally, we saw that the distributed part of distributed tracing is possible because of context propagation. We have to propagate the context in some way so that we can have a view into very specific requests, right?
Yeah, so, and with that, I think I still have a couple of minutes for questions. I have stickers for you, so if you have any questions...
Not the slides, no. I mean, there is only one slide in the presentation.
I can publish the code. I mean, pretty much every single quick start for tracing involves these steps. Your colleague, Yuri, has an OpenTracing tutorial which shows all these steps in more detail, and more.
Sit still. No, don't touch the tables. Don't touch the tables.
Is this sampling tracing? What's the performance impact of tracing like this, mostly in terms of the latency that you add on?
Yeah, so there was some work in the community doing measurements of the raw performance, and the results were that whatever you are doing in your application is probably impacting your application more than tracing. So there is an overhead, of course, because you are doing something, but it doesn't matter that much. Of course, if you are Uber, then you certainly do not want to measure every single request that is coming in. So then you use sampling strategies. That is a more advanced topic for this talk, so I did not touch sampling strategies or sampling at all, but pretty much all the libraries that you use allow you to control how much data you are collecting. Yeah. Any other questions?
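One common sampling strategy, keeping only a fixed fraction of traces, can be sketched as follows. `ProbabilisticSampler` here is a hypothetical class written for this illustration and is not taken from any tracing library; the talk itself does not go into sampling.

```java
import java.util.Random;

// Hypothetical probabilistic sampler: keeps roughly `rate` of all traces.
final class ProbabilisticSampler {
    private final double rate;
    private final Random random;

    ProbabilisticSampler(double rate, Random random) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        this.rate = rate;
        this.random = random;
    }

    // Decides, once per trace, whether the trace should be recorded.
    boolean isSampled() {
        // nextDouble() is in [0, 1), so rate 0 never samples and rate 1 always does.
        return random.nextDouble() < rate;
    }
}
```

A rate of 0.01 would record roughly one request in a hundred, which bounds the volume of collected data on a busy service.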
Questions, questions, questions. Yeah.
For the internal communication between the different hops, in the case of microservices, is the data transported as HTTP headers, I suppose, or...?
Yeah, so tracing is not transport specific or application-protocol specific. If you are doing HTTP between your microservices, then you have to put the context into the HTTP headers because, you know, HTTP. If you're doing JMS or message queues in some way, then you put it as metadata in the message. The idea is the same: you serialize the data and send it with the payload, and you deserialize it on the other side. But yeah, for HTTP, it is part of the HTTP headers. One more question.
Can you have multiple contexts, for example hierarchical contexts? I want to see only part of the work a service is doing, because I have a long-running service that runs for two hours, but I want to concentrate on just five minutes of it.
That's a wonderful question, actually. So, you define what a business transaction is. A trace is basically a representation of your business transaction. If your business transaction is that long, then, you know, the trace should cover that whole transaction. I've seen cases where traces have thousands of spans, like two or three thousand spans. We had a bug that we had to fix because the UI could not show that number of spans. Now, how you visualize that is a different problem. Storing your business transaction representation in the data store is one thing, and displaying it in the UI is a different thing.
But then you do have an option. There are some other techniques, not directly supported by Jaeger in this case, but, you know, in terms of distributed tracing, that allow you to have multiple parents. It's typical for messaging cases, where you have one message going in and then different consumers for the same message, and each one of those is a business transaction. All right, so it is a concept that exists. It is mapped by some APIs, but I don't actually know of any UIs that properly display that.
Does that answer your question? Okay. All right, so I think we're over time now. Yeah, thank you.