Profiling in the cloud-native era
Formal Metadata
Title: Profiling in the cloud-native era
Number of Parts: 287
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/56937 (DOI)
Transcript: English (auto-generated)
00:05
Hello, I'm Matthias Loibl and today I want to talk to you about profiling in the cloud-native era. Quickly about me, I'm a senior software engineer at Polar Signals. I maintain various open-source projects with others such as Parca, Thanos, Prometheus, Prometheus Operator and Pyrra.
00:24
Pyrra is a pretty cool side project of mine, and I want to quickly shout out that I'm working on it with a designer to make service-level objectives more manageable and easily accessible. If you want to reach out on any of the social medias, I'm always at metalmatze.
00:42
So, profiling. What is profiling? Profiling is really old. Profiling was first introduced in like the 1970s and even earlier probably, but that's when we find the first traces of it. And it's been used ever since to dynamically analyze programs and measure their resource usage: the space complexity,
01:05
so the memory, the time complexity of a program, so the CPU time and the usage of instructions, as well as the frequency and duration of function calls to know what our program is spending most of its time doing.
01:21
There are two different ways of profiling, and the first one might be the simplest one, but it also comes with the highest overhead, and that is tracing. And tracing records each and every event constantly in our program. But as I said, yeah, the cost is pretty crazy, and the amount of data we collect grows way too quickly to do anything meaningful with that over a long period of time.
01:48
But instead, we also have something called sample profiling, and that means that for a certain duration, so for example, for 10 seconds, we periodically observe function calls and stack traces.
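As a minimal sketch of such a fixed-duration capture in Go, using the standard library's runtime/pprof (the output file name and the sleep standing in for real work are assumptions):

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	f, err := os.Create("cpu.pprof") // illustrative output path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// While active, Go's CPU profiler samples the stacks of running
	// goroutines roughly 100 times per second.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	time.Sleep(10 * time.Second) // stand-in for the real workload
	pprof.StopCPUProfile()
}
```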
02:02
So, every second, for example, let's say 100 times, we record what we are seeing, and that comes with a pretty low overhead, as you can see right here. So, we can do this almost always. What types of profiles can we create?
02:23
So, to be more specific, we can create CPU profiles that tell us where the CPU, or where our program, is spending CPU time. We can create memory or heap profiles, so where is our program holding the most memory, and then allocation profiles, which tell us where functions are allocating the most memory.
02:48
So, slightly different, but super meaningful as well. And then there are IO profiles, and these tell us where functions do many network requests, or which functions are writing or reading the most from disk, things like that.
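For the memory-flavored profiles, the Go runtime exposes named profiles that can be written on demand; a hedged sketch (the output file names are made up):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	runtime.GC() // get up-to-date heap statistics before dumping

	heap, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer heap.Close()
	// "heap" answers: where is my program holding memory right now?
	if err := pprof.Lookup("heap").WriteTo(heap, 0); err != nil {
		log.Fatal(err)
	}

	allocs, err := os.Create("allocs.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer allocs.Close()
	// "allocs" answers: which functions allocated the most in total?
	if err := pprof.Lookup("allocs").WriteTo(allocs, 0); err != nil {
		log.Fatal(err)
	}
}
```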
03:07
So, why are we doing this? First of all, to improve performance, and that is to reduce the latency, for example, of our servers. And that could mean that we, I don't know, like get our servers from one second tail latency down to 100 milliseconds,
03:23
and then our users will be more happy, and obviously, we need to make money, so they will be able to spend, they will be happier, and then maybe spend more money on our platforms. We can also save money. That means that our program could do the same task,
03:43
but if we improve the overall performance of our programs, we can maybe turn off like 20% of our machines. And that's exactly what some companies and organizations have done. They looked at their programs and where they spent most of their time and memory, and then they tried to optimize that, and they could save up to 30% of their resources.
04:08
To get more specific again, how can we profile Go programs? Go comes with a tool called pprof, and pprof descends from the Google Performance tools, and it's an open standard.
04:22
It is described in a protobuf file, and it is an open standard, and you can use it with Go, but there is also support by other languages. The most important aspect for Go is, it is built into the Go runtime, and is part of the tools that are shipped with Go.
04:44
Again, it is an open standard, and there are many languages supporting this. The pprof format itself is not too complicated once you wrap your head around it. On the left-hand side, you can see a profile type, and then that profile type is kind of like a collection of various other types.
05:05
First of all, there's something called a mapping, and you could also rename that to binary, so if your program is just one binary, there's most likely just one mapping, and then there are certain programs that have a couple of binaries,
05:22
so these will be taken into account. Then every stack trace of your program will create a sample, and these samples point to location IDs, and a location is an address of that stack trace, and that points sometimes to a line,
05:43
and then that line points to a function, so that is how we can represent profiles in memory with pprof. pprof, again, supports many other languages, but Go obviously is kind of like the best-supported one, but there's support for other languages.
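To make that mapping → sample → location → line → function chain concrete, here is a rough sketch of assembling a tiny profile with the github.com/google/pprof/profile Go package; the function name and values are invented for illustration:

```go
package main

import (
	"log"
	"os"

	"github.com/google/pprof/profile"
)

func main() {
	// One function, one location, one sample: the smallest useful profile.
	fn := &profile.Function{ID: 1, Name: "main.work", Filename: "main.go"}
	loc := &profile.Location{ID: 1, Line: []profile.Line{{Function: fn, Line: 42}}}

	p := &profile.Profile{
		SampleType: []*profile.ValueType{{Type: "samples", Unit: "count"}},
		// Each sample's locations are ordered leaf-first, matching the
		// folded stack traces described in the talk.
		Sample: []*profile.Sample{
			{Location: []*profile.Location{loc}, Value: []int64{1}},
		},
		Location: []*profile.Location{loc},
		Function: []*profile.Function{fn},
	}

	f, err := os.Create("tiny.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := p.Write(f); err != nil { // gzipped protobuf on the wire
		log.Fatal(err)
	}
}
```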
06:02
Some are better supported, some not too great yet, but the community kind of works together on improving this. On here, we can see how the code on the left-hand side is folded into the stack traces that are then stored in pprof.
06:20
The functions on the left-hand side are calling each other, and once they are folded, the main function is represented on the very right, and then the leaf function is on the left, and we can use these as locations in the pprof format. And yeah, the folded stack traces are then converted
06:43
or transformed into these samples, and then we get a CPU profile in this case, for example. Now, if we add pprof to a Go program, you can import the net/http/pprof package from the standard library, and once you've done that,
07:01
you can register a couple of HTTP endpoints, and in this case, you then have a router on port 8080 that you can actually query with the go tool pprof command and point it to the right URL, and then that would actually open another web server locally
07:23
where you can look at different visualizations, such as the icicle graph or flame graph, and here, we, for example, can see that we have some HTTP server that has a generated random text apparently, and that takes up quite some time.
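The setup just described boils down to very little code; a minimal sketch, assuming port 8080:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Profiling endpoints now live at http://localhost:8080/debug/pprof/.
	// Inspect them with, for example:
	//   go tool pprof -http=:8081 http://localhost:8080/debug/pprof/profile?seconds=10
	// which opens the local web UI with the flame/icicle graph views.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```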
07:43
A different way of visualizing this in pprof is a call graph, and in here, you can see we are looking at a memory profile and the bufio.NewWriterSize function, for example, is allocating or actually having lots of memory
08:00
allocated on the heap. Profiling is an incredible tool, but it doesn't do everything we want, so the problem with just profiling, and we come to continuous profiling in a bit, but just profiling is that it's just a snapshot
08:20
of like a single profile. It's just like a snapshot of where our program is in that specific point in time, and it is quite manual. We need to go to a machine, download that profile, and then we can do something with it, and it's not automated at all, so that's why we have continuous profiling.
08:42
It was first popularized by the Google-wide profiling paper, and now there are a couple of really awesome open-source projects out there, so why would we do this? First of all, development is in production, and once we have something shipped into production, there can still be bugs,
09:00
and there can still be things that we need to improve. The load might be more, and we might be seeing different artifacts in production, so we still want to be able to profile in production, and the data and context over time is really important, so let's say you have different versions, but just over time, there are more users,
09:23
and then there are less users on your platform, right? All of this taken into account kind of matters. So when is continuous profiling useful? It is useful to save money. We can look at what functions and where the processes are spending the time,
09:41
and we can try to reduce these. We can understand the differences of our process over time, and we compare even across versions, for example, like new version, why is that version slower, or where is it spending more memory? And then we can also use it to understand incidents that already happened,
10:01
but we might be able to kind of like, yeah, really like time travel and look at our process right before it crashed or the incident happened. So a very infamous example is this, where you can see that we have one gigabyte of memory allocated,
10:20
and then all of a sudden, the program crashes, and it starts creeping up again, and then it crashes again, and that is the infamous OOM, right? So out-of-memory killed, and the kernel will just terminate the program, and it needs to restart. Okay, so how does continuous profiling work? We use pprof, and pprof creates sample profiles,
10:42
and we want to sample every so often, and then with pretty low overhead, due to the sampling, we hope to get the profiles right before the OOM, for example. And instead of doing that by hand, we do that automatically every few seconds.
11:01
So once we are like scraping or ingesting these profiles, we want to index them by the metadata, so we can search for certain containers, for example. And then once we index and start these profiles, we also want to be able to query them in a meaningful way that would unlock new workflows
11:21
that were impossible before. So giving a bit more concrete example again, here we are trying to do continuous profiling for heap profiles, for allocation profiles, and CPU profiles. And every 10 seconds, with a bit of lag,
11:40
the continuous profiling project would reach out to our pprof endpoint and collect these different profile types. What's really cool is it is possible to profile in production all the time.
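In that spirit, a toy pull-based collector fits in a few lines of Go; real projects like the one discussed below do far more (indexing, storage, querying), and the target URL here is an assumption:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// The ?seconds=10 parameter makes the endpoint block while it
	// collects a 10-second CPU profile, which also paces this loop.
	const target = "http://localhost:8080/debug/pprof/profile?seconds=10"
	for {
		resp, err := http.Get(target)
		if err != nil {
			fmt.Fprintln(os.Stderr, "scrape failed:", err)
			time.Sleep(10 * time.Second)
			continue
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			continue
		}
		// Naming by timestamp is a crude stand-in for real metadata indexing.
		name := fmt.Sprintf("cpu-%d.pprof", time.Now().Unix())
		if err := os.WriteFile(name, data, 0o644); err != nil {
			fmt.Fprintln(os.Stderr, "write failed:", err)
		}
	}
}
```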
12:01
The overhead, it's there, but it's kind of negligible, and you can do it in production, and it's pretty good to really get down to root causes and optimizations. One special feature that we wanted to shout out is profile-guided optimizations. So let's imagine you, over time,
12:22
observe your program, and you can merge together all these profiles, and we get to that, like we get to merging profiles and what that means. But we kind of like, yeah, we look at what is our profile doing over hours of time, and then we can tell that to a Go compiler,
12:41
and the Go compiler can take that information and really optimize our program when compiling already. So that is really cool, and we are looking forward to having these capabilities in Go, and I'm pretty sure these are in Rust already, so if we can somehow manage to integrate there as well, that would be crazy good.
13:00
Okay, so now I want to talk to you about Parca. Parca is our open-source project for continuous profiling. Here you can see a quick overview of the Parca project and the Parca server, and that is responsible, similar to what Prometheus does, for either scraping or ingesting profiles
13:21
and then storing them in a time series database, indexing them and having a query engine, and it also has a gRPC-based web interface where you can visualize these profiles. For scraping these pprof endpoints,
13:41
we need to somehow discover these processes, and Parca supports either a Kubernetes service discovery, which is actually the Prometheus one, so if that works for you with Prometheus, that should work for you with Parca. We also support a static or a file service discovery where you can basically write into a static configuration
14:02
the endpoints to scrape from. And a really cool project that Parca also has is the Parca Agent. The Parca Agent uses eBPF to create profiles, and then these profiles are sent via gRPC to Parca, and that's where you can visualize them.
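A static scrape configuration of the kind just mentioned looks roughly like this; a sketch based on Parca's documented parca.yaml format, with the job name and target as placeholders:

```yaml
scrape_configs:
  - job_name: "default"
    scrape_interval: "10s"
    static_configs:
      - targets: ["127.0.0.1:7070"]
```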
14:23
Parca is an open-source project hosted on GitHub, has a neutral governance, and contributions are welcome. It is inspired by Prometheus. It is a single statically-linked binary. It uses the same multidimensional label model that Prometheus has. As I already mentioned,
14:41
it uses the same service discovery as Prometheus, and it has a built-in storage which currently is in memory only, but we want to add persistent storage eventually. And because we have the Parca Agent, it is super easy to integrate. A single profiler using eBPF automatically discovers targets from Kubernetes or systemd
15:01
across the entire infrastructure with very low overhead. It supports C, C++, Rust, Go, and more, and we are always constantly trying to improve the support for these languages and adding more. Just like Parca, the Parca Agent is open-source. It discovers cgroups version 1 and 2 on the current system, and it uses eBPF to create these profiles.
15:23
It understands where CPU, memory, and IO resources are being spent, and right now we have CPU profiles. It captures the current stack trace X amount of times per second and creates a profile from that. The really cool thing is you don't need to change any of your code for creating these profiles.
15:44
A high-level overview of how the Parca Agent creates these profiles: first of all, it discovers the cgroup from the target provider. It then loads and attaches a BPF program to your cgroup. It waits 10 seconds and then reads
16:00
the BPF map from the kernel. It transforms these maps into the pprof format and then symbolizes the debug symbols on the fly, and all of that is then sent to the Parca server, and the process is repeated every so often.
16:22
All right, let's talk about visualizations. So, Parka's web-based UI looks like this, and you can select a profile type that you want to take a look at on the dropdown, and once you've done that, in this case, a memory-induced bytes profile, you can click on a specific point in time,
16:40
select that, and it will give you the profile as an icicle graph. The same is true for the CPU profile, and in this specific example, we had various profiles merged into a single profile. So, the underlying data here is actually many profiles,
17:00
but you get back one very specific profile where everything is combined and summed up for you. So, you get like one hour worth of profiles into one profile. And then the other really interesting profile type is a diff profile, and that works by clicking on different points in time
17:22
for the same binary, for example, for the same process, and then you can see in green where less, I think, memory is in use, and then in red where more memory is in use with each of these stack traces. So, that will tell you where the memory is actually allocated.
17:45
Let's look at Parca in action, and this instance is running on Minikube, and all the data is coming from the Parca Agent with eBPF. First of all, let's select our profile type, and we can see we want to query the last hour.
18:01
Now we can see all the time series, basically, that Parca has ingested for these profiles. We can hover over these, and we can drill down into different labels. So, for example, let's go into the namespace equals kube-system. Up here, we can see the first series, and that is actually the Kubernetes API server,
18:23
and if we click on this profile, we can scroll down and we can see the Kubernetes API server's CPU profile without us having ever touched the API server. The same is true if we were to, for example, click on this etcd in the Parca demo, so now we're seeing a profile
18:41
of the etcd running in the Minikube cluster. This is Go, but it was instrumented with eBPF. Let's take a look at the namespace parca, actually, and next to Parca and the Parca Agent, I've deployed several other languages. Rust, sadly, isn't yet too functional,
19:02
but hopefully by the time you're seeing this, we might have had the chance to improve this. Instead, let's look at Node.js, and we can actually use a regex matcher to look at all the Node.js applications
19:21
deployed to this cluster, and if we, for example, click on this one, we will get a Node.js profile. Awesome, the profile has been loaded. Let's scroll down, and we can see next to the Node.js binary, we can see also the just-in-time part
19:41
of our Node.js application, and in here we can see the main JavaScript file calling a Fibonacci function, and that has been calling itself all the time, and as you can guess, it is a recursive implementation of Fibonacci, and we can see how often each of these functions were called,
20:02
and that's really cool because for now, by just adding one flag to the way we run our Node.js binary application, we get all of this profiling data. Let's look at another example because we actually have deployed a Java application to this cluster as well.
20:25
Instead of looking at one profile on its own, let's merge all of the profiles we got in the last hour, and this is really cool because now we can see our Java application as well, and we haven't really done much to that
20:41
except a few flags when starting this application, and we can drill down into our Java application and see what that is up to. As the last part of the demo, I want to show you how you can use the pull-based mechanism to instrument a Go binary on its own.
21:02
For that reason, I've started Parca, the binary, outside of Minikube, and it's instrumenting itself. If we go to select profile, we can now see that we have a couple of more profile types available, and first, let's take a look at the in-use bytes memory.
21:20
Looking at this, we can see where our heap memory is allocated and currently used. Right now, we can see that Badger, the in-memory database we are using, has allocated the most memory on the heap. Next to that, we can take a look at
21:41
the memory allocated bytes total that tells us which functions are allocating most of the memory. In this case, we can see that the flat profile from pprof actually, like this map sample, is allocating lots of memory, and that is completely fine because that is how we actually ingest
22:02
the profiles into Parca. The last profile type I want to take a look at is the goroutines created total, and we can take a look at the goroutines used by the Go program, and as you can see, we are creating a bunch of goroutines
22:20
for the HTTP server in this case. We can also merge these together in the last hour, or in this case, since Parca has been started, and we can see that, for example, in here, we can see that the run group function has spawned a bunch of goroutines, which is totally fine.
22:41
The last thing I want to take a look at with you is comparing these profiles. We can take a look at this point in time and then compare it to this point in time, and we can see down here in red where these goroutines have been created the most. In comparison, we can see that our HTTP server created a bunch of goroutines.
23:05
Now that we have seen Parca in action, let's talk about the storage. Parca was actually another project first called Conprof, and Conprof was a direct fork of Prometheus that we modified to be able to store profiles,
23:20
but that came at a certain cost, and we needed something better. Instead, we were inspired by the Prometheus time series database, but have rewritten it since, so it's really optimized for the continuous profiling use case. What we actually did was: before, we stored a profile
23:42
just as a zip file, basically, in the Prometheus TSDB, in the time series database, whereas now we disassemble everything. We optimized how we can store these things, and then we do that in a more optimized way. So that's what we've done with Parca, and that's why it also was rebranded as Parca
24:03
among a few other things. Parca storage is still inspired by Prometheus. It separates the metadata when ingesting into the storage. There was a key difference to before, and now it also handles stack traces natively in the storage.
24:21
We do that by using different chunk encodings, so XOR encoding is used for the values, just like Prometheus, and double-delta is used for the timestamps, just as Prometheus, but we also use run-length encoding for values that basically always stay the same, so if you always profile for 10 seconds,
24:43
we can always just store the 10 seconds once, and then we just increase the counter if we see that value again, and we also use something called value sparseness, so if we don't see a stack trace, we don't store anything for it, and we kind of pretend that it's zero.
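To illustrate why these encodings pay off on profiling data, here is a self-contained sketch of run-length encoding and delta-of-delta encoding on the kinds of columns described; the sample values are invented:

```go
package main

import "fmt"

// Run-length encoding: ideal for columns that rarely change,
// like a scrape duration that is always 10 seconds.
type run struct {
	value int64
	count int
}

func runLengthEncode(values []int64) []run {
	var runs []run
	for _, v := range values {
		if n := len(runs); n > 0 && runs[n-1].value == v {
			runs[n-1].count++
			continue
		}
		runs = append(runs, run{value: v, count: 1})
	}
	return runs
}

// doubleDelta stores the delta-of-deltas: for timestamps that advance by a
// near-constant interval, most entries become zero and compress very well.
// (Real chunk implementations encode the first timestamp and first delta
// specially; this sketch just folds them into the output.)
func doubleDelta(ts []int64) []int64 {
	out := make([]int64, len(ts))
	var prev, prevDelta int64
	for i, t := range ts {
		delta := t - prev
		out[i] = delta - prevDelta
		prev, prevDelta = t, delta
	}
	return out
}

func main() {
	durations := []int64{10, 10, 10, 10, 10}
	fmt.Println(runLengthEncode(durations)) // [{10 5}]

	timestamps := []int64{1000, 1010, 1020, 1030}
	fmt.Println(doubleDelta(timestamps)) // [1000 -990 0 0]
}
```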
25:02
The Go profile struct actually looks like this. We've seen a visualization of that earlier with the different data types, but this is really what it looks like in Go, and as you can see, we have the samples over here and then the locations and functions and mappings all as slices
25:21
or what you would call in other languages, arrays. When ingesting these profiles in Go, what we do is we need to create these meta store entries and kind of take away the metadata, and as you can see, the location has the ID, the address,
25:42
and then that has a line or multiple lines, and then each line can have a function and the function names are often repeated, so we can kind of walk all of this, store it into a key value store, or previously it was a SQLite database, and then we can from there just use the values
26:02
that you can see up here and pretend that these are basically time series just like in Prometheus. Once we have extracted the metadata from the profile, we are left with the stack trace IDs, so UUIDs, and the location slice as well as the values,
26:23
and that is what we are working with onwards. Just to give you a quick overview of the architecture, the profile is ingested up here. It is then parsed and validated with the Go library of pprof. We then convert the profile
26:41
and uniquely store these mappings, locations, and functions as said earlier, and then what is left are these time series values basically that we then can ingest into this highly customized time series database for profiles. Even after we have extracted the metadata of the function names and so on,
27:02
we are left with some other metadata such as the timestamp, duration, and periods. So these are all numbers essentially, and we can utilize double delta chunk encoding, run length encoding on these various types, and because timestamps are always increasing by a certain amount,
27:21
the double delta encoding is really good for this, and then for the durations and periods, they are really staying the same most of the time. Run length encoding is what we use. Just a quick note, we only store the timestamp in here, and then when retrieving a profile, we look for the timestamp, and then the offset in the time series
27:42
is what's used to then fetch all the other data. So how does it look when querying a profile from the storage? So in this case, we wanna see the heap profile of the Parca instance at localhost on port 7070, and we want to be very specific about the time,
28:01
so we give it the Unix timestamp. This query is then transformed or actually made as a gRPC request that looks like this, and it's defined in a proto file. And last, I want to quickly explain how comparing or diffing profiles work,
28:21
and you can see two profiles over time with two samples in each profile, and the first sample first has a value of 253, and then it has a value of 257. And if we subtract that, we get back a value of minus four for this first sample. And then the other sample in this case
28:42
has a value of 26 at first and then a value of 24, and if we subtract that, we get a value of two. So that is how we can compare different profiles and then make these like red or green visualizations. The other synthetic profile is a merge profile,
29:00
and these profiles are created by summing up all the values of all the matching samples. So in this case, the first sample has these two values, and if we sum them up, we get 510, and then the other one has 26, 24, and we get the sample value of 50. So that is how we can use many profiles across a lot of time and then really sum it all up, and you get kind of like the whole view of all the profiles over an hour or even a day, for example.
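The pprof Go library exposes both operations directly, so a hedged sketch of merging and diffing two already-captured profiles could look like this (the file names are placeholders):

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/google/pprof/profile"
)

func mustParse(path string) *profile.Profile {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	p, err := profile.Parse(f)
	if err != nil {
		log.Fatal(err)
	}
	return p
}

func main() {
	// Two snapshots of the same process; the file names are placeholders.
	p1 := mustParse("old.pprof")
	p2 := mustParse("new.pprof")

	// Merge: matching samples are summed, giving the "whole view"
	// across many profiles.
	merged, err := profile.Merge([]*profile.Profile{p1, p2})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("merged samples:", len(merged.Sample))

	// Diff: negate the base profile and merge. Matching samples subtract,
	// which is what drives the red/green comparison view.
	p1.Scale(-1)
	diff, err := profile.Merge([]*profile.Profile{p1, p2})
	if err != nil {
		log.Fatal(err)
	}
	out, err := os.Create("diff.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	if err := diff.Write(out); err != nil {
		log.Fatal(err)
	}
}
```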
29:21
All right, now you might be asking, what is Parca's roadmap? We want to get persistent on disk. We are fully in memory,
29:40
and that is because we are also experimenting with different storage formats such as a columnar store and experimenting with Apache Arrow, and we wanna nail all of that before even writing to disk any of the profiles ever. We want to be able to query only parts
30:01
of stack traces, for example. We wanna improve the language and runtime support, especially in the Parca Agent via eBPF, and we wanna add additional profile types such as heap and allocations and IO such as networking and disk utilization. If you are interested in that, and we especially are experimenting
30:22
with a columnar store right now, we invite you to jump on our Parca Discord or attend the Parca office hours where we are happy to discuss pull requests or ideas or feature requests or anything really, and we will also give updates to the community.
30:40
So hopefully, if any of this sounds interesting to you, you can attend, and we'll see you there. Thank you for listening. Again, I'm Matthias Loibl, and feel free to reach out, but I'm also here for the Q&A now. Thank you, and bye-bye.
31:18
So thanks a lot for the awesome talk.
31:21
It was really, really interesting to watch and to listen to. So welcome to the live Q&A session. We have a couple of questions in the chat. A few of them have already been discussed, but I guess it makes sense to repeat them here so that people watching it later in the video will also see what went on.
31:43
The first one here is, if an application that's being profiled is running somewhere where Parca cannot scrape it, is there any way to push data to the Parca server? Yes, there is, and first of all, thank you for having me.
32:00
Pleasure being at FOSDEM. I would love to be in Belgium with all of you. Always was a great experience, but soon, soon again. But this is turning out really good as well. For sure. So to the question, yes, if you can't scrape or push or anything, there is a project called, like on GitHub,
32:23
it's called, it's in the pkg organization, slash profile, and that is essentially a way of profiling with pprof where you can turn on the profiling, can turn off the profiling in Go with the Go programming language, and then the resulting thing could be sent off
32:41
to a remote endpoint with gRPC, for example. And that was something I actually hacked on, kind of like making serverless a thing. So imagine a request coming in, and then you can profile the request, and then once that request is done, you can send it off to a remote service
33:00
to store the profile there if the amount of requests you get aren't as high. And that would also mean that if that's exactly the thing, like it's a serverless function, you actually gain a lot of value from profiling and improving that function if you can reduce the amount of CPU and memory,
33:22
or in the end, like the time spent running the function. So that's like one of the ideas that we were playing with. Yeah, we need to pick it up, but there certainly are ways to do this. Nice, thank you.
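A minimal sketch of the request-scoped profiling idea described in that answer, assuming a hypothetical collector URL and using plain HTTP instead of gRPC for brevity:

```go
package main

import (
	"bytes"
	"log"
	"net/http"
	"runtime/pprof"
)

// profiled wraps a handler, CPU-profiles each request, and ships the result
// to a remote collector. Note: Go's CPU profiler is global, so a real
// implementation would have to serialize requests or sample only some.
func profiled(next http.Handler, collectorURL string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var buf bytes.Buffer
		if err := pprof.StartCPUProfile(&buf); err != nil {
			next.ServeHTTP(w, r) // profiler busy; serve unprofiled
			return
		}
		next.ServeHTTP(w, r)
		pprof.StopCPUProfile()
		// Fire-and-forget upload; production code would batch and retry.
		go func() {
			resp, err := http.Post(collectorURL, "application/octet-stream", bytes.NewReader(buf.Bytes()))
			if err == nil {
				resp.Body.Close()
			}
		}()
	})
}

func main() {
	work := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", profiled(work, "http://collector.example/ingest")))
}
```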
33:40
Then we have one more from Brian, asking: we saw a presentation from Pyroscope earlier here in this devroom, could you elaborate a bit on how the projects compare? Right, yeah, it was a great presentation. I think like overall, it's really,
34:01
as mentioned in that talk as well, it's doing kind of the same things slightly different, and we really try to make sure that we're building on top of pprof on our side, as we thought that's kind of already an open standard, open format for representing profiles.
34:21
And that's like everything we do, kind of is based around pprof. We create pprof profiles with eBPF, for example, as well. We can scrape pprof profiles from Go endpoints. There are some clients for Rust and other languages that create pprof as well, but as I said, we are kind of really focusing
34:41
on generating pprof profiles with eBPF, and then I think that's like the biggest differentiation. I mean, the storage is different as well, and there are different trade-offs to be made, as mentioned in the talk quickly as well. We are also looking at moving
35:01
from a Prometheus time series database to a columnar store. We have actually an RFC, like a request for comments document out in the open, which people can read through, and I'm happy to link this later as well, where we discuss the different trade-offs on time series database versus columnar store.
35:21
So I think that there are many different aspects, but I mean, same overall ideas, and I think it's really great to see the ecosystem moving forward with this and kind of expanding on the three pillars of observability. Cool. Thanks a lot. So maybe I can sneak in a question myself.
35:42
Go ahead. A bit of a beginner's question, but about the visual representation as icicle graphs, right? So what I'm always wondering, if these are actual stack traces or if this is kind of some merge, so if you, for example, have a recursive algorithm,
36:01
then you maybe, when you take snapshots, get different stacks all the time because you have different depth of recursion and so forth, but overall, it's kind of repeating the same functions. Is there anything aggregated or would you just see a sequence of different stacks next to each other? So yeah, it's definitely aggregated,
36:22
and that comes by the nature of it being sampled profiling. So basically, there's tracing profiling where you would actually record every single stack trace that ever happens. You would record that, but the overhead of doing that is super high, whereas sample profiling, you just do it 100 times per second, right?
36:43
Computers are so fast. You just do it 100 times per second, and then that's what makes up the actual profile in the end. So yeah, they are somewhat aggregated already, but it's kind of like, if you're doing it 100 times per second, then at least the stack traces
37:02
that show up the most will definitely be in there, and that's the ones we are interested in, right? So I don't know if that answers your entire question. Mm-hmm. Yeah, so basically the same thing is true. The aggregation is through the same thing, so yeah. Yeah, exactly.
37:21
Yeah, so like we, and I think that's true, as I said, for most implementations, like you could use trace profiling, but that's super expensive, so you end up with like a sample profile. Yeah. Cool. Connor, do you have any more questions? Yeah, thank you for the presentation. I really like the merge view where you can get like more of an average understanding
37:43
of how the system behaves. Any thoughts about being able to use that data to identify outliers? So in my case, I run a lot of Grafana servers, many, many, many, many, and I would, maybe, I'm not sure, but I think it would be interesting
38:00
to see if there's some server misbehaving. So if I know the area I'm curious about, I could instrument it using metrics, but if I don't know exactly what's wrong, then it's much trickier. Yeah, that's a really good question, or like idea around how to do profiling,
38:23
and like certainly that's one of the aspects why we kind of like went with the profiling, with the Prometheus service discovery and labeling. We really want to make sure that like our kind of like goal in the end is you run an eBPF agent on every node,
38:41
and they send off these profiles to Parca, and then in the end, you can, by merging everything, you can see like the outliers or the biggest kind of resource hogs that...