Graphics Performance Analysis with FrameRetrace
Formal Metadata
Number of Parts | 644
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/41344 (DOI)
FOSDEM 2018 (63 / 644)
Transcript: English(auto-generated)
00:06
Good morning. Welcome to FOSDEM. I hope you guys had a fun time last night. It's good to see so many people have recovered from the festivities. Hopefully I'll have a seat to sit down when I'm done with my talk. It's very crowded. I was told before I came
00:24
that this was the worst possible time slot to give a talk at FOSDEM first thing Saturday morning, right? But I'm happy to be here. I'm glad that Luke has organized the Graphics Dev Room again this year and that he made time for me to give a short presentation
00:43
on my work, so thanks a lot. I've been working on Linux platforms for more than a decade. Several of those years I spent building graphics performance tools based on a Windows tool that was used throughout the industry. In that position I was able to see how important
01:06
performance analysis tools are for graphics workloads. My project over the past few years has been to try to enable the same workflows for Linux platforms. I've also spent a lot of time automating the integration system for Mesa at Intel, which has helped Mesa's
01:27
productivity and quality quite a bit. But this is really the project that I've been most interested in since I started with the Mesa team. So a little bit about GPU tools and why you don't really have very many good solutions
01:43
in the Linux space. In general, when you have GPU tools, there's a graphics card vendor that understands it's very difficult to go and find out performance bottlenecks or what's happening on the GPU. They've gone and funded some tools specific to their own
02:02
hardware to help developers or their own driver team figure out what the performance profile is of specific applications. But they are very reluctant to go and enable the same capabilities for their competitors. If you do find a good GPU analysis tool, you'll
02:22
find it only works with an AMD GPU or an NVIDIA GPU. Some of the exceptions in the Linux space are made by Microsoft or other entities that care more about cross-vendor functionality.
02:41
Most of the tools are written for Windows, with Linux as an afterthought. They're either closed source, or the extent to which they're open source is just two commits where they've dumped a huge pile of code into a GitHub account. You may find that it does not even compile. This is changing a little bit. Intel has
03:06
some engineers working on performance tools, like myself, and Lionel Landwerlin and Robert Bragg have worked on GPUTop. So there is more native support for performance tools. RenderDoc is another example where Valve has gone and funded a developer to really
03:25
invest in native Linux graphics analysis tools. One thing about a lot of the tools is that tracing and retracing is often not reliable. This can be because the tool was initially written for Windows DX11 or DX10 games. And
03:44
then when they go to implement tracing for OpenGL, they find the complexity of the extensions makes it hard to really capture the workload that you want to investigate. Another reason why tracing is often unreliable is there's not that many users. You might
04:05
have a tools team that goes and tries to build a tool, but unless you have lots of developers going and applying it and looking at different workloads, you're not going to discover the bugs in your tracing system. And up until recently, a big barrier has been the support for GPU performance counters
04:23
in Mesa. Since Linux 4.13, that's enabled now for Intel GPUs, and AMD Performance Monitor is available for some of their newer hardware as well. So now that Mesa is exposing these extensions, there's a whole lot more that we can do.
04:43
So my tool is called FrameRetrace. It's built on top of apitrace. I chose apitrace because I think it's the most widely used GPU analysis tool. There's a lot of people that use it for quality assurance to make sure that the frames retrace properly. And
05:03
because it has a large number of users, there's often a lot of corner cases of tracing that they've gone and fixed. It's a community-supported project, so there's lots of people working on it. Right now, FrameRetrace is just a directory in a branch of apitrace. It's just a UI
05:21
that is built on top of it. Because apitrace is cross-platform, FrameRetrace is also cross-platform, so it will investigate OpenGL workloads on Windows just as well as it will on Linux, and that's an important capability for driver teams. Because if you have two different driver implementations for different platforms, you can compare the
05:43
performance profile for the workloads and find gaps in your implementation or in the Windows implementation. Our counter support begins with Haswell. There were hardware counters prior to Haswell, but the architecture was different enough that the driver team decided not to enable
06:02
them. So your performance will be better with a newer computer anyways, right? So the Mesa driver team has been using this tool heavily to go and find issues in their driver, and there's a whole set of examples of different special cases that
06:22
they've missed and we found basically by looking closely at each render in a frame and understanding what the bottleneck is. Right now, I'm trying to add support for Radeon hardware and Raspberry Pi through the AMD Performance Monitor extension, and there's some other folks that are looking
06:44
at that with me, and it's going pretty well. There's a few stumbling blocks for the Radeon implementation of that extension. I think that cross-platform support in this tool is one of the main things that needs to be finished before it's a good candidate
07:02
for being upstreamed into apitrace. I think that you'll see that the tool is pretty compelling and useful and superior to the apitrace UI in some ways, so I'd like to see it go upstream. So what does this tool do? Most graphical applications have a render loop, and the render
07:26
loop just renders the frame over and over again. So if you are looking just at the renders in those frames, you can divide up the frame into each specific draw call, and this tool will give you the metrics associated with each draw call, and you can
07:44
see exactly which render is the one that's taking all the time in your frame. Without it, I mean, generally, you just have a huge asynchronous workload going off to the GPU, and you have no idea why you're missing vsync. You can explore the frame by selecting specific renders, and it'll show you the render targets throughout the frame, which
08:04
is helpful to understand how a frame is composed. It has an API log, which is pretty standard. For driver developers, it's pretty helpful to have batch disassembly, so the batch commands, which are sent directly to the hardware, are disassembled and associated with the render that you've selected. So, at least on Intel
08:24
hardware, up until now you would have had to dump hundreds of gigabytes of data for any kind of meaningful frame, and then sift through the data to find out exactly which render went wrong. This will give you a much more performant
08:41
implementation and let you see exactly what's going to the hardware for each draw. One of the main features that end users and game developers need is a shader debugger or some way to experiment with their shaders and find out why their shaders are mis-rendering. So with FrameRetrace, you can go to a specific render, look at the shader, change the
09:02
shader, edit it, compile it, and it'll render again, and it'll give you a new performance profile for that shader, or an error if you've made a mistake. You can do the same thing with uniform constants. Just go and see what the constants are and change them, and the frame will render again. There's a couple of experiments that you can do to help you
09:22
try to figure out what the max performance would be for a specific render, and the thing that I've just been editing now is a hierarchical representation of all the GL state so that you can change the cull face and see what happens. So if you have a
09:40
problem with your GL state that's affecting rendering or performance, you can muck with that. So those are the things we'll go through in the demo. So I'm taking a risk. Let's have a demo and see what happens. This is the UI for frame retrace, and this blue bar is actually a graph of renders with
10:09
no metrics, but you'll see here there's a long list of GPU metrics associated with the L3 cache, the pixel shaders, the vertex fetch hardware. A lot of these are somewhat
10:26
inscrutable if you're not familiar with the hardware or don't do a lot of GL programming. The one that you really want to look at, if you want to see why something is slow, is how many clocks were required to render the frame. And so this is a graph where each bar is a specific render. There are quite a lot of them, but by far the
10:47
most expensive one is here, and there's a table that will show you the metrics. So here are the clocks, and you can see that it's more than 10% of the entire frame, just for this one render. So if you're curious about what a GTI L3 bank L2 read
11:07
is, there's a longer description for that metric that will help you decipher what it means. But typically you can go through here and find an explanation for why this might be the bottleneck for your workload. If you want to see the render target at this
11:25
part in the frame, you'll see that our heroine has found the object of her desire. The rendering of this frame, if you want to see what's actually being rendered, it's rendering the whole screen. In the API calls, it's just drawing a couple of triangles
11:43
for the rect. So it's a little bit puzzling why this might be long, but there's also this glMemoryBarrier, which is probably something that we'd be interested in looking at. If you want to search for glMemoryBarrier, you can look at the different renders, which
12:02
contain glMemoryBarrier calls. So for the experiments, if you wanted to see how fast this would be rendered with a simple shader that just draws pink, you can select that and you can see that the cost is much lower. We go to
12:23
the shaders, and in the fragment shader it's got just a substituted fragment shader that draws pink. So let's disable that and go back to the shaders. So we're now in the fragment shader again, and you can see that there's quite a long fragment shader. So it looks like it's processing all the pixels with some effect, I guess.
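As a rough illustration of the clock-based triage described above, the following Python sketch takes per-render GPU clock counts, picks out the most expensive render and its share of the frame, and estimates how much of that cost disappears when a trivial shader is substituted. The clock values and function names are hypothetical and are not FrameRetrace output or API.

```python
from typing import Sequence, Tuple


def most_expensive_render(clocks: Sequence[int]) -> Tuple[int, float]:
    """Return (index, share of total frame clocks) for the costliest render."""
    total = sum(clocks)
    if total == 0:
        raise ValueError("frame has no recorded GPU clocks")
    index = max(range(len(clocks)), key=lambda i: clocks[i])
    return index, clocks[index] / total


def shader_cost_fraction(original: int, with_simple_shader: int) -> float:
    """Fraction of a render's clocks that vanish with a trivial shader."""
    if original <= 0:
        raise ValueError("original clock count must be positive")
    return (original - with_simple_shader) / original


# Hypothetical per-render clock counts for one frame.
frame_clocks = [1200, 800, 45000, 3000, 950]
index, share = most_expensive_render(frame_clocks)
print(f"render {index} uses {share:.1%} of the frame")

# Hypothetical clocks for the same render with the simple shader substituted.
fraction = shader_cost_fraction(45000, 9000)
print(f"the shader accounts for about {fraction:.0%} of that render")
```

If most of the cost remains after the substitution, the bottleneck is more likely elsewhere in the pipeline than in the shader itself.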
12:47
The vertex shader, if you look at it, it's a whole lot of nothing until you get to the very bottom and it just does nothing. So we capture the intermediate representation and the static single assignment form that's output by the Mesa driver. NIR is our new
13:06
intermediate representation and the SIMD8 is what's actually sent down to the hardware. Same thing for the fragment shader. You can see exactly how the shaders are compiled. So this is very helpful for a driver engineer, or I guess if you're an elite OpenGL programmer,
13:25
maybe you could make sense of this. So we spoke about the batch. This is an example of the batch. If you look at a handful of renders, you can select one and you can see this is the binary packet that's sent down for the rendering. Again, more for
13:43
driver developers. All right, so let's go back to experiments. If we look more closely at these renders, let's look at the render target. You can see that if we stop at render,
14:02
that means it's going to show the render target immediately after this render. If you advance through these renders, you can see that it becomes progressively blurrier, so there's a little blur and it's going to get even blurrier on this render. And then finally, it's going to compose those blurry images based on the depth of
14:22
each pixel. So in the background, there's a light here that's quite blurry. And if you look at the first render, it's in sharp focus. So it's a depth of field effect that they're achieving with these final renders. It's just one example of how you can experiment. So, this is an expensive render, but it may
14:46
be expensive because there's quite a lot of pixels. So if you want to look for expensive per-pixel metrics, you can graph on the second axis. So I'm just going to narrow the list of metrics that are displayed. So now the width of each bar represents roughly
15:04
how many pixels are drawn. And so you might look for narrow, tall bars representing very expensive renders. So let's disable this one to make it larger. So you might focus in
15:22
on this tiny shader here, which I guess is because of the way it's drawing this particular texture. It's not very many pixels at all, but it's quite expensive per pixel. All right, so what I want to do now is explore a little bit. So let's go to standard
15:54
bar and I'm going to look for vertices. So I want to go and look on the render
16:11
target for where our heroine is rendered. You can see the different render targets that are drawn in this pass. And if we highlight, we'll see that those are the renders that
16:26
are drawing her body. And so let's see. We'll start here, I think. There we go. So this is the full rendering of the character. If you clear before the render and stop after,
16:44
all you'll get in the render target is the character itself. So the reason I wanted to do this is to show how you can go to the uniforms. These are all the uniforms that are bound for the render. You can just change one of them, hit return, go back to the render
17:00
target, and we've gone and removed her head. So for people who aren't really familiar with OpenGL, this is a really kind of interesting way to look and dissect a more complicated frame and understand some of the techniques or how the API is used. So let's put her head back on. Oh, I mentioned shaders. So let's go to the vertex shader. Somewhere
17:28
at the bottom it's going to assign a color. Let's just go ahead and modify that. I mean, if I compile this, I should get a syntax error saying I've made a mistake, but let's
17:44
do zero. One point zero. So I'm just going to make the red channel zero with that multiplication. And we'll go look at the render target, and now we have a hulkified heroine. So
18:05
that really demonstrates that you can mess around with the shader, try to figure out why it's mis-rendering. You can see how quick this is. I mean, the fact that you can do this in a fraction of a second is far better than what you had before with the other tools. So what I've been working on recently is this hierarchical state tree. So you
18:26
can collapse different items that you don't want to look at. If you don't know how I've organized them, you can search for sub-strings, like maybe I'm looking for the scissor state. If you want to go and change something, the menu shows you the full set
18:41
of available options in the GL for this particular blend feature. And a lot of the different GL state settings have a set of four values, so it'll give you the index of each one. It might be red, green, blue, alpha. It might be some kind of enabled flag. So
19:04
if I go and disable green, I mean, our heroine was green before, but if I say, hey, there's no green, and we look at the render target, we'll see that she's kind of fading away. So it's fun to play with. But here's another one where culling is enabled for
19:26
this character. That means that the triangles on the back of the character are not rendered because they're facing the wrong direction. If I change it to cull the front of them instead of the back, I go look at the render target, I turn the character around, and she's decided that it's too dangerous to go after the diamond and is going to avoid
19:43
disaster and walk right back out. So that's just an example of how you can mess around with these things. One thing that's interesting, if we go back and look at the final render, I've gone and changed the state, but the character hasn't turned around for the final
20:02
render. And the reason for that is that I actually disabled that draw with the memory barrier in my experiment. So if I turn the frame back on so it's rendering properly and look at the render target, I'll see that the final frame is rendered with the changes. So that's my demo of the features. I think there's a lot more that can be
20:23
done in each tab. There's a whole lot of GL state that I haven't gone and implemented, but I think what I've tried to do is demonstrate that each category of state is supported in a way that's relatively easy to expand, and there's a bunch of experiments that need
20:40
to be added, but the proof of concept is there. All right, so back to the things that still need to be done. Well, one thing I didn't talk about too much is that being able to have this exact same performance profile on Windows is very important for driver developers because differences in rendering
21:05
will stand out starkly when you compare two different sets of this UI running on different platforms because the renders are exactly the same, it's running the same GL calls, and so you can easily find discrepancies in your implementation. Things that need to be done. There's no tab for looking at the textures. If you're
21:25
texture-bound, having an experiment that will clamp the mipmap level down so there's not so much texture data going down is important to see if you've just made textures that are too large. There's no display of the geometry or the vertices, so that's something that I think is of interest to end developers to try to figure out, okay, maybe
21:43
there's just so many vertices that I'm stuck at that part of the fixed-function pipeline. The depth buffer is not displayed. Unity specifically asked for overdraw and hotspot visualizations in the render target, where if you've drawn twice to the same
22:01
pixel in the render target, it'll show up as more expensive, help them figure out if they've got a problem with their engine. There's a bunch of UI improvements. This is all written in QML, and so you have to do quite a bit of hand-tweaking to get the display exactly how you want. Adding support for hardware is, I think, the most important
22:22
thing, which is what I'm working on right now. Another very important thing to enable is Android. There's a whole lot of 3D applications coming to Linux platforms in the Android Play Store. None of those can be analyzed for your driver or for your hardware, and so we need to get apitrace working on Android so that we can then capture the traces and
22:45
then analyze them in this way on similar hardware. I've had a little bit of help from some folks I've mentioned before. Lionel has helped me a lot with the performance analysis metrics, and I think his tool, I wish it was being demoed at FOSDEM as well, because
23:04
it's very interesting, so if you find him here, get him to show you what he's done. One thing, when you take a GL program and you relink it, you need to reattach a whole lot of state from the previous program, and that process can be somewhat intricate.
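The relinking problem mentioned here can be pictured with a small model. This Python sketch is only an illustration under stated assumptions: it represents uniforms as plain records (the `Uniform` class and `carry_over_uniforms` function are hypothetical names), whereas a real tool would query and set them through the GL API. It carries old values over to a relinked program wherever name and type still match, and reports what could not be restored.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class Uniform:
    """A uniform as a tool might record it: GLSL type name plus value."""
    gl_type: str
    value: Tuple[float, ...]


def carry_over_uniforms(
    old: Mapping[str, Uniform], relinked: Mapping[str, Uniform]
) -> Tuple[Dict[str, Uniform], List[str]]:
    """Reapply old uniform values to a relinked program.

    Returns the uniforms for the new program and the names that could not
    be carried over (removed by the edit, or declared with another type).
    """
    result: Dict[str, Uniform] = {}
    for name, default in relinked.items():
        previous = old.get(name)
        if previous is not None and previous.gl_type == default.gl_type:
            result[name] = previous  # same name and type: keep the old value
        else:
            result[name] = default  # new or retyped uniform: keep the default
    lost = sorted(
        name
        for name, uniform in old.items()
        if name not in relinked or relinked[name].gl_type != uniform.gl_type
    )
    return result, lost


# Hypothetical state before and after editing a shader.
before = {
    "tint": Uniform("vec4", (1.0, 0.5, 0.5, 1.0)),
    "exposure": Uniform("float", (2.0,)),
}
after_edit = {
    "tint": Uniform("vec4", (0.0, 0.0, 0.0, 0.0)),
    "gamma": Uniform("float", (0.0,)),
}
restored, lost = carry_over_uniforms(before, after_edit)
print(restored["tint"].value, lost)
```

Uniforms are only one kind of program state; the same matching idea would have to be repeated for every other kind of binding an application uses, which is where the intricacy comes from.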
23:21
For the workloads I've looked at, I've done it properly, but whenever there's more features in the GL that an application might have used, that's where the path becomes unpaved. Radeon metrics is what I'm implementing now. Unfortunately, the AMD performance monitor doesn't display metrics. It just exports raw counters, and then you need another application
23:44
to go and compose those counters into usable metrics like we had displayed, so that's a key problem I'm trying to fix now. If anyone's interested, there's a whole lot of features that can be worked on independently, and I'd welcome collaborators. Thanks for listening. Any questions? Yeah.
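The step of composing raw counters into usable metrics, mentioned at the end of the talk, can be sketched as follows. This is a minimal Python illustration, assuming made-up counter names (`cache_hits`, `cache_misses`, `busy_cycles`, `total_cycles`); the counters actually exposed by a given GPU and the formulas a real tool uses will differ.

```python
from typing import Callable, Dict, Mapping

RawCounters = Mapping[str, int]


def cache_hit_rate(raw: RawCounters) -> float:
    """Hits divided by all accesses, or 0.0 when nothing was accessed."""
    accesses = raw["cache_hits"] + raw["cache_misses"]
    return raw["cache_hits"] / accesses if accesses else 0.0


def busy_percent(raw: RawCounters) -> float:
    """Percentage of elapsed cycles in which the unit was busy."""
    total = raw["total_cycles"]
    return 100.0 * raw["busy_cycles"] / total if total else 0.0


# Hypothetical metric table: display name -> formula over raw counters.
METRICS: Dict[str, Callable[[RawCounters], float]] = {
    "Cache hit rate": cache_hit_rate,
    "Busy (%)": busy_percent,
}


def derive_metrics(raw: RawCounters) -> Dict[str, float]:
    """Evaluate every known metric formula against one set of counters."""
    return {name: formula(raw) for name, formula in METRICS.items()}


# Hypothetical raw counter values sampled around one render.
sample = {
    "cache_hits": 750,
    "cache_misses": 250,
    "busy_cycles": 400,
    "total_cycles": 1000,
}
derived = derive_metrics(sample)
print(derived)
```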
24:05
The reason that this doesn't address Vulkan at all is because there's no tracing support in apitrace, but Vulkan certainly could be addressed with a similar tool. There is a tracing infrastructure that's implemented by LunarG, and RenderDoc has a certain amount
24:22
of tracing, and so there's no reason why the features couldn't be mapped on. I just haven't done that yet because I'm focusing on the GL workload. This is a very cool tool. I was wondering how do you communicate the batch data and the extra details of the shader? Sure. In the i965 driver, you can set
24:44
an environment variable to dump the batch, and you can set an environment variable to dump the SIMD16, so we just capture that on standard out. The batch data, like I said, is so large that there's a special patch that you apply to Mesa and recompile
25:01
it to let you turn on and off that environment variable just before you begin your render so that you don't have to pay that penalty for the whole render. Yeah.
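The answer does not name the environment variables. As a hedged sketch, the Python below assumes Mesa's `INTEL_DEBUG` variable with the `bat` flag for batch dumps and `apitrace replay` as the replay command; both are assumptions about the setup, and this is not how FrameRetrace itself is implemented. The helper only builds the command and environment, so it can be inspected without a GPU.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def replay_with_batch_dump(
    trace_path: str, base_env: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Build a replay command and an environment that requests batch dumps.

    Assumption: INTEL_DEBUG accepts a comma-separated flag list and "bat"
    asks the Intel driver to emit batch information.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [flag for flag in env.get("INTEL_DEBUG", "").split(",") if flag]
    if "bat" not in flags:
        flags.append("bat")  # keep any flags the user already set
    env["INTEL_DEBUG"] = ",".join(flags)
    return ["apitrace", "replay", trace_path], env


# Hypothetical trace file name; "fs" would be a pre-existing user flag.
command, environment = replay_with_batch_dump("game.trace", {"INTEL_DEBUG": "fs"})
print(" ".join(command), "with INTEL_DEBUG=" + environment["INTEL_DEBUG"])
```

The pair could then be handed to `subprocess.run(command, env=environment, capture_output=True)` to collect the driver's output for later filtering.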
25:35
How did I capture the frame? To get the frame, you use apitrace. You tell apitrace to trace
25:49
this GL workload, and it serializes every single GL call into a file. Before I started the presentation, I played through the frame up until frame 150, which is the one we
26:01
were looking at, and stopped. For almost every GL program on Linux, if it isn't traceable by apitrace, the developers have then changed apitrace. Yeah, sure. That's what application
26:34
engineers do all the time. They capture, you know, whatever. Grand Theft Auto, and there's actually some teardowns of Grand Theft Auto on Windows where they go through
26:42
the different renders and then show you the techniques. You could conceivably go and export the vertex data and the texture data, and that wouldn't be legal, but you can go and hack away. Any other time? Okay. Thank you.