Graphics Performance Analysis with FrameRetrace

Formal Metadata

Title
Graphics Performance Analysis with FrameRetrace
Subtitle
A Responsive UI for ApiTrace
Series title
Number of parts
644
Author
License
CC Attribution 2.0 Belgium:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Release year
Language

Transcript: English (automatically generated)
Good morning. Welcome to FOSDEM. I hope you guys had a fun time last night. It's good to see so many people have recovered from the festivities. Hopefully I'll have a seat to sit down when I'm done with my talk. It's very crowded. I was told before I came
that this was the worst possible time slot to give a talk at FOSDEM: first thing Saturday morning, right? But I'm happy to be here. I'm glad that Luke has organized the Graphics Dev Room again this year and that he made time for me to give a short presentation
on my work, so thanks a lot. I've been working on Linux platforms for more than a decade. Several of those years I spent building graphics performance tools based on a Windows tool that was used throughout the industry. In that position I was able to see how important
performance analysis tools are for graphics workloads. My project over the past few years has been to try to enable the same workflows for Linux platforms. I've also spent a lot of time automating the integration system for Mesa at Intel, which has helped Mesa's
productivity and quality quite a bit. But this is really the project that I've been most interested in since I started with the Mesa team. So a little bit about GPU tools and why you don't really have very many good solutions
in the Linux space. In general, when you have GPU tools, there's a graphics card vendor that understands it's very difficult to go and find out performance bottlenecks or what's happening on the GPU. They've gone and funded some tools specific to their own
hardware to help developers or their own driver team figure out what the performance profile is of specific applications. But they are very reluctant to go and enable the same capabilities for their competitors. If you do find a good GPU analysis tool, you'll
find it only works with an AMD GPU or an NVIDIA GPU. Some of the exceptions in the Linux space are made by Microsoft or other entities that care more about cross-vendor functionality.
Most of the tools are written for Windows, with Linux as an afterthought. They're either closed source, or the extent to which they're open source is just two commits where they've dumped a huge pile of code into a GitHub account. As for whether it compiles, you may find that it does not. This is changing a little bit. Intel has
some engineers working on performance tools, like myself, and Lionel Landwerlin and Robert Bragg have worked on GPUTop. So there is more native support for performance tools. RenderDoc is another example, where Valve has gone and funded a developer to really
invest in native Linux graphics analysis tools. One thing about a lot of the tools is that tracing and retracing is often not reliable. This can be because the tool was initially written for Windows DX11 or DX10 games. And
then when they go to implement tracing for OpenGL, they find the complexity of the extensions makes it hard to really capture the workload that you want to investigate. Another reason why tracing is often unreliable is there's not that many users. You might
have a tools team that goes and tries to build a tool, but unless you have lots of developers going and applying it and looking at different workloads, you're not going to discover the bugs in your tracing system. And up until recently, a big barrier has been the support for GPU performance counters
in Mesa. Since Linux 4.13, that's enabled now for Intel GPUs, and AMD Performance Monitor is available for some of their newer hardware as well. So now that Mesa is exposing these extensions, there's a whole lot more that we can do.
So my tool is called FrameRetrace. It's built on top of ApiTrace. I chose ApiTrace because I think it's the most widely used GPU analysis tool. There are a lot of people who use it for quality assurance to make sure that the frames retrace properly. And
because it has a large number of users, there are a lot of corner cases of tracing that they've gone and fixed. It's a community-supported project, so there are lots of people working on it. Right now, FrameRetrace is just a directory in a branch of ApiTrace. It's just a UI
that is built on top of it. Because ApiTrace is cross-platform, FrameRetrace is also cross-platform, so it will investigate OpenGL workloads on Windows just as well as it will on Linux, and that's an important capability for driver teams. Because if you have two different driver implementations for different platforms, you can compare the
performance profile for the workloads and find gaps in your implementation or in the Windows implementation. Our counter support begins with Haswell. There were hardware counters prior to Haswell, but the architecture was different enough that the driver team decided not to enable
them. So your performance will be better with a newer computer anyways, right? So the Mesa driver team has been using this tool heavily to go and find issues in their driver, and there's a whole set of examples of different special cases that
they've missed and we found basically by looking closely at each render in a frame and understanding what the bottleneck is. Right now, I'm trying to add support for Radeon hardware and Raspberry Pi through the AMD Performance Monitor extension, and there's some other folks that are looking
at that with me, and it's going pretty well. There's a few stumbling blocks for the Radeon implementation of that extension. I think that cross-platform support in this tool is one of the main things that needs to be finished before it's a good candidate
for being upstreamed into ApiTrace. I think that you'll see that the tool is pretty compelling and useful and superior to the ApiTrace UI in some ways, so I'd like to see it go upstream. So what does this tool do? Most graphical applications have a render loop, and the render
loop just renders the frame over and over again. So if you are looking just at the renders in those frames, you can divide up the frame into each specific draw call, and this tool will give you the metrics associated with each draw call, and you can
see exactly which render is the one that's taking all the time in your frame. Without it, I mean, generally, you just have a huge asynchronous workload going off to the GPU, and you have no idea why you're missing vsync. You can explore the frame by selecting specific renders, and it'll show you the render targets throughout the frame, which
is helpful to understand how a frame is composed. It has an API log, which is pretty standard. For driver developers, it's pretty helpful to have batch disassembly, so the batch commands, which are sent directly to the hardware, are disassembled and associated with the render that you've selected. To get this, at least on Intel
hardware, up until now you would have to dump hundreds of gigabytes of data for any kind of meaningful frame, and then sift through the data to find out exactly which render went wrong. This will give you a much more performant
implementation and let you see exactly what's going to the hardware for each draw. One of the main features that end users and game developers need is a shader debugger, or some way to experiment with their shaders and find out why they are mis-rendering. So with FrameRetrace, you can go to a specific render, look at the shader, change the
shader, edit it, compile it, and it'll render again, and it'll give you a new performance profile for that shader, or an error if you've made a mistake. You can do the same thing with uniform constants. Just go and see what the constants are and change them, and the frame will render again. There's a couple of experiments that you can do to help you
try to figure out what the max performance would be for a specific render, and the thing that I've just been editing now is a hierarchical representation of all the GL state so that you can change the cull face and see what happens. So if you have a
problem with your GL state that's affecting rendering or performance, you can muck with that. So those are the things we'll go through in the demo. So I'm taking a risk. Let's have a demo and see what happens. This is the UI for FrameRetrace, and this blue bar is actually a graph of renders with
no metrics, but you'll see here there's a long list of GPU metrics associated with the L3 cache, the pixel shaders, the vertex fetch hardware. A lot of these are somewhat
inscrutable if you're not familiar with the hardware or don't do a lot of GPU programming. The one that you really want to look at, if you want to see why something is slow, is how many clocks were required to render the frame. And so this is a graph where each bar is a specific render. There are quite a lot of them, but by far the
most expensive one is here, and there's a table that will show you the metrics. So here are the clocks, and you can see that it's more than 10% of the entire frame, just for this one render. So if you're curious about what a GTI L3 bank L2 read
is, there's a longer description for that metric that will help you decipher what it means. But typically you can go through here and find an explanation for why this might be the bottleneck for your workload. If you want to see the render target at this
part in the frame, you'll see that our heroine has found the object of her desire. The rendering of this frame, if you want to see what's actually being rendered, it's rendering the whole screen. In the API calls, it's just drawing a couple of triangles
for the rect. So it's a little bit puzzling why this might be long, but there's also this GL memory barrier, which is probably something that we'd be interested in looking at. If you want to search for GL memory barrier, you can look at the different renders which
contain GL memory barriers. So if you go to the experiments, if you wanted to see, okay, well, how fast would this be rendered if I just had a simple shader that just drew pink, you can select that and you can see that the cost is much lower. We go to
the shaders, and in the fragment shader it's got just a substituted fragment shader that draws pink. So let's disable that and go back to the shaders. So we're now in the fragment shader again, and you can see that there's quite a long fragment shader. So it looks like it's processing all the pixels with some effect, I guess.
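The clock-based triage and the simple-shader experiment described in this part of the demo can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample numbers are hypothetical, and nothing here is taken from the FrameRetrace source.

```python
def rank_renders_by_clocks(clocks_per_render):
    """Return (render_index, clocks, share_of_frame), most expensive first."""
    total = sum(clocks_per_render)
    if total == 0:
        return []
    ranked = sorted(enumerate(clocks_per_render),
                    key=lambda item: item[1], reverse=True)
    return [(index, clocks, clocks / total) for index, clocks in ranked]


def shader_cost_fraction(original_clocks, simple_shader_clocks):
    """Fraction of a render's clocks that disappears with a trivial shader."""
    if original_clocks == 0:
        return 0.0
    return (original_clocks - simple_shader_clocks) / original_clocks


if __name__ == "__main__":
    # Made-up per-render clock counts for one frame (hypothetical data).
    sample = [1200, 800, 15000, 3000, 400]
    for index, clocks, share in rank_renders_by_clocks(sample):
        print(f"render {index}: {clocks} clocks ({share:.1%} of frame)")
    # Hypothetical result of re-running render 2 with a flat-color shader.
    print(f"shader accounts for {shader_cost_fraction(15000, 2500):.0%}")
```

If the fraction returned by `shader_cost_fraction` is large, the fragment shader itself is the likely bottleneck; if it is small, the cost probably comes from elsewhere, such as the number of pixels or memory traffic.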
The vertex shader, if you look at it, it's a whole lot of nothing until you get to the very bottom and it just does nothing. So we capture the intermediate representation and the static single assignment form that's output by the Mesa driver. NIR is our new
intermediate representation and the SIMD8 is what's actually sent down to the hardware. Same thing for the fragment shader. You can see exactly how the shaders are compiled. So this is very helpful for a driver engineer, or I guess if you're an elite OpenGL programmer,
maybe you could make sense of this. So we spoke about the batch. This is an example of the batch. If you look at a handful of renders, you can select one and you can see this is the binary packet that's sent down for the rendering. Again, more for
driver developers. All right, so let's go back to experiments. If we look more closely at these renders, let's look at the render target. You can see that if we stop at render,
that means it's going to show the render target immediately after this render. If you advance through these renders, you can see that it becomes progressively blurrier, so there's a little blur and it's going to get even blurrier on this render. And then finally, it's going to compose those blurry images based on the depth of
each pixel. So in the background, there's a light here that's quite blurry. And if you look at the first render, it's in sharp focus. So it's a depth of field effect that they're achieving with these final renders. It's just one example of how you can experiment. So this is an expensive render, but it may just
be expensive because there's quite a lot of pixels. So if you want to look for expensive per-pixel metrics, you can graph on the second axis. So I'm just going to narrow the list of metrics that are displayed. So now the width of each bar represents roughly
how many pixels are drawn. And so you might look for narrow, tall bars representing very expensive renders. So let's disable this one to make it larger. So you might focus in
on this tiny shader here, which is expensive, I guess, because of the way it's drawing this particular texture. It's not very many pixels at all, but it's quite expensive per pixel. All right, so what I want to do now is explore a little bit. So let's go to standard
bar and I'm going to look for vertices. So I want to go and look on the render
target for where our heroine is rendered. You can see the different render targets that are drawn in this pass. And if we highlight, we'll see that those are the renders that
are drawing her body. And so let's see. We'll start here, I think. There we go. So this is the full rendering of the character. If you clear before the render and stop after,
all you'll get in the render target is the character itself. So the reason I wanted to do this is to show how you can go to the uniforms. These are all the uniforms that are bound for the render. You can just change one of them, hit return, go back to the render
target, and we've gone and removed her head. So for people who aren't really familiar with OpenGL, this is a really kind of interesting way to look at and dissect a more complicated frame and understand some of the techniques or how the API is used. So let's put her head back on. Oh, I mentioned shaders. So let's go to the vertex shader. Somewhere
at the bottom it's going to assign a color. Let's just go ahead and modify that. I mean, if I compile this, I should get a syntax error saying I've made a mistake, but let's
do zero. One point zero. So I'm just going to make the red channel zero with that multiplication. And we'll go look at the render target, and now we have a hulkified heroine. So
that really demonstrates that you can mess around with the shader and try to figure out why it's mis-rendering. You can see how quick this is. I mean, the fact that you can do this in a fraction of a second is far better than what you had before with the other tools. So what I've been working on recently is this hierarchical state tree. So you
can collapse different items that you don't want to look at. If you don't know how I've organized them, you can search for sub-strings, like maybe I'm looking for the scissor state. If you want to go and change something, the menu shows you the full set
of available options in the GL for this particular blend feature. And a lot of the different GL state settings have a set of four values, so it'll give you the index of each one. It might be red, green, blue, alpha. It might be some kind of enabled flag. So
if I go and disable green, I mean, our heroine was green before, but if I say, hey, there's no green, and we look at the render target, we'll see that she's kind of fading away. So it's fun to play with. But here's another one where culling is enabled for
this character. That means that the triangles on the back of the character are not rendered because they're facing the wrong direction. If I change it to cull the front of them instead of the back, I go look at the render target, I turn the character around, and she's decided that it's too dangerous to go after the diamond and is going to avoid
disaster and walk right back out. So that's just an example of how you can mess around with these things. One thing that's interesting, if we go back and look at the final render, I've gone and changed the state, but the character hasn't turned around for the final
render. And the reason for that is that I actually disabled that draw with the memory barrier in my experiment. So if I turn the frame back on so it's rendering properly and look at the render target, I'll see that the final frame is rendered with the changes. So that's my demo of the features. I think there's a lot more that can be
done in each tab. There's a whole lot of GL state that I haven't gone and implemented, but I think what I've tried to do is demonstrate that each category of state is supported in a way that's relatively easy to expand, and there's a bunch of experiments that need
to be added, but the proof of concept is there. All right, so back to the things that still need to be done. Well, one thing I didn't talk about too much is the fact that you can have this exact same performance profile for Windows, which is very important for driver developers because differences in rendering
will stand out starkly when you compare two different sets of this UI running on different platforms. Because the renders are exactly the same, it's running the same GL calls, and so you can easily find discrepancies in your implementation. Things that need to be done. There's no tab for looking at the textures. If you're
texture-bound, having an experiment that will clamp the mipmap level down so there's not so much texture data going down is important to see if you've just made textures that are too large. There's no display of the geometry or the vertices, so that's something that I think is of interest to end developers to try to figure out, okay, maybe
there's just so many vertices that I'm stuck at that part of the fixed-function pipeline. The depth buffer is not displayed. Unity specifically asked for overdraw and hotspot visualizations in the render target, where if you've drawn twice to the same
pixel in the render target, it'll show up as more expensive, help them figure out if they've got a problem with their engine. There's a bunch of UI improvements. This is all written in QML, and so you have to do quite a bit of hand-tweaking to get the display exactly how you want. Adding support for hardware is, I think, the most important
thing, which is what I'm working on right now. Another very important thing to enable is Android. There's a whole lot of 3D applications coming to Linux platforms in the Android Play Store. None of those can be analyzed for your driver or for your hardware, and so we need to get ApiTrace working on Android so that we can then capture the traces and
then analyze them in this way on similar hardware. I've had a little bit of help from some folks I've mentioned before. Lionel has helped me a lot with the performance analysis metrics, and I think his tool, I wish it was being demoed at FOSDEM as well, because
it's very interesting, so if you find him here, get him to show you what he's done. One thing, when you take a GL program and you relink it, you need to reattach a whole lot of state from the previous program, and that process can be somewhat intricate.
For the workloads I've looked at, I've done it properly, but whenever there's more features in the GL that an application might have used, that's where the path becomes unpaved. Radeon metrics is what I'm implementing now. Unfortunately, the AMD performance monitor doesn't display metrics. It just exports raw counters, and then you need another application
to go and compose those counters into usable metrics like we had displayed, so that's a key problem I'm trying to fix now. If anyone's interested, there's a whole lot of features that can be worked on independently, and I'd welcome collaborators. Thanks for listening. Any questions? Yeah.
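The Radeon problem mentioned above, where the extension exports raw counters that still have to be composed into usable metrics, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the counter names are hypothetical, and the "busy percentage" formula is just one common way to normalize a counter against elapsed clocks, not the formula used by any particular driver or by FrameRetrace.

```python
def counter_deltas(before, after):
    """Subtract two raw counter snapshots taken around a single render."""
    return {name: after[name] - before[name] for name in before}


def busy_percent(deltas, busy_counter, clock_counter):
    """Turn a raw 'busy cycles' delta into a percentage of elapsed clocks."""
    clocks = deltas[clock_counter]
    if clocks == 0:
        return 0.0
    return 100.0 * deltas[busy_counter] / clocks


if __name__ == "__main__":
    # Hypothetical raw snapshots; real counter names depend on the hardware.
    before = {"gpu_clocks": 1_000, "shader_busy_cycles": 200}
    after = {"gpu_clocks": 3_000, "shader_busy_cycles": 1_200}
    deltas = counter_deltas(before, after)
    print(f"shader busy: "
          f"{busy_percent(deltas, 'shader_busy_cycles', 'gpu_clocks'):.1f}%")
```

Sampling the counters before and after each render and working on the deltas is what lets a metric be attributed to one draw call rather than to the whole frame.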
The reason that this doesn't address Vulkan at all is because there's no Vulkan tracing support in ApiTrace, but Vulkan certainly could be addressed with a similar tool. There is a tracing infrastructure that's implemented by LunarG, and RenderDoc has a certain amount
of tracing, and so there's no reason why the features couldn't be mapped on. I just haven't done that yet because I'm focusing on the GL workload.

This is a very cool tool. I was wondering how you communicate the batch data and the extra details of the shader?

Sure. In the i965 driver, you can set
an environment variable to dump the batch, and you can set an environment variable to dump the SIMD16, so we just capture that on standard out. The batch data is, like I said, so large that there's a special patch that you apply to Mesa and recompile
it to let you turn on and off that environment variable just before you begin your render so that you don't have to pay that penalty for the whole render. Yeah.
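The general mechanism described in this answer, setting a driver debug variable and capturing what the process prints, can be sketched as follows. The talk does not name the variables; `INTEL_DEBUG` with the flags `bat` and `fs` is an assumption about which Mesa i965 debug options are meant, and the helper name `run_with_debug_env` is hypothetical. The sketch does not cover the special Mesa patch that toggles the dump per render.

```python
import os
import subprocess
import sys


def run_with_debug_env(command, debug_flags, var_name="INTEL_DEBUG"):
    """Run `command` with a debug variable set; return its combined output."""
    env = dict(os.environ)
    env[var_name] = ",".join(debug_flags)
    result = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so no dump is lost
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in child process that only echoes the variable it received.
    # A real use would launch the GL application or the retrace instead.
    child = [sys.executable, "-c",
             "import os; print(os.environ['INTEL_DEBUG'])"]
    print(run_with_debug_env(child, ["bat", "fs"]))
```

Capturing both output streams is a deliberate choice here, since a driver may write its debug dumps to either one.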
How did I capture the frame? To get the frame, you use ApiTrace. You tell ApiTrace to trace
this GL workload, and it serializes every single GL call into a file. Before I started the presentation, I played through the frame up until frame 150, which is the one we
were looking at, and stopped. For almost every GL program on Linux, if it wasn't traceable by ApiTrace, the developers have since changed ApiTrace. Yeah, sure. That's what application
engineers do all the time. They capture, you know, whatever. Grand Theft Auto, and there's actually some teardowns of Grand Theft Auto on Windows where they go through
the different renders and then show you the techniques. You could conceivably go and export the vertex data and the texture data, and that wouldn't be legal, but you can go and hack away. Any other questions? Okay. Thank you.