Sync points in the Intel gfx driver


Formal Metadata

Sync points in the Intel gfx driver
Why and how we're adding a sync API to the Intel gfx driver
Alternative Title
Graphics - Intel Gfx
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year

For those of you who don't know me, my name is Jesse Barnes. I work at the Intel Open Source Technology Center with some of these guys, and hopefully with more of you at some point. I'm going to talk a little about the synchronization work we're doing right now. We're not so much moving away from what we have today as adding some additional mechanisms and features that give user space more flexibility.
I think most of you know what fences are, at least roughly, but I'll also go over I/O barriers, which is actually where I came from. When I first came to graphics I hadn't heard of fences; I was used to people talking about I/O barriers, and that helped me understand them. So I'll talk a little about that, then about some of the existing implementations of fencing and synchronization, and then get to the main bits: how i915 does things today and what we're adding. There are also some things in the slides about the future and the motivation.
Something to keep in mind for this talk is a set of questions. There are different ways to submit work to the graphics driver, and different ways you might want to do synchronization. Say, for example, you don't have any buffer handles: you're just doing mallocs and executing on the GPU or some other offload engine. How do you synchronize then? Or what if you're passing the driver not a list of pointers or objects that you're operating on, but just a single pointer that says "execute this"? Similarly, what if you're emitting commands directly from ring 3: your application maps a GPU ring and just submits to it? Or what if you want something like Wayland or SurfaceFlinger or X to do some scheduling itself, independent of what the kernel is doing? Today the kernel doesn't do any scheduling, but that's going to change shortly.
Additionally, it can be tough to debug performance problems or lockups if all of the synchronization is done for you implicitly inside the kernel. I can attest to the fact that a lot of the really tough bugs we have tend to be in that area: you're looking at a dump and trying to figure out what just happened. You can't always easily see what is blocking on what. Inside the driver it's possible, but it's nice to have that visibility from outside as well.
So anyway, keep those in mind. As I said, I come from working on I/O and disks way back when, and there we talked a lot about I/O barriers in the context of journaling file systems. There you have a very regimented sequence of data and barriers, so that you have explicit synchronization points and you have consistency relative to them. Unfortunately, in Linux that was a real challenge, because Linux runs on such a variety of hardware.
Some of the storage devices we run on top of can't even do this reasonably: even if you flush the caches, there may be no way to ensure that the data has actually landed on the disk. This is also not something that's explicit. You don't pass around a file descriptor referring to a barrier; generally it's just part of the I/O stream coming from the application.
Here's a really basic example. We've got a stream of I/O: writes, then a barrier, then another write. The barrier ensures that the previous writes have completed and landed on disk before the next write goes out. That's how you can guarantee some sort of atomicity and consistency.
Moving on to the graphics side, it's actually very similar. I think NVIDIA added this to GL for their GPUs. They wanted a way to do synchronization, especially so that applications could read and write buffers on the CPU and the GPU without having to glFinish the whole operation. That has the potential to really improve performance.
In this example you can see several contexts, each submitting batches to the GPU with fences in between. Context A submits some work, and maybe the second batch uses a buffer. Once that fence has passed, the CPU knows: OK, I can start accessing this buffer now, it's consistent with the rest of the GL stream. You don't have to do a glFinish of your whole operation, so that can really improve performance.
Now you can update your vertex buffers while you still have work running. In the NV_fence case the namespace is per context, so you can see that context A and context B each have a fence with an ID of 1. That has changed with the more recent fence work.
This was a big improvement to GL, and as such it was picked up as ARB_sync a while afterwards, though I think what people wanted there was a bit different anyway. It's pretty similar, but it added a few things. The namespace is now shared across contexts, which makes it easier on the application side to manage fences, and there is this client/server distinction.
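A minimal sketch of the fence idea described above, as a toy Python model. `ToyGpu`, `set_fence` and `test_fence` are hypothetical names chosen for illustration; they are not the NV_fence or ARB_sync entry points. The model only shows why waiting on a fence placed right after the work you care about can return earlier than draining the whole command stream.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGpu:
    """Toy in-order command processor (hypothetical, not a real GL API)."""
    queue: list = field(default_factory=list)        # pending commands, oldest first
    passed_fences: set = field(default_factory=set)  # IDs of fences already reached

    def submit(self, name):
        self.queue.append(("work", name))

    def set_fence(self, fence_id):
        # The fence is just another command in the stream.
        self.queue.append(("fence", fence_id))

    def step(self):
        """Retire the oldest pending command, if there is one."""
        if self.queue:
            kind, value = self.queue.pop(0)
            if kind == "fence":
                self.passed_fences.add(value)

    def test_fence(self, fence_id):
        return fence_id in self.passed_fences


def steps_until(gpu, done):
    """Step the toy GPU until done() is true and count the steps taken."""
    steps = 0
    while not done():
        gpu.step()
        steps += 1
    return steps


def build_stream():
    gpu = ToyGpu()
    gpu.submit("draw that reads the vertex buffer")
    gpu.set_fence(1)
    gpu.submit("unrelated batch A")
    gpu.submit("unrelated batch B")
    return gpu


# Wait only for the fence that follows the draw we care about.
fenced = build_stream()
steps_with_fence = steps_until(fenced, lambda: fenced.test_fence(1))

# Equivalent of finishing everything: drain the whole queue.
finished = build_stream()
steps_with_finish = steps_until(finished, lambda: not finished.queue)

print(steps_with_fence, steps_with_finish)
```

In this toy stream the fence is reached after two steps, while draining the queue takes four, so the CPU could touch the vertex buffer while the two unrelated batches are still pending.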
In the previous case, with these fences, you have operations in GL that let you wait on them or check whether they've signaled. The ARB_sync implementation lets you either block your own process or ask the server to block for you. So if you've got a separate server process, like X or Wayland or SurfaceFlinger or whatever, the implementation can have the blocking occur there, and you can go and do other things instead of wasting CPU. So that's where we are today with GL.
On top of that, the Android guys were probably the first to really put together and ship, in a big way, an explicit synchronization mechanism, exposed through an EGL extension (EGL_ANDROID_native_fence_sync). It works with the ARB_sync framework, but it's designed to sit on top of some underlying OS mechanism for synchronization that's provided in an explicit way. You import a file descriptor as a sync object and then use it like you would any other sync object.
As I said, the Android sync framework is what sits underneath that. It's currently in the Linux staging tree, so it is upstream in some form. It's designed to support multiple kernel drivers, so it's not just a graphics thing. It has an API that user space can use for waiting on, merging and creating these objects.
On the kernel side, it uses these timelines. Here we have a simplified view of the timelines that might exist on a machine with multiple devices and drivers. You generally have one timeline per device or per engine that's completing work in some fashion. In this case we've got a GPU render pipeline, a video decode pipeline, which may or may not be part of the GPU, and a camera pipeline that's spitting out frames. You'd probably have a timeline for each of those, and then you have sync points that correspond to different events on that timeline. For example, after a video decode frame is done, you'd signal the fence, the sync object, to say that it was complete, so that whoever wants to consume that buffer and feed it on to another driver can do so. Similarly for the GPU and the camera: as you complete work, you signal and pass those on.
This is really in contrast to the DMA fence implementation we have in the kernel today, where this is all done implicitly. But there's no reason the two can't work together. Thanks to the people who did that work and got it all upstream. It's really nice, and it has simplified quite a few of the drivers. Each of the drivers had implemented its own form of completion tracking and its own form of fencing internally. With the DMA fences we move over to a common interface within the kernel. It has callbacks, you can enable signaling, and you can make things explicit if you want, but it also handles implicit synchronization.
What's nice about this is that it allows you to use hardware-to-hardware synchronization with DMA fences if you want. Say you have your GPU tied to your video decoder engine: when the video decoder completes a frame, it can signal in hardware, over a wire, to the GPU that the frame is complete, whereas I think the Android sync framework would have to go through software.
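The timeline and sync point model above can be sketched as a small Python toy. All class and method names here (`Timeline`, `SyncPoint`, `Fence`, `advance`, `add_callback`, `merge`) are hypothetical and only loosely mirror the concepts from the talk; this is not the Android sync framework or the kernel DMA fence interface. The assumptions are that a timeline is a monotonically increasing counter, a sync point is signaled once its timeline reaches the point's value, and a merged fence is signaled when all of its points are.

```python
class Timeline:
    """Monotonic counter, one per device or engine (toy model)."""

    def __init__(self, name):
        self.name = name
        self.value = 0
        self._waiters = []  # (point value, callback) pairs not yet fired

    def create_point(self, value):
        return SyncPoint(self, value)

    def advance(self, steps=1):
        """Called by the 'driver' when work completes; fires ready callbacks."""
        self.value += steps
        ready = [fn for value, fn in self._waiters if value <= self.value]
        self._waiters = [(value, fn) for value, fn in self._waiters
                         if value > self.value]
        for fn in ready:
            fn()


class SyncPoint:
    """An event on a timeline, signaled once the timeline reaches `value`."""

    def __init__(self, timeline, value):
        self.timeline = timeline
        self.value = value

    def signaled(self):
        return self.timeline.value >= self.value

    def add_callback(self, fn):
        # Run immediately if already signaled, otherwise defer to advance().
        if self.signaled():
            fn()
        else:
            self.timeline._waiters.append((self.value, fn))


class Fence:
    """A collection of sync points, possibly from different timelines."""

    def __init__(self, *points):
        self.points = tuple(points)

    def signaled(self):
        return all(point.signaled() for point in self.points)

    def merge(self, other):
        return Fence(*self.points, *other.points)


decode = Timeline("video decode")
gpu = Timeline("gpu")

frame_decoded = decode.create_point(1)
frame_rendered = gpu.create_point(2)
both = Fence(frame_decoded).merge(Fence(frame_rendered))

events = []
frame_decoded.add_callback(lambda: events.append("decoded frame ready"))

decode.advance()            # the decoder finishes its frame
after_decode = (frame_decoded.signaled(), both.signaled(), list(events))

gpu.advance(2)              # the GPU finishes two batches
after_render = both.signaled()
```

After the decoder advances, its sync point is signaled and the callback has run, but the merged fence still waits on the GPU timeline; it signals only once both timelines have reached their points.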
On the Intel side, we don't use the DMA fences yet; that's a work in progress, and it's kind of a big change for us. We did just make a big change over to using our own request tracking structure, because we're getting ready to go out of order in terms of command submission with the GPU scheduler, so we needed to move over to that. The next logical step is to use the DMA fences, and several people are working on that. The synchronization today in i915 is implicit.
Going back to this example: rather than having fences in your command stream, today the buffers kind of act like fences. For example, if context B submits some work that depends on or consumes some of the buffers that context A has read or written, the kernel will automatically synchronize that for you and make sure the operations occur in order, rather than user space having to create a fence and wait on it. That's what I mean when I say implicit synchronization.
On the other hand, because we're using buffers and this synchronization is provided, you can actually use buffers as a kind of fence-like object. If you look at the SNA code, Chris does a lot of things like that to make sure there are no bubbles in the pipeline.
The plus side of implicit synchronization is that it's really easy: user space doesn't have to worry about ordering or consistency. On the downside, you don't necessarily have as much visibility into what's going on, especially once we have a scheduler in the future. You can end up stalling while waiting on buffers when you could have submitted things out of order and kept the GPU busier than it otherwise would be. So that's one of the downsides of implicit synchronization: it's easy to be inefficient.
As I said, buffers can act as a fence-like mechanism, because you can query a buffer's status and see if it's done. But using buffers is not quite the same as having a dedicated object for synchronization.
So, the plans. We actually have implementations of explicit synchronization today that are on their way upstream. The plan is to add flags to some of the existing ioctls, like execbuf. When you submit a batch to the GPU, the flag would say "I want to get a fence for this batch", and you would get one back.
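The implicit scheme described above, where the kernel orders batches by the buffers they touch, can be illustrated with a toy dependency tracker in Python. `ImplicitSyncTracker` and its fields are hypothetical names; this is a sketch of the general read-after-write and write-after-read bookkeeping, assuming one last writer and a list of readers per buffer, and not the i915 implementation.

```python
class ImplicitSyncTracker:
    """Toy model: derive batch ordering from the buffers each batch uses."""

    def __init__(self):
        self.last_write = {}  # buffer name -> id of the last batch that wrote it
        self.readers = {}     # buffer name -> ids of batches reading since then
        self.next_id = 0

    def submit(self, reads=(), writes=()):
        """Register a batch and return (batch id, ids it must wait for)."""
        batch = self.next_id
        self.next_id += 1
        deps = set()

        # A read must wait for the previous write to the same buffer.
        for buf in reads:
            if buf in self.last_write:
                deps.add(self.last_write[buf])

        # A write must wait for the previous write and for outstanding reads.
        for buf in writes:
            if buf in self.last_write:
                deps.add(self.last_write[buf])
            deps.update(self.readers.get(buf, []))

        # Record this batch's own accesses for later submissions.
        for buf in reads:
            self.readers.setdefault(buf, []).append(batch)
        for buf in writes:
            self.last_write[buf] = batch
            self.readers[buf] = []

        return batch, deps


tracker = ImplicitSyncTracker()
a = tracker.submit(writes=["framebuffer"])                     # context A renders
b = tracker.submit(reads=["framebuffer"], writes=["texture"])  # context B consumes it
c = tracker.submit(reads=["unrelated"])                        # independent work
d = tracker.submit(writes=["framebuffer"])                     # A renders again
```

User space never names a fence here: batch `b` is ordered after `a` purely because it reads a buffer `a` wrote, and `c` has no dependencies at all, which is the kind of batch a scheduler could run out of order.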
Internally, that ties in with moving over to the DMA fences. On the user-space side the plan is to support the Android sync API, because there's no reason to invent a new API when the Android sync framework already has a reasonable one.
Then there are other entry points. It's not just GPU rendering; it's also the display side, where we need to know when things happen. Today we have page flip completion events, but we would tie those to fences as well, so that you get a sync object back that you can use along with the others to synchronize things in your display server.
As I said, the GPU scheduler is coming, which should be really cool. It should allow the kernel to get better utilization of the GPU rather than stalling on internal semaphores, and it also allows us to prioritize batches. If you've got a display server that needs to get something on the screen, because otherwise the user experience suffers, it can do that at the expense of some of the other applications that might be running. With our Gen7 parts there isn't real preemption as such: you insert arbitration points and preempt at those points. Future GPUs have better preemption support, so you can actually preempt work, which of course makes for a real scheduler.
On the Mesa side, I did some simple patches to implement this. You don't really need a lot in terms of the implementation there; it's pretty simple.
Going back to the questions, one of the big ones was: what if you don't have buffer handles? It's really hard to do buffer-based synchronization if you don't have buffers.
Sync fences and explicit synchronization get you around that, because you can query for command completion without the buffers. Again, if you're passing the driver just a pointer and saying "go run this", we don't know what you're accessing and we don't care. We can give you a fence for that as well, and you can synchronize against it without the kernel having any knowledge of the buffers.
Then there's the ring 3 case. In the ring 3 case the kernel driver isn't aware of what's being submitted, but there are still cases where you want to synchronize. In that case you do have to involve the kernel, or insert the fence somewhere.
Also, as I said, there's potential for user-space scheduling. Say you've got multiple direct rendering clients all sending buffers to a Wayland server, and when those clients send buffers to the Wayland server they also send fences corresponding to those buffers. Wayland can then do some intelligent scheduling and pick what to do first. For example, suppose it sees that it's got 8 out of the 9 buffers it needs to render the screen, and it knows it takes, say, 5 milliseconds to get everything ready and render the screen. There's one other client whose buffer isn't ready and which has historically been taking longer. Wayland can query that fence, see that it's not done, and use that client's old frame. Wayland can still hit its frame, and the user experience is as good as it can be. That client's frame rate may be low, but at least your server, your window manager, is getting its frames out.
So that's one of the nice applications of explicit synchronization and of passing fences back and forth between processes. I did mention that the framework uses file descriptors, so once you get a sync point it's just a file descriptor: you can pass it around like any other, and when you close it you lose the sync point.
On top of that, it becomes much easier to debug performance problems and lockups, because you've got this explicit synchronization.
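The compositor scheduling scenario described above (use the new buffer if its fence has signaled, otherwise keep showing the client's old frame) can be sketched in Python. `ClientBuffer`, `pick_frames` and the `fence_signaled` field are hypothetical; in particular, `fence_signaled` is a plain boolean standing in for querying a real fence file descriptor, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class ClientBuffer:
    client: str
    frame: int
    fence_signaled: bool  # stand-in for polling the fence sent with the buffer


def pick_frames(new_buffers, last_presented):
    """Choose one frame per client for the next screen update.

    A client's new buffer is used only if its fence has signaled; otherwise
    the previously presented frame is reused and the client is reported late.
    """
    chosen = dict(last_presented)
    late = []
    for buf in new_buffers:
        if buf.fence_signaled:
            chosen[buf.client] = buf.frame
        else:
            late.append(buf.client)
    return chosen, late


last_presented = {"terminal": 41, "video": 99, "game": 7}
new_buffers = [
    ClientBuffer("terminal", 42, fence_signaled=True),
    ClientBuffer("video", 100, fence_signaled=True),
    ClientBuffer("game", 8, fence_signaled=False),  # still rendering
]

chosen, late = pick_frames(new_buffers, last_presented)
```

Here the compositor composes the screen from the two finished buffers plus the old frame of the slow client, instead of stalling the whole update on one fence.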
The Android guys have got some tools to visualize timelines. Because you have these explicit sync fences, you can see that, say, this process's fence is taking too long to signal, then work backwards from there and figure out what to do.
Then there's synchronizing execution between different IP blocks. We can use the internal mechanisms for that, but we can also use the sync fences from user space. If user space is driving, say, display output, camera work and GPU work, it can use these fences to coordinate its activities. One downside is that you might get some additional bubbles in the pipeline, so you want to make that as efficient as possible.
So that's what I have. Any questions on this stuff? No? All right, thanks.