Utilizing GPUs to accelerate 2D content

Video in TIB AV-Portal: Utilizing GPUs to accelerate 2D content

Formal Metadata

Utilizing GPUs to accelerate 2D content
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Over the last 15 years, GPUs have gone from being a piece of hardware found almost exclusively on the machines of gamers to being present in almost every single desktop and laptop computer. This hardware presents opportunities to greatly improve power usage and performance for graphics applications. Over the last 5 years, GPU utilization in the desktop application world for accelerating 2D graphics has slowly moved forward; however, their intended use for video games also presents us with a number of limitations. In this presentation I will talk about what GPUs are, why we want to use them, in what different ways they can be put to use, and some of the challenges we've encountered when using them at Mozilla. I will also try to touch on some of the technical details of the different tradeoffs that the most common algorithms present.
OK, so now it's time to listen to Bas Schouten and his presentation. Bas is an engineer on the Mozilla graphics team. He has implemented a large portion of the code needed for Windows in
Firefox. He is also very much involved in the Mozilla community, creating a Dutch-speaking community to bring together the Dutch and Dutch-speaking Belgians. So please welcome Bas.

That was a lot more than I can claim credit for, but sure. Right, I'm here to talk with you about utilizing GPUs to accelerate 2D content, which is mainly what I've been focusing on during my career at Mozilla. As was said, I work for the graphics team at Mozilla. I mainly work on Windows acceleration, but I've also done initial work in other areas. I have a little disclaimer here: the market and the technology in this area move very fast, and I am going to make some statements; if any of those turn out to be outdated, I'm sorry.

So what am I going to talk about? I'm going to talk about why we want to use GPUs, about their strengths and weaknesses, the challenges for 2D rendering, what the GPU pipeline looks like, the available approaches we can take when doing 2D rendering with GPUs or GPU-assisted 2D rendering, and existing implementations that are out there. Does everybody here know what a GPU is? Otherwise that would be a problem. Right.

So why do we want to use GPUs? Well, first of all, these days they're present in almost every machine. We need to make a distinction between discrete and integrated GPUs. Integrated GPUs you will find in your machine's chipset or, for example, in the newer Sandy Bridge, on the die along with your CPU. Discrete GPUs are separate chips, which are either soldered onto your laptop motherboard, or you find them on the nice bulky cards that you put into your desktop machine, usually for gaming. Now why do we want to use them? Well, they are called graphics processing units, so they seem to be sort of an obvious choice for the job. What they basically do is give us much better FLOPS, floating-point operations per second, per watt, and much better FLOPS per dollar, or euro if you prefer. So if we look
at this market: the first 3D accelerator cards were introduced around 1997, and later a very famous one of them was the Voodoo2. The Voodoo2 produced about 500 megaflops, had a thermal design power of about 10 watts and cost about 300 dollars. It was purely a gaming device; there was absolutely no other use, and it was really hard to use for anything else. This means that per watt you got about 50 megaflops, and per dollar you got about 1.6 megaflops. Compare that to a CPU of those times: the Pentium II produced about 233 megaflops, if we take a specific variant, also from the mid segment of the market. It had a thermal design power of about 17 watts and cost 500 dollars. You can see that the numbers there are lower, fewer megaflops per watt and fewer per dollar, but the differences aren't as big as they are these days.

Nowadays you get a Radeon HD 7750, which is a discrete ATI GPU from about the mid segment of the market. It produces a teraflop, which is a thousand gigaflops. It has a thermal design power of 150 watts, so GPUs have become a lot more power hungry; people are working on that. And it costs only 100 dollars, which is a world of difference. That works out to 6.7 gigaflops per watt and ten gigaflops per dollar. Now if you compare that to a common CPU these days, the Ivy Bridge i5-3450S, that produces a tenth of that, at double the price but lower power. Unlike in the old days, you can see that we have over 4 times as many gigaflops per watt on the Radeon, and over 20 times as many gigaflops per dollar. So that's basically the why. Now we have to figure out how to do it.

GPU strengths and weaknesses. Most modern GPUs are triangle rasterizers. That means that when you have a game, the game will have models that consist of triangles, and the GPU will help with projecting those triangles onto your screen. Essentially they have a bunch of fixed-function hardware to help with that. Fixed-function hardware is basically a chip that is designed to do specific things and is very good at them: things like blending, which is basically blending a transparent pixel on top of something else; texture sampling, which is interpolating when you're reading from an image, for example when it is rotated or upside down, etc.;
and triangle rasterization. That's their business. Aside from this they also have programmable shader cores. On modern GPUs these are general floating-point computation units, and they're used in several steps: vertex shaders, pixel shaders, geometry shaders, hull shaders, domain shaders. Those are the common ones that you see in modern graphics APIs. The pixel shader is what GL people call the fragment shader. With the shader cores we can really let them do anything, as long as it's basically four-component math.

So these are the strengths. They're really good at drawing triangles. When you have a discrete GPU, they have immense memory bandwidth, much bigger than your CPU. And they are very good at parallel computation: these cores by themselves are not that great, but you get a thousand of them all working in parallel, every cycle, every one of those thousand cores performing a bunch of computations. And, as I said, they're very good at four-component vector operations.

Now why would they be very good at four-component vector operations? Well, it turns out everything they work with is four-component vectors. That is pretty obvious when we're talking about pixel manipulation: think of the red channel, the green channel, the blue channel and the alpha channel, which is the transparency. I've noted down a simple blending formula here, using the Porter-Duff "over" operator for blending a source onto a destination, and that's all four-component vector math. Then we also have the vertices themselves, points in 3D space. Now one might say: we have x, y and z, so we only have three components; why would we need four? Well, it turns out we add a little w to these vectors. That means we can make a bigger transformation matrix and have an affine transform, which in the last column also has a translation. So we can not only rotate vectors, we can also move them, because we're actually talking about points and not vectors.
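As a rough illustration of the four-component math described here, the following is a minimal Python sketch and not code from the talk. It assumes premultiplied-alpha RGBA colors with components in the range 0 to 1, and column vectors with the translation in the last matrix column; the names `over`, `dot4`, `transform` and `translation` are made up for this example.

```python
def over(src, dst):
    """Porter-Duff 'over': blend a source color onto a destination color.

    Assumption: both colors are premultiplied-alpha (r, g, b, a) tuples.
    """
    inv_alpha = 1.0 - src[3]
    # One multiply-add per channel, the same operation on all four components.
    return tuple(s + d * inv_alpha for s, d in zip(src, dst))


def dot4(a, b):
    """Dot product of two four-component vectors."""
    return sum(x * y for x, y in zip(a, b))


def transform(matrix, vec):
    """Multiply a 4x4 matrix (a list of rows) by a four-component column vector.

    The result is four dot products, one per matrix row.
    """
    return tuple(dot4(row, vec) for row in matrix)


def translation(tx, ty, tz):
    """Affine transform that only moves points; the offset sits in the last column."""
    return [
        (1.0, 0.0, 0.0, tx),
        (0.0, 1.0, 0.0, ty),
        (0.0, 0.0, 1.0, tz),
        (0.0, 0.0, 0.0, 1.0),
    ]


# Blending: 50% transparent red over opaque blue.
half_red = (0.5, 0.0, 0.0, 0.5)
opaque_blue = (0.0, 0.0, 1.0, 1.0)
blended = over(half_red, opaque_blue)

# Transforming: w = 1 marks a point (it gets moved),
# w = 0 marks a direction (the translation has no effect on it).
move = translation(10.0, 20.0, 0.0)
moved_point = transform(move, (1.0, 2.0, 3.0, 1.0))
moved_direction = transform(move, (1.0, 2.0, 3.0, 0.0))

print(blended, moved_point, moved_direction)
```

The extra w component is what lets a single matrix multiplication express a move as well as a rotation, which is why the hardware is built around four-wide vectors.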
So this matrix multiplication you're seeing here is basically four dot products of four-component vectors. This is basically all that they're doing; that's what they're really, really good at, so we want to exploit that.

They also have a bunch of weaknesses. They have very poor branching performance. Up until very recently, when you write a shader you can have an if statement, and you can say: but I always hit this really simple path in my if statement. The problem is that it is going to run all the other paths as well, because there's no branch prediction; it just masks out the result at the end. It is going to run all your code, so your shader is as expensive as the very worst-case path through it. That's a big problem for a lot of things. They also don't have great serial performance; as I said, individually these cores are not that great, there's just a thousand of them. Another problem: they don't have pre-emptive multitasking, which basically means that once they're going, there is no way to stop them, except to give a big reset command to the whole thing, and then you have to set up everything again. It's not like you can switch context like we do on a CPU. There are people working on that and it will happen in the future, but at the moment it's not there. So once you make a job that's too hard, you're absolutely royally screwed: this thing is going to lock up and it's going to do nothing else. So it can't draw the
cursor, your UI or anything like that. That's a disaster. State manipulation is expensive: we basically have to set the program the shader cores are running, we have to prepare everything for the task that we're setting out to do, and then we can fire it off. When you fire it off, it goes into the GPU pipeline and the GPU gets to work; it's a fire-and-forget mechanism. So if I want the result back from that, I, as the CPU, am going to have to say: OK, I'm just going to wait till the GPU is done. I have to stall in order to get data back out of it. So reading back from these devices is very expensive as well. Basically, we need to make sure that everything we do goes to the screen and not back to us. We could take it back, but it's infeasible.

So that produces a number of challenges when using GPUs for 2D rendering. Something I haven't mentioned yet, but will come back to later, is that high-quality anti-aliasing is really, really important for us in 2D graphics. In video games, which are very dynamic and rarely static, with characters moving around and things like that, if your anti-aliasing delivers lower quality it still looks pretty good; a lot of people play games without any anti-aliasing at all. For 2D graphics that's non-optional: everybody notices those jaggies just standing still on the screen, and it's going to be a disaster. So we need to find ways to do high-quality anti-aliasing, either with the GPU, which really only high-end GPUs tend to do well, or we're going to have to do it with the CPU, or in different ways.

The other thing is our frequent interactive scene changes. When I go into a room in a game, when I switch areas, etc.,
people are perfectly used to a loading screen, and that's OK. You have to upload all the new data: all the new textures for the walls, the structure of the room, the structure of the characters that you find there, all those things. But you can show your loading screen and take your time. If I switch tabs, everything changes: the text changes, the images change, the shapes change. People are not going to be very happy if they see a loading screen every time they switch tabs. So this model isn't that great for desktop applications; it's a problem we have to solve.

Another problem is that these things are made to draw triangles, but the way we describe shapes in 2D graphics is usually through lines and Bézier curves, and those are not triangles, so we have to find a way to get this triangle-drawing thing to work for us. Also, these things are used to having a model: you position things in space, you set it up, and then you render it. In 2D drawing, most APIs at the moment still work in an imperative drawing mode, which doesn't batch very well. It's hard to batch, and you need to batch because state changes are expensive, so we need to do as much as we can in a single draw. This is a hard problem as well. Finally there's text, but I'm already running late with my presentation, so I wasn't going to talk about that, otherwise I'm never going to make it. It's a very complex problem; if you want to learn more about it, I'm happy to talk to you about it afterwards.

So I promised to summarize the GPU pipeline. I'm taking the two most basic shader types, the vertex shader and the pixel (fragment) shader. I've taken out all the other ones because that would complicate things needlessly. The first stage of the GPU pipeline is the input assembler. It's where all the vertex buffers go in, the index buffers go in, the instance buffers go in; you basically give it all the data it needs to draw. Then you get the vertex shader stage, in which the GPU will do
the vertex processing: these GPU cores will go to work on your shaders, on your vertices. These vertices will then be, for example, transformed, moved around, etc. After that they go to the rasterizer. The rasterizer gets the triangles that these vertices define, and it rasterizes these triangles. For each pixel it will do a couple of tests: it will do a depth buffer test, it will do a stencil buffer test if you want, and similar tests, which are really cheap, so at this point you're not yet paying a price for pixels. If a pixel passes all of those, then it gets shaded and you are paying for that pixel; that pixel goes to the pixel shader. The pixel shader has an opportunity to take the interpolated data from those vertices, from those triangles, and to change it into the desired output pixel. So it outputs one or more pixels, which have a color, transparency and things like that, and these go to the output merger. The output merger actually has read/write access to your destination directly, and it will merge the pixel onto the final destination surface, which will be either an off-screen texture or your screen. So that's what the pipeline looks like.

Now I'm going to talk about the different ways that we can use the pipeline to do 2D drawing. There are a couple of basic approaches, and these are really the fundamentals of what you can do or what is commonly done. The first one is converting to triangles: I can make a curve if I make the triangles small enough. I can describe it with triangles; they have to be tiny triangles at some point, but it can be done, and it's a process called tessellation. Then there's rasterizing on the CPU and just doing your composition on the GPU, basically making use of that great four-component vector math, your sampling and your blending. Then there's 2D coverage computation in shaders, which is basically using the shader cores to do the geometrical calculations;
I'll talk about that. And then there are direct hardware implementations, which is basically something like OpenVG or NVIDIA's path rendering extension in OpenGL, which basically allow you to give your 2D geometries to something that is magically implemented in hardware.

First, tessellation. I have some examples here of how figures are tessellated. With the top figure, which consists of only lines, you can probably already see how I could change that into triangles; as you can see, it can trivially be made into these four triangles over here. The other shape has a curve on one side and lines on the other, which means that you can take the inside and make that into a triangle pretty easily, but then you have the outside, and to actually do the curve itself you need a lot of triangles. This is a very much inflated one; if you actually wanted to draw this at size, the triangles would be much, much smaller, but then obviously you couldn't see them and there wouldn't be much of a point to this slide. If you get to even more complex shapes, you can see that the triangles get more and more complicated, and you get loads and loads of them to get sufficient quality.

So it has a couple of pros. First of all, there's no overfill issue, which basically means your triangles are covering the pixels that you actually want to fill; you're not running pixel shaders on pixels that you're not actually going to fill. You're using the GPU as it was intended: it was made to rasterize triangles, you give it triangles, you're doing what it wants you to do. The shaders you're using are very simple, because you simply sample from a texture or a gradient, or you output a color; very easy. And you can do anti-aliasing using multisample anti-aliasing (MSAA). Multisample anti-aliasing is the standard method that most games use; as I said, it works a lot better on high-end hardware than on low-end hardware. Now the downsides: tessellation is hard, and
generally the very best algorithms are O(n log n + k), where k is the number of self-intersections that the shape has and n is the number of line segments that your shape is subdivided into. This is all CPU work. So although you're losing some CPU work in the blending, the rasterizing and the drawing, you're adding a bunch of CPU work in tessellating the shape, which means that if your shape is very complex, you might actually be spending more CPU work doing this than you would have in the first place if you had just rasterized it on the CPU. Also anti-aliasing: if you want really high quality and your hardware doesn't support MSAA well, it's a really hard problem. You have to make all these tiny little triangles for each pixel, and you get a resolution-dependent mesh, so you no longer have a nice triangulation of your shape; you get all these thin little triangles that are just little tricks to make pixels look like they're being anti-aliased properly. For some things you can do it using
interpolation and nice things like that, but in the end it's a very hard problem.

The next method I will describe is rasterizing on the CPU. This is pretty simple: you basically draw your shape in white into your stencil buffer, so your stencil buffer now has ones for the pixels which fall within your complex shape and zeros for the pixels that don't. Then you upload that as your stencil buffer, if you weren't rendering to it directly, and then you draw triangles, basically a rectangle covering your entire stencil buffer, and you let the GPU do the test on the stencil buffer, and then it fills the pixels that you want filled. The good thing about this approach is that, if you're using the stencil buffer, there is no overfill, because the rasterizer will very cheaply throw out all the pixels that fail the stencil test. You can use a conventional 2D engine to draw that stencil buffer, and the shaders are fairly simple. Now, if you're using the stencil buffer, you do get anti-aliasing with MSAA; if you're using textures, where you basically have the full range of 256 levels of transparency, you can do good anti-aliasing with your 2D
rendering API and you get really high quality. The cons are that there's no hardware benefit for generating the mask. The bandwidth demands are high, because you're uploading all these large surfaces for your stencil buffers. There's potential overfill when using textures, because then the pixels are not thrown out by the stencil test; they all go to the pixel shader to figure out whether they have an output value. And there is a high state-change cost: it is really hard to batch if you use this approach, practically impossible.

Now the next one is 2D coverage computation in shaders. This approach was really made well known by a paper from Microsoft Research, by Charles Loop and Jim Blinn, for those of you who are graphics geeks. It was called "Resolution Independent Curve Rendering using Programmable Graphics Hardware". What they do is they look at all these curves that we have here, and then they make these triangles, these quadrilaterals here, and in the pixel shader they process each pixel that falls within the quadrilateral and calculate on which side of the curve it falls, per pixel, so it parallelizes really well. The inside is then filled with a simple bunch of triangles, as simple as you can get them, very few triangles. It fills really nicely and you get the shape over here. There's a lot of interest in this approach. The pros: the CPU work is really minimal. The implementation is fairly simple; it doesn't require complex tessellation algorithms or anything like that. It parallelizes really well, because we're computing the coverage in parallel for every pixel. The bandwidth becomes very small: we only need two triangles for an entire curve, where with tessellation we need hundreds, sometimes thousands. And you get really high-quality anti-aliasing, because you can simply calculate how far the pixel is from the curve in order to
calculate what the coverage of the curve over that pixel would be. The cons: the shaders are complex, and you're running them on a lot of pixels that you're not actually filling. If you want to reduce that overfill, you need to subdivide your curves to make your triangles, your quadrilaterals, have less overlap with the non-filled area. And intersections are a little hard to do; you'll just have to take that from me.

Then finally there are direct hardware implementations. The exact algorithms are implementation-dependent. The advantages are that they are optimized for the available hardware, being implemented by the hardware vendors for their own hardware. They're basically magic: you give them your 2D stuff and they make it happen. The downsides are that they require underlying hardware and driver support, and they're usually not that well tested, because only a couple of devices have them and nobody's using them.

So, the existing implementations, which is the last part of what I'll talk about. We have, most widely used, Direct2D from Microsoft, which is basically their drawing API in all recent Windows versions. It's Windows-only and proprietary, obviously. It is essentially a library that wraps Direct3D. It is almost purely tessellation-based. It's the most widely used: it's used by Firefox, it's used by IE from IE9 upwards, and it's used by Steam and lots of other software. Then from Google we have Skia, or SkiaGL to be specific, which is the Skia version that renders using OpenGL. It's open source, it's cross-platform, and as I said it's implemented on OpenGL. It uses a hybrid approach: it can draw using either coverage computation or tessellation, or it can actually upload CPU masks, so it uses all the different techniques. I don't know where Chrome is using it exactly right now; they're not using it everywhere yet, but they're working on it. And we are also using it for canvas on
mobile devices. Then from Apple there's QuartzGL, which is their Core Graphics backend. It's OS X only, and it's used by Safari in modern versions. As far as we know (it's a little hard to dissect) it uses a hybrid approach; because it's not an open thing, we can't go in and see what it does. Then there's Cairo GL. It's community maintained, open source and cross-platform, and it also uses a hybrid approach. I don't know of anybody actually using it in practice. I'm sorry to say it's full of bugs. If it ever goes somewhere that would be wonderful, but at this point it doesn't look like it has or that it will.

And then there are two direct hardware implementations. First of all there's the NV path rendering extension, which is an OpenGL extension by NVIDIA. It requires an NVIDIA GPU, a recent one, with recent drivers. It is completely implemented in hardware. It basically rasterizes the mask into your stencil buffer, and then you just draw covering geometry over it. It is fairly new, not widely used and not widely tested. We've worked with NVIDIA engineers on it; it shows some potential, it also has a couple of issues, and we'll see where it goes. And then there is what is basically a specification, OpenVG. It's basically open vector graphics, like OpenGL; it's by the Khronos Group, just like OpenGL, and it describes an API that can be used for vector graphics. I don't know of any widely used implementations. Mobile vendors in particular are looking into this at the moment, to save power on mobile devices, and most of the information I have about that is in this magical proprietary mobile world, so I can't say a lot about it, but it is not yet widely used to the best of my knowledge.

Right, that was everything I had to say. I hope it was not too technical and not too specific to graphics. If you have any questions, we've got a couple of minutes for that. If you have a question, wait till they come to you with the microphone, because otherwise we
won't get it into the recording.

Of those libraries you mentioned, which ones are targeted to be used in Firefox, now or in the future?

Currently SkiaGL and Direct2D, and we are looking into NV path rendering together with the NVIDIA engineers, but that's not happening for now.

Thank you.
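Coming back to the coverage-computation approach from the talk: the per-pixel "which side of the curve" test can be sketched on the CPU as follows. This is an illustrative Python sketch and not code from the talk or from Firefox. It uses the well-known formulation from the Loop and Blinn paper for a quadratic Bézier curve, where the three control points get the coordinates (0, 0), (1/2, 0) and (1, 1), the GPU interpolates them across the triangle, and the sign of u² − v tells on which side of the curve a pixel lies. The function names are made up, the triangle is assumed to be non-degenerate, and treating the negative side as "filled" is an assumption that depends on the orientation of the shape.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p in triangle (a, b, c).

    This stands in for the interpolation the rasterizer does for free.
    Assumption: the triangle is not degenerate (non-zero area).
    """
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def curve_value(p, p0, p1, p2):
    """Evaluate u*u - v at pixel p for the quadratic Bezier (p0, p1, p2).

    Returns None for pixels outside the control triangle, which the
    rasterizer would never hand to the pixel shader in the first place.
    """
    w0, w1, w2 = barycentric(p, p0, p1, p2)
    if min(w0, w1, w2) < 0.0:
        return None
    # Interpolate the per-vertex coordinates (0, 0), (0.5, 0), (1, 1).
    u = 0.5 * w1 + w2
    v = w2
    return u * u - v


def is_filled(p, p0, p1, p2):
    """True when p lies between the chord p0-p2 and the curve (assumed fill side)."""
    value = curve_value(p, p0, p1, p2)
    return value is not None and value <= 0.0


# A curve from (0, 0) to (100, 0) with its control point at (50, 100).
# Its midpoint (t = 0.5) is at (50, 50).
P0, P1, P2 = (0.0, 0.0), (50.0, 100.0), (100.0, 0.0)
for pixel in [(50.0, 25.0), (50.0, 75.0), (200.0, 0.0)]:
    print(pixel, is_filled(pixel, P0, P1, P2))
```

Every pixel is evaluated independently with a handful of multiplies, which is why this approach parallelizes so well; the magnitude of the value near zero is also what an anti-aliasing variant would turn into partial coverage.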
So please, there are lots of people who want to enter the room, so if you could move to the middle.