Three Years Experience with a Tree-like Shader IR

Video in TIB AV-Portal: Three Years Experience with a Tree-like Shader IR

Formal Metadata

Three Years Experience with a Tree-like Shader IR
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Three years ago a small team at Intel took on the task of rewriting the OpenGL Shading Language compiler in Mesa. One of the most fundamental design choices in any compiler is the intermediate representation (IR) used for programs. The IR is the internal data structure used for all program transformations including optimization and code generation. At the time the compiler was designed, a number of alternatives were investigated. In the end, a tree-like IR was selected. With hindsight being 20/20, this talk will present the tree-like IR that was chosen and the issues that have been found with that IR in the interim.
Has the recording started? OK, right. So my name is Ian Romanick. I'm with the Open Source Technology Center at Intel, and I'm one of the main developers on the GLSL compiler in Mesa.
I'm going to talk about the experience we've had over the last three years with a tree-like IR for the GL shading language. I'm going to spend a bit of time going over a little bit of the
background of the current compiler architecture and how we ended up with this IR, talk about a bunch of the problems that we discovered with the IR architecture over the last three years, and then talk a bit about ways we could evolve the existing IR into something better that doesn't have a bunch of those problems. So we started this project in around 2010, and at that time the GLSL compiler was kind of a disaster. It technically supported GLSL 1.20, but there were a lot of real-world shaders that just wouldn't compile or generated completely awful code.
It was written using a custom parser generator, so the guy who wrote it, Michal Krol, basically wrote his own version of yacc to write a compiler. In fairness, he wrote this as an undergrad thesis project at his university, so that he got anything working by himself that could actually generate code and compile anything was a pretty spectacular accomplishment. But as something that you want to be able to run real applications on, it just wasn't good enough, and it wasn't really an architecture that anyone else could understand or maintain or that we could make improvements on. In addition to the custom parser generator, the IR that it used was a register stack, so basically like the old 387 [unclear]. So we set out to rewrite it, and at the time none of the people involved with the project (it was primarily myself and one other guy at Intel who were going to be working on this) were compiler people; we were graphics people. So we set out to find out what the state of the art was in production compilers, and we were looking
primarily at two sources at the time. One was Advanced Compiler Design and Implementation by Steven Muchnick, which is a really, really excellent book; if you're interested in compilers it is a really good source. But it kind of has a disadvantage in that it was published right on the cusp of when SSA was becoming mainstream, so there's a chapter about SSA and it kind of treats it as this new thing that people are trying, and we'll see if it takes off. The other source that we looked at was Dave Hanson's classic text, A Retargetable C Compiler, which, if you've ever heard of the lcc compiler, is basically the book about that compiler. One of the interesting things about that book is that he developed a fairly sophisticated system called lburg for writing machine description languages. We sort of had this vision where we have all these different GPU back ends that we want to be able to support, and wouldn't it be great if we could just write a description saying: when you see this kind of sequence of things, you can generate these instructions to do it, and it has this kind of cost. His system had a really sophisticated dynamic programming architecture, so that it would, kind of like an AI search algorithm, try to find the least-cost instruction sequence for a particular set of IR. So we had these two things that we were building from. In particular, because of the way the lburg system works, it wants to know a lot about how partial values generated in expressions are then later used. Unfortunately it also really wants to be integrated with your CSE pass and register allocation, and that's kind of where, not being compiler experts, we sort of missed the boat; more about that later. So what we ended up with is this AST-like, tree-like
expression IR, where we have a base class of ir_rvalue and all the different sorts of rvalues derive from that. The most important one is ir_expression, where we have an enum for the operation, whether it's an add or a mul or whatever, and then the set of operands that the expression takes. So we get these kinds of trees like this, where we've got a multiply and an add and a min/max expression. And so we can evaluate this tree by, you know, a fairly natural tree-visiting
walk, and we'll figure out, OK, this is actually a mad with saturation; we can collapse this whole tree and figure out that it should be a single instruction. And then we have a few passes, a few code generation bits in the compiler that work with this, being able to convert these sort of really common min/max trees
to generate saturate modifiers, and add/multiplies to generate mads. There's a whole, I think, six different kinds of trees that we can figure out are actually a lerp instruction. There's also a bunch of algebraic optimization passes that exploit this tree-like nature, so we can identify that if you have a pow instruction that's 2 to some power, we can actually generate the exp2 instruction instead. But it turns out that GPUs have really regular instruction sets with primarily very, very simple instructions, and they don't have complex addressing. Primarily the benefit that you get out of being able to do sophisticated analysis of these trees is that if you have CPUs that have crazy instructions or crazy addressing modes, like the scale-plus-offset kind of addressing mode or post-increment addressing modes and things like that, you can identify those in the trees and automatically generate those instructions. GPUs don't have any of that, so a lot of this potential from these trees, we discovered when we were quite deep into it, we weren't really going to be able to realize. And these trees can have various rotations and permutations, so if you want to be able to generate a mad instruction you have to identify this kind of tree and that kind of tree and all other kinds of weird rotations, including transposing the min and max to be able to generate the saturates. So being able to evaluate these trees turned out to be, in practice, a lot more complex than we had anticipated or had hoped. We also have difficulties with code generators. This particular tree ends up being a single instruction, but there's a bit of difficulty with the code generator if you have a sequence that's going to generate multiple instructions, because then as the code generator is recursing back up through the tree it has to keep track of where it has stored these partial values, so that then the next layer up
in the tree can know that, say, the multiply stored its value in register 26, so that when you generate the add instruction it knows that, OK, that source has to come from register 26. That makes the code generation kind of a nightmare. It also makes doing any kind of CSE pass exceptionally difficult, because you have these crazy big trees and you need to do your CSE on some parts of the trees. We also have difficulty identifying these kinds of sequences if, say, in the middle here there's a swizzle operation, so the result of the multiply is swizzled; being able to identify that when trying to generate the mad instructions ended up being a lot more complex than we had anticipated. There's also the difficulty that if what the developer has written is some piece of code like this, where your multiply and your add don't sort of naturally land in the same tree, the tree processor will completely miss that, if x isn't used anywhere else, I can generate the value y using a single mad instruction. To overcome this we had to write this really awful pass in the compiler; if you look in the Mesa source code there's a file called opt_tree_grafting. It will try to find these kinds of places and merge those trees together, so it'll take the tree that generates x here and put it in place of the reference to x in that expression. It's a horrible piece of code. I feel very sorry that Eric had to write that code; I've apologized to him several times for it.
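The rotation problem described above can be made concrete with a small sketch. This is not Mesa code: `Expr`, `both_orders`, `match_saturate`, `match_mad` and `select_instruction` are hypothetical names, and the tree is a plain Python stand-in for an expression node with an opcode and operands. It only illustrates how many operand orders a matcher has to try before it can say "this whole tree is one saturated mad".

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for an expression node: an opcode plus operands."""
    op: str
    operands: Tuple["Node", ...]


Node = Union[Expr, str, float]  # str is a variable name, float is a constant


def both_orders(node, op):
    """Yield (lhs, rhs) and (rhs, lhs) if node is a binary `op` expression."""
    if isinstance(node, Expr) and node.op == op and len(node.operands) == 2:
        lhs, rhs = node.operands
        yield lhs, rhs
        yield rhs, lhs


def match_saturate(node):
    """Return x if node clamps x to [0, 1] in any arrangement, else None."""
    # Each entry: (outer op, its constant, inner op, its constant).
    shapes = (("max", 0.0, "min", 1.0), ("min", 1.0, "max", 0.0))
    for outer, outer_const, inner, inner_const in shapes:
        for sub, const in both_orders(node, outer):
            if const != outer_const:
                continue
            for x, const2 in both_orders(sub, inner):
                if const2 == inner_const:
                    return x
    return None


def match_mad(node):
    """Return (a, b, c) if node is a*b + c or c + a*b, else None."""
    for prod, addend in both_orders(node, "add"):
        if isinstance(prod, Expr) and prod.op == "mul" and len(prod.operands) == 2:
            return prod.operands[0], prod.operands[1], addend
    return None


def select_instruction(node):
    """Collapse a tree into ("mad.sat" | "mad", operands), or None if no match."""
    clamped = match_saturate(node)
    mad = match_mad(clamped if clamped is not None else node)
    if mad is None:
        return None
    return ("mad.sat" if clamped is not None else "mad", mad)
```

Even this toy version needs two saturate shapes times two operand orders at each level, plus two orders for the add, and it still ignores swizzles and the case where the multiply and the add were written as separate statements.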
In addition to the expressions we also have another kind of rvalue, ir_dereference. There's ir_dereference_variable, which is the simple version of a dereference, and there are other, more complex dereferences, so there is a separate one of these
subclasses each for array dereferences and for structure dereferences. So if you have some complex thing, like an array of structures, and that structure contains an array of structures that contains an array of structures, etc., and you want to get at some actual data member, you end up with a whole tree to dereference down through that array of structures, etc., to get down to the actual variable that you want. The end result is that you have a huge subtree for this, with a lot of pointers that you have to walk through while you're doing this recursive process, just to figure out that what you actually wanted was register 48. So this makes shaders, especially complex shaders, take huge amounts of memory in the IR. There are a couple of games available
on Steam right now that, on a 32-bit machine, have difficulty compiling all of their shaders because they exhaust the address space building this giant tree structure for the IR. And it's not that much code, right? It's just that we have this extremely verbose set of data structures for expressing these things. So we have three main kinds of
rvalues: we have expressions, we have these trees of variable dereferences, and we have constants. And the dereferences all use the exact same
structure, whether you are dereferencing variables that the application has explicitly declared and that only exist during the lifetime of the shader, temporaries that are automatically generated by the compiler, uniforms, or shader inputs and outputs. All of those look the same in the compiler, even though at the code generation back end they may be treated differently. All of these feed into the assignment instruction, which has a dereference for the left-hand side, then some rvalue tree for the right-hand side, and a condition modifier on it. We put that in there with the intention that the assignment could be a conditional assignment. Many GPUs have either predicated execution or explicit condition codes on all the assignments, so the instruction may or may not write a value depending on the value of the condition code. So we had this thinking that we would use this systematically as we're generating code and flattening out if statements and doing things like that, and that it would generate things more compactly and make it easier on the code generation back end. But this right-hand side isn't just add x and y; it's potentially this giant tree of goop that's going to generate, you know, 46 intermediate values, and maybe all those need to be conditional and maybe they don't; it kind of depends on the back end. So we have this thing in there that should have made things easier but in practice actually made things quite a bit more complicated, because now all of the optimization passes, when they're trying to do value tracking or dead code elimination or things like that, have to be aware of whether a given assignment always occurs or doesn't, because they're all conditional. It makes things more difficult instead of making them easier. So we had these great ideas, all these things that really seemed like a good idea at
the time, and most of it turned out to just be painful and sort of make everyone's life much more difficult. The only thing that in fact sort of panned out was that, when we created the tree structure, we created a set of helper classes to make it easier to traverse through the trees and execute operations at certain kinds of nodes in the trees, to make writing independent optimization passes a lot easier. And that has worked out, but it's been kind of a double-edged sword. On the one hand, if you want to write a new optimization pass that identifies a certain kind of sequence and replaces it with a different kind of sequence, it's really easy to write; you could sit down and probably write one of them before I'm done with this talk. The flip side of that is that we now have about 27 of those, and so now we have all these different optimization passes that operate completely independently, and most of them, once they've done some modifications on a set of instruction trees, have to stop, because now they've changed information that they later need, and basically start over. So through the whole optimization sequence we end up running through these trees hundreds of times, walking through these trees, through all these pointers, through these extremely verbose dereference structures. It's extremely cache-unfriendly; it's downright cache-mean, and the performance of it is fairly awful. So we've gotten the benefit out of this thing that we can, and it's ending up being costly. I think what we need to do is attack the verboseness of all these trees and try to get some memory back. One of the things that we should have done in the first place is basically we
have two different sets of ways that we keep track of values.
For anything that is an array, just pretend that you're on a CPU and give it some memory locations. Essentially, we now actually have a lot of infrastructure for doing these kinds of things in GPU programs thanks to uniform buffer objects, where you can create a region of memory that has a bunch of variables that you can directly access from the shader. So with a bunch of this infrastructure now in place, transitioning to, at least during the initial compilation phases, just putting everything that's an array into sort of a fake UBO, and then assuming that we'll optimize it out later, would be pretty easy. And then everything else, just give it a fake register out of a pool of infinite registers, right? So just have, say, a 32-bit register number, and then keep a mapping of register number to variable, so we can still tell that this register is actually a vertex shader input if we need to generate different code for that. As part of this we could also include swizzle information on the register usage. So even though we would have this extra mapping to say that register 238 is actually some particular variable, I think it would end up being a lot smaller than the current dereference system, for a few reasons. The biggest one is that there's one mapping per variable instead of there being an ir_dereference for every time you try to use it. In addition, there are all of these compiler-generated temporaries, and through a bunch of optimization passes we generate a pretty good number of temporaries. None of those actually need variables anymore; you can just say, OK, I need a temporary, it's the next fake register, and not even have this mapping, because you don't need it; that variable is invisible to anything. And then also, after the shader has been fully linked, you can just throw away the mapping, either completely throw it away or at least reduce it down to only including things that are
visible external to the shader, like the shader inputs or the set of uniforms, so it ends up being a much smaller set of things to keep track of. You'll also need a similar kind of mapping for the arrays, to know that a particular base address is mapped to the base of a particular variable. To really make this work we need a couple of additional optimization passes that we don't have now, but which, for supporting UBOs, we actually should have now. There is a
pass that kills redundant loads and redundant stores of arrays. So if you say, OK, I loaded index 6 of this array into register 85, then the next time you access index 6 you just pull from that same register instead of reloading it, and likewise with stores. We don't actually have stores on UBOs yet, but we will pretty soon; there's some GL 4 extension that adds that capability. So then if you've managed to kill off all of the redundant loads and stores of an entire array, you can kill its entire mapping to memory and just keep that array in registers. We have all of the lowering passes already: most GPUs can't do non-constant indexing of certain kinds of arrays, and quite a few of them can't do dynamic indexing of arrays of outputs from the vertex shader to the fragment shader, for example, so we have a lowering pass that converts that kind of dynamic indexing to some really ugly, awful code that makes it look like a bunch of constant accesses. With those optimization passes and the existing lowering pass we would end up with parity with what we have today. So then what we end up with is, instead of the current tree of goop, we'd end up with a bunch of UBO loads and something that looks like
this for our rvalues. It still derives from ir_rvalue, so it still has type information; we haven't lost that.
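A minimal sketch of the fake-register idea, under stated assumptions: `RegisterRef`, `RegisterFile`, the mode strings and the method names are all hypothetical and are not taken from Mesa. The point it illustrates is that each use of a value carries only a register number (plus type and swizzle), while the register-to-variable mapping lives in one side table that has no entries for compiler temporaries and can be pruned after linking.

```python
from dataclasses import dataclass

# Hypothetical set of storage modes that stay visible outside the shader.
EXTERNAL_MODES = {"shader_in", "shader_out", "uniform"}


@dataclass(frozen=True)
class RegisterRef:
    """What an rvalue would hold at each use: no tree, just a number."""
    reg: int               # index into an unbounded pool of fake registers
    glsl_type: str         # type information still travels with the rvalue
    swizzle: str = "xyzw"  # swizzle folded into the reference itself


class RegisterFile:
    def __init__(self):
        self.next_reg = 0
        self.var_for_reg = {}  # reg -> (name, mode); one entry per variable

    def _fresh(self):
        reg = self.next_reg
        self.next_reg += 1
        return reg

    def alloc_variable(self, name, glsl_type, mode):
        """Named variable: allocate a register and record the mapping once."""
        reg = self._fresh()
        self.var_for_reg[reg] = (name, mode)
        return RegisterRef(reg, glsl_type)

    def alloc_temporary(self, glsl_type):
        """Compiler temporary: just the next register, no mapping entry."""
        return RegisterRef(self._fresh(), glsl_type)

    def strip_after_link(self):
        """After linking, keep only mappings that are externally visible."""
        self.var_for_reg = {
            reg: (name, mode)
            for reg, (name, mode) in self.var_for_reg.items()
            if mode in EXTERNAL_MODES
        }
```

For example, allocating a shader input, a local and a temporary yields registers 0, 1 and 2, but only the input and the local appear in `var_for_reg`, and only the input survives `strip_after_link()`.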
And this structure, on LP64 systems, has the same size as the existing ir_dereference_variable, but it's somewhat larger, by 4 bytes, on ILP32 systems. But you don't have the giant tree of these anymore for any kinds of array-of-structure array accesses, and you also would no longer need to have swizzle expressions as rvalues. So on ILP32 we've increased the size of an individual node but we've reduced the depth of the trees, so it's still a winning trade-off. This right here would dramatically simplify code generation back ends and would dramatically reduce memory usage. Also note that, after having done linking and dead code elimination, if you make a quick pass over the instructions and sort of compact out holes in the used registers, you can keep track of the highest register number that's being used. Then if the code generation back end sees that the highest register number used in this program is 48, and it has 128 registers, the back end doesn't even have to bother doing its own register allocation pass. So for simple shaders it would actually make code generation faster too, because you don't even have to bother with register allocation, which, at least on the i965 driver, is the single most expensive part of the code generator. So that seems like winning. The next place that we want to go, after having reduced our memory usage, is what I call Flatland: get out of this forest of trees and get to a flat-looking IR. I think that would look something like this, where you kind of smash a single ir_expression into an ir_assignment, so you have the register on the left-hand side, the operation that's going on, and the registers that are the right-hand side. And you know, already from this, where we're sort of giving everything its own fake register and we've flattened things out, you can almost smell SSA, right? It's like it's almost
there already, but there are a couple of additional things that need to happen to actually make that work out, and there are a couple of other rvalue-like things that need some treatment. So we have an additional instruction for function calls that has the list of parameters. It would kind of look like this, where the call has the location where it's going to assign the return value of the call, and then the function being called and the list of parameters; that would get refactored into a new kind of class hierarchy. Also texture instructions: texture instructions are kind of a mess in the current IR, because GPUs keep adding new texture instructions that have additional parameters, and we kind of keep growing this horrible instruction node in the IR. What I think we want to do with that is just throw that away and make texture instructions look like function calls in the IR. I think we take a page out of the LLVM playbook and use what they call intrinsics, so have specialized intrinsics for texture instructions instead of having them be specific opcodes. So if we flatten things out like this, we've lost some information: we no longer know how the value that is generated by a single instruction is later used. Right, so if we have a multiply instruction, we really have that multiply instruction in isolation. We have no idea, when examining it, that the only thing that ever uses this multiply instruction is an add instruction, and that add instruction is only used by a min, and that's only used by a max. So we don't have any way to generate the mad.sat instruction, or at least not directly. And also some of the optimization passes that want to take different kinds of sequences of instructions and replace them with a different single kind of instruction become more difficult. So, yeah, a lot of those instructions we could probably sort
of immediately generate, in what I'm calling [unclear], which is I think a fairly terrible name, but I couldn't think of anything better to call it when I was writing the slides. When we're going directly from the AST to the IR, we could just generate the mad.sat then. But most of the optimization passes that we currently have create new opportunities to generate these complex instructions. This is especially true for the lerp instruction. We had a pass that operates at the high-level IR, and then we found that, because of optimizations that happened in our back end, on the low-level IR that the i965 back end uses, it was creating more opportunities to generate the lerp instruction, so we had to create a pass that operated on that IR as well, with multiple things trying to generate lerp instructions. So we still need to be able to recognize these things that were naturally sort of tree-like flows of expressions, to be able to generate more complicated things. We could go backwards from the flat IR to a tree, trigger the passes on the forest of trees, and then use that to go back into this form to get those complex instructions. But I was just saying how awful it was having to write that code, and I think if I make Eric write that code again, he will kill me. So the way that compilers that don't have tree-like IRs traditionally do this, and that, by the way, is pretty much all of them, is using, in this case a simplified form of, UD chains. And there's at least one person in the audience who is already thinking, "but you don't need UD chains with SSA." Well, that's not entirely true. Even going back to one of the original papers about SSA, it addresses the issue of UD chains, and says that you don't need UD chains for as many different things with SSA as you did without SSA.
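To illustrate what a flat form plus very simple use information buys, here is a hedged sketch. `Instr` and `fuse_mad` are hypothetical names, and the sketch assumes each register is written at most once inside the basic block and that definitions precede uses. It counts uses within one block and fuses a multiply into the add that consumes it only when that add is the multiply's sole user and the product is not needed after the block.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Instr:
    """Flat instruction: destination register, opcode, source registers."""
    dst: int
    op: str
    srcs: Tuple[int, ...]


def fuse_mad(block, live_out=frozenset()):
    """Fuse mul+add pairs into mad within one basic block.

    Assumes every register is defined at most once in `block`. `live_out`
    lists registers still needed after the block; those are never removed.
    """
    use_count = Counter(src for ins in block for src in ins.srcs)
    def_of = {ins.dst: ins for ins in block}
    dead = set()  # ids of mul instructions folded into a mad
    out = []
    for ins in block:
        fused = None
        if ins.op == "add" and len(ins.srcs) == 2:
            # The product may be either operand of the add.
            for prod, addend in (ins.srcs, ins.srcs[::-1]):
                mul = def_of.get(prod)
                if (mul is not None and mul.op == "mul"
                        and use_count[prod] == 1 and prod not in live_out):
                    fused = Instr(ins.dst, "mad", mul.srcs + (addend,))
                    dead.add(id(mul))
                    break
        out.append(fused if fused is not None else ins)
    return [ins for ins in out if id(ins) not in dead]
```

The only facts consulted are "how many uses does this value have in this block" and "which instruction defines it", which is far less than full use-def chains but is enough for this kind of instruction selection.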
So for example, you don't need use-def chains for doing CSE, or for doing dead code elimination, or plenty of other things, but you do still need them for doing instruction generation and for a very small set of optimization passes. The advantage, again, with SSA is that you get a whole bunch of simplifying assumptions about your use-def chains: they're a lot easier to generate and keep up to date, and you can represent them more compactly.

For what we need to maintain code-generation parity with what we have today, we can get away with a really, really simplified form of use-def chains, one that roughly matches what we would still need with SSA. So we won't really have to throw away any code when we make that transition, and the two can live on side by side. Primarily, what we need to know is, within a single basic block, when a value is only used by one other thing and, vice versa, within a particular basic block, when a value that's consumed has only a single reaching definition. In the other cases we don't need any information, so we don't even have to track it. So we can get away with a really, really simplified form that we would generate directly from the tree, and it would be easy to keep up to date.

One other thing that we can't do today, and for which we would want our new use-def chains, is to
be able to detect cases like this, where we have two different reaching definitions of x. (I believe we would get this after going to SSA, but we would want to do this before then.) If we just push that clamp back up into each of those basic blocks, we can generate a DP with saturate in both of them and cut an instruction off of what we generate today. What we generate today is two dot products and then an extra move with saturate. In the i965 back end we actually have a pass that tries to push those up to eliminate those moves with saturate. I don't know if other back ends have a similar thing.

There are a couple of other bits of dirty laundry in the current compiler architecture that I wanted to bring up. They're kind of outside the main thread, but they're still mistakes that we made with our IR.

We don't explicitly track basic blocks. We kind of implicitly have them because of the way that if instructions and loop instructions are structured, but there isn't an explicit structure that says, "Here's a basic block." That ought to get fixed, and I think that will sort of fall out when we do our transition to the flat land. It would have been really difficult to keep basic blocks before, because the tree structure made it a lot harder: as soon as you tried to move things around in the trees, it would invalidate your basic blocks and you'd basically have to rebuild them from scratch. So we just didn't bother.

We also don't have an explicit IR opcode for switch statements. At the AST level, when a switch statement is encountered, it's immediately translated, through some really gory code, into a sequence of if statements, which most of the time is what your GPU was going to do anyway. However, if the switch value is uniformly constant, almost every GPU would be
better off using a jump table, and we have absolutely no way to generate that. We don't care about it too much, because if you look at shader-db there are exactly zero occurrences of switch. I suspect that is an artifact of there not being a lot of games for Linux, and especially not a lot of games for Linux that weren't also targeting consoles. PlayStation 3 and Xbox 360 couldn't really do switch statements, so nobody wrote them in their shaders. But now that there's a new generation of consoles, there's a pretty good probability that we'll start seeing some of those.

The other thing that needs to die in a fire is the explicit generation of the AST. When we first started on the compiler project, because of supporting Windows, any files that were generated as part of the build process were tracked in source control. So if you had a yacc grammar file, the C file that gets generated from it was committed to source control. Any time you changed the grammar, you had to remember to commit the C file too, and people were really horrible about that, so it was very
common that the committed C file was 12 versions behind the grammar file. So I wanted to make as few changes as possible to the grammar file. What I did when writing that was to explicitly create an AST and then process that AST, in a C++ file, into the actual IR. So there's this explicit step there. Now that we don't do that ridiculous thing of committing the generated C file to source control, it would make a lot more sense to do a bunch more of that work in the yacc file itself and skip the step of generating this other explicit AST, since the action of the grammar itself generates it implicitly. That way we could cut some transient memory usage and probably get a little bit more performance out of the compiler.

I didn't care about it too much at first, because shader compilation time wasn't showing up that much in the applications that existed for Linux in 2010. But now, the same applications that run out of address space on 32-bit systems take quite a bit of time to compile everything. I think the one that hurts us the most is Dota 2. If I remember correctly, when you run it without any extra options, at start it tries to compile 11,000 shaders. (Yes, that's why it runs out of VA, right.) And it takes a while. So everything we can do to speed up the initial compilation would be a tremendous help to anyone wanting to run it. Some other folks are doing work on a shader cache, so that you basically ccache the first run's shaders and the second run of the game starts off really quick. But on the first run you've still got time to go get a sandwich: get in your car, drive to the next city over to your favorite sandwich shop, and then come back.

So that's everything that I have. Are there any
questions?

[Audience question, largely inaudible.]

Right, so that's usually a separate issue. There's a bunch of games, and the one I know of that's the worst about this in its native client is Serious Sam 3, where at load time they have all the shaders, but they don't necessarily know which shaders are going to be used together until they encounter things, maybe much later in the level. So what they'll do is compile everything up front, and then when they find out that this shader is used with that shader, they actually link the program at that point. That's why, when you go around a corner, it pauses: they're relinking the shaders. It seems like a silly architecture, but it turns out that's sort of the natural way to do things in Direct3D, where everything is separate and you just construct shaders so that the outputs of one and the inputs of the other are going to have the same layout, and then at any time you can arbitrarily say, "Use these two together." That functionality is available in OpenGL through separate shader objects, and as soon as I get back to Portland from this conference I'm going to get back to working on that. So a bunch of those apps will get a much better experience.

[Audience comment, partly inaudible: one game renders the first frame and then compiles the shaders that animate the lips and face for the facial expressions. It takes 10 seconds to compile them, and then the game runs the rest of the animation sequence in a few frames, so it looks ridiculous.]

[Audience question, partly inaudible: back at that time, the shiny object was to use LLVM to do the optimization. What happened to that?]

So, LLVM is kind of a pain in the behind. There are a couple of big difficulties with it. The first one is that their story for ABI is, "What's an ABI?" That is a big problem if you want to be able to, say, ship an updated driver on multiple different versions of distros that may have different versions of LLVM installed. Lots of closed-source drivers, especially OpenCL implementations, use LLVM, and what they do is pick some version of LLVM that they like, import all of its code into their project, and statically build it into their project. The LLVM license lets you do that. I'm not going to import LLVM into Mesa; that's not right. So if we want to start using it, we're basically in the same situation that the radeon drivers are in, which is that you can't get an updated driver on the previous version of Fedora because of the LLVM mismatch. I can't do that. Also, doing code generation out of LLVM if your code generator isn't in the upstream LLVM tree is kind of a nightmare. So it's more of a project interface issue than a technical limitation, and that's kind of the first deal-breaker.

That's just the architecture they have for their project, and it meets the goals of that project, right? So I do understand why they want to do that. It had been proposed a few times that Mesa should have a standard interface between the core part and the driver part, and we pitched a fit about that suggestion, for a bunch
of the same kinds of reasons. We don't want to have this API, because we want to have the flexibility to be able to take the stuff that's garbage, throw it out, and do something that's better, and not do like we do with the kernel and say, "No, we'll keep dragging this old horrible interface around for years and years because it was there once."

[Audience question, partly inaudible: we don't have much time left, but a high-level question. Are you quite certain that you expect shaders to use switch statements in a significant way?]

I don't know. I don't know for sure that we will see a whole lot of shaders that make use of switch statements. One thing that's happening is that people are looking for ways to avoid the overhead of switching shaders, and one way people do that is with uber shaders. One way to be able to switch, in a very coarse way, between different behaviors in a shader would be with a switch statement that selects on a uniform, for example. Now that the consoles can do this in a very credible way, it's at least possible. There are a couple of ways that people could do that. One that would be similar is using another piece of GL 4 functionality called shader subroutines, which are roughly like function pointers in GLSL that you set as uniforms. There's not a lot of use of those either, I think for mostly the same reason: the previous generation of consoles can't do them at all. So we're kind of in a transitional phase here, where people are giving up a bunch of the techniques they were previously using because they worked on PS3 and Xbox 360, since now they can do different things on PS4 and Xbox One. So next year we will start seeing a bunch of shaders that look really
different from the ones that we saw last year, and trying to predict exactly what those things will look like is hard. OK, this is the thing that is