Liberating the Tegra GPU
The Grate project works on liberating NVIDIA's Tegra GPU user-space components by reverse-engineering the proprietary drivers. This talk will discuss where we are and what the future might bring.
[unclear] Right, so I'll be talking a little bit about my, and some other people's, efforts to try to understand the Tegra GPU and its internal workings, or rather the workings of the user-space drivers.
First, a little bit about me. I have about 20 years of graphics programming experience, 10 of those years professional. I used to work at ARM [unclear], in the Mali driver team, making the drivers that Luc has been so friendly to try to pick apart. While doing that I was involved in the development of the [unclear] 1.1 and 2.0 standards. I'm also a reasonably active open source contributor; I think a big portion of that is for Git, but also [unclear]. And I'm an active demoscener [unclear]; I have released lots of demos [unclear].
So, this talk is titled Grate, and I'll tell you a little bit about what Grate is [unclear]. It's an effort to reverse engineer the GPU and eventually to create an open-source driver for it. It's probably the furthest behind of all these ARM SoC GPU projects, due to not very much time being available for doing this. The usual warning applies: everything here is based on reverse engineering, so don't invest all your life savings into something based on what you hear here. The information might indeed be incorrect.
I'll talk a bit about what has happened so far. In the summer or fall of 2012 I heard about the Lima project and its IRC channel, but that was boring for me because I'm IP-tainted due to my ARM work, so I can't touch their stuff. So instead I decided, or rather was convinced, to study the Tegra. I just quietly hacked away, and around FOSDEM last year I had implemented command stream capturing and parsing, so I could pick it apart and see a little of what was going on. I did some envytools work, so a register database for some of the [unclear], but that's a bit difficult because a lot of this is [unclear] the shaders as well. I had a very rough fragment shader disassembler, and I had reverse-engineered the proprietary compiler's shared object interface, so I can compile shaders from my own programs without using the driver. And then I got bored of it and kind of stopped working on it for a while.
A few months later [unclear] I was told by Luc that I had to get my ass onto IRC, because someone was doing something with Tegra, and that was Thierry. It turned out he had [unclear] while I was procrastinating. He had already, I think, been working on the Linux kernel drivers, but he started hacking together a libdrm interface, replayed command streams, did some work on a DDX driver and even did some work on Gallium before [unclear], and then Thierry got hired by NVIDIA. So I'm slowly trying to follow up on what he was doing. Mind that my biggest interest is the reverse engineering part and not so much the driver development part, because I've already scratched that itch at ARM, but I will give it a go. [unclear] I should also mention that Rob Clark of the freedreno project has had a little bit of a look at some of the stuff to help out as well.
So, the current status. I have basically been focusing on Tegra 2; Tegra 3 and 4 I will mention a little bit later. We have command stream dumping, we can replay that, and we have some basic rendering and tampering with state to confirm suspicions, so we know a little bit. The upstream Linux DRM driver, as of I think two or three weeks ago, has support for the 3D core; the downstream [unclear] support. The blue things here are links, and I will put the slides up online [unclear]. I have a very, very unfinished Mesa Gallium driver that can do clears and [unclear] pixels, but it's at least some basic stuff working, which deals with tiling and these kinds of things.
Tegra 3 seems to just work. I don't really know this, but I haven't seen any differences in the command stream, so it looks like it's just the same 3D core. Tegra 4 has some more registers [unclear]. I don't think it's strictly compatible; I think the first command streams didn't replay there, but some command streams have been replayed on Tegra 4, so it seems to be roughly compatible.
Tegra K1: those of you who were at the previous talk heard a little bit about that. It has the GPU, or the 3D core, from Kepler; the rest looks like Tegra 4 [unclear]. I have not done any work on that, since I'm focusing on the older Tegra 3D core, which is the one being replaced there. This is something we will have to deal with, but let's discuss that later. [unclear] I won't talk any more about the K1; this is just a short summary.
All right, so, the demo. Apparently I'm not so good at reading e-mail, so I missed that [unclear] connect my equipment to the big screen, and my Tegra platform doesn't have an external monitor, so I can't show you anything here either. I'm sorry about that. It wouldn't have been too exciting, just a rotating triangle [unclear]. So imagine that triangle rotating.
So I'll go a little bit into the hardware. A short overview of the GPU: its name is AR20 [unclear].
It is an immediate-mode renderer. It consists of at least three components: GR2D, GR3D, and then there's also some kind of video encoder/decoder that I don't really know much about. These components are programmed through something called host1x, which is a DMA engine for writing registers, so [unclear] drives the GPU. It has a proprietary OpenGL ES driver, which is useful for reverse engineering [unclear].
The GR2D is publicly documented [unclear], which is great. This means that everything that needs to be done in a DDX driver, for instance, is available. [unclear] an end user license agreement [unclear]. They also have source code available for doing some blits. What it can do is a lot of different things related to blits: fills, [unclear] tiled or linearly addressed sources and surfaces, stretching, rotating and flipping, but only [unclear] 90 degrees of rotation, some kind of blending, and coverage sampling anti-aliasing resolves, which is a bit interesting [unclear]. So it is basically capable of doing all of the 2D operations from [unclear], [unclear] blending, [unclear] masking and all these things, so that's quite capable.
The other one, GR3D, is different. It's a non-unified shader architecture, so it has a separate vertex shader ISA and fragment shader ISA, and it performs blending in the fragment shader rather than in fixed-function circuits [unclear]. It only has a 16-bit depth buffer, but it seems Tegra 4 added support for 24 bits as well. It has 16 render targets, including depth and stencil, which is quite a lot. It supports occlusion queries and quite a bit of texturing features, although they only have a limited kind of non-power-of-two textures, which is not so nice. Or rather, it supports that without mipmaps, and also with mipmaps, but if you try to access outside of the 0 to 1 range then you get an undefined result; [unclear] a separate extension for that. And yes, standard derivatives, and the quite interesting, I think, draw path extension, which is for drawing 2D graphics with [unclear] and things like that. I haven't really dug into that, but it's an interesting feature; it might be that this feature is software only, from the driver. There's also a video [unclear] that I haven't tried either; if anyone wants to have a look at it, feel free.
The vertex shader instruction set is basically a subset of NV30. That means it can do four-component vector ALU instructions or single-component scalar instructions [unclear], and it seems to be able to do those in parallel, although I don't think I've ever seen the compiler generate that. It has no control flow, so all loops need to be unrolled, but it's very straightforward to generate code for. It looks basically exactly like [unclear], and there's already a backend in the [unclear] driver, so there might be a way to share some code there.
The fragment shader ISA is quite a bit more interesting [unclear]. All the registers represent either one 20-bit floating-point value or two 10-bit fixed-point values, so 2.8 fixed point, signed [unclear]. There are at least three separate instruction streams. Shader programs are not one stream of instructions; there are at least three different ones. I suspect there are five, but I haven't really figured out all the details. There are also some additional control streams that define how they get executed or scheduled with respect to each other.
There's the ALU, the arithmetic unit. Then there's the multifunction unit, which does both varying interpolation and complex function evaluation, so it interpolates varyings across the triangle, or does sine, cosine and these kinds of things. It seems that those two [unclear] executed in the same clock; the instruction has room for encoding both kinds of operation in the same word, and it seems they are executed in two different parts of the unit. Then there seems to be some kind of export unit for writing the color to the color buffer. There are some more as well: when a shader spills registers, even more stuff starts to show up, but I haven't really understood that part yet. There's no control flow whatsoever, so loops are unrolled and all conditionals just become either conditional selects or predication.
The ALU unit is pretty well understood. There are only a few bits that I've seen pop up where I don't yet feel I have a good understanding of what they mean. Instructions come in packets of four 64-bit words, which hold either four scalar operations, or three scalar operations and up to four embedded constants, so that's either 20-bit floats or [unclear] fixed-point values. The ALU is basically a glorified MAD: it has one destination and three source operands. This is an example of the MAD instruction, with some bits to choose between some of the different modes.
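To make the register formats and the MAD operation described above a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the talk says the fixed-point format is signed 2.8, but the two's-complement interpretation, the placement of the two 10-bit halves inside a 20-bit register, and all function names are hypothetical and not taken from the hardware or from the Grate tools. The 20-bit float layout is not described in the talk, so it is left out.

```python
"""Toy model of the fragment ALU data formats described in the talk.

Assumptions (not confirmed by the talk): fixed-point values are 10-bit
two's complement with 8 fractional bits, and the low 10 bits of a
20-bit register hold the first of the two fixed-point values.
"""
from typing import Tuple

FX10_BITS = 10
FX10_FRACTION_BITS = 8
FX10_MASK = (1 << FX10_BITS) - 1


def decode_fx10(raw: int) -> float:
    """Decode a 10-bit signed fixed-point value with 8 fractional bits."""
    if not 0 <= raw <= FX10_MASK:
        raise ValueError("fx10 values are 10 bits wide")
    # Treat the top bit as a two's-complement sign bit (assumption).
    if raw & (1 << (FX10_BITS - 1)):
        raw -= 1 << FX10_BITS
    return raw / (1 << FX10_FRACTION_BITS)


def encode_fx10(value: float) -> int:
    """Encode a float into the assumed format, clamping to its range."""
    scaled = round(value * (1 << FX10_FRACTION_BITS))
    lowest = -(1 << (FX10_BITS - 1))
    highest = (1 << (FX10_BITS - 1)) - 1
    scaled = max(lowest, min(highest, scaled))
    return scaled & FX10_MASK


def split_register(raw20: int) -> Tuple[float, float]:
    """View one 20-bit register as two fixed-point halves, low half first."""
    low = decode_fx10(raw20 & FX10_MASK)
    high = decode_fx10((raw20 >> FX10_BITS) & FX10_MASK)
    return low, high


def mad(a: float, b: float, c: float) -> float:
    """The basic ALU operation: one destination computed from three sources."""
    return a * b + c
```

Under these assumptions the representable fixed-point range is from -2.0 up to just below 2.0 in steps of 1/256, which is why `encode_fx10` clamps rather than wraps.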
Some of the mode bits change the order of operations, so whether it multiplies before it adds, these kinds of things. There's also a setting that seems to mean that it accumulates the result: [unclear] adds to the destination register, and when it does that it uses some kind of special accumulation register, it seems. It also has min, max and conditional select. [unclear] when you change the MAD to [unclear], you still get the C operand, so it still tries to add C [unclear]. All the ALU options are pretty well understood bits. There's a condition register that it sets, which connects to the conditional [unclear]. The results can be saturated, and the sources can have absolute value and negate applied. There seems to be some more [unclear], but I can't see it.
The multifunction unit is probably based on this paper, called "A High-Performance Area-Efficient Multifunction Interpolator", which is an NVIDIA research paper. It proposes the idea of merging the varying interpolator and the complex function evaluator. The complex function evaluator part is pretty much understood, and it's very simple: it just takes a register in and writes the result to the same register as the source, and does these operations. In typical NVIDIA style, there are two steps for [unclear] and also for the exponential function. I haven't studied what the intermediate result is, [unclear], so it might be range reduction or something like that. The varying write is still a mystery: I can see which varying it interpolates in the instruction, but I can't figure out which register it writes to. I've had some friends look at this, and Rob Clark also had a look at it, but we haven't been able to figure it out, so it's a mystery. If someone else wants to have a look, that would be awesome.
The texturing instruction is somewhat understood. It seems the encoding is really simple, but it's not entirely clear how it passes the texture coordinates. That's not so much a mystery as that I haven't really felt like working too much on it without getting the varyings working, because looking up the same texture coordinate for an entire triangle is not very interesting. [unclear] both 2D textures and cube maps compile to the exact same binary code, so it seems the hardware deals with all of that [unclear]. Then there's the export instruction, [unclear] which render target it writes to, but the rest just looks like garbage to me, so I haven't really figured that out either yet.
So, as you can guess, [unclear] there is stuff that needs to be done. Some of it is just implementation work, and some of it is reverse engineering. We have the DRM patches that should go upstream [unclear], and finishing out the details. We should probably have an X.org DDX driver finished. There is a stub driver, but it doesn't do anything except mode setting, so it doesn't accelerate any of the drawing. As I said, the GR2D is pretty much documented, so it should be pretty easy to do, and we have a working libdrm. So it's mostly a matter of knowing how a DDX driver works. The thing is that I don't know the X DDX stuff, so for me it would be quite a bit of reading code to understand it all.
Then there's the reverse engineering. The varying writes are, as I said, still a mystery, along with the exporting. There's also spilling, which happens if you run out of registers, and you also seem to trigger the same kind of code if you use more than 32 uniforms, because there's only [unclear] uniform [unclear]. There's some weird stuff there: I don't really see where the extra uniforms get sent to the hardware at all, but it does work, it does give the right results, so maybe there's something wrong with my command stream dumping there.
And then there's the Mesa Gallium driver, and that's a huge chunk of work that is probably going to go on for years. That's pretty much my talk. Any questions?
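As a recap of the ALU behaviour described in the talk (source operands with absolute value and negate, results that can saturate and accumulate, and conditionals turned into selects because there is no control flow), the following Python sketch may help. It is a hedged illustration, not the hardware's actual semantics: the order in which absolute value and negate are applied, the saturation range of 0 to 1, and every name in the code are assumptions made for the example.

```python
"""Illustrative model of ALU modifiers and branch-free conditionals.

Assumptions (not stated in the talk): absolute value is applied before
negate, and saturation clamps the result to the range 0.0 to 1.0.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """A source operand with optional absolute-value and negate modifiers."""
    value: float
    absolute: bool = False
    negate: bool = False

    def read(self) -> float:
        v = abs(self.value) if self.absolute else self.value
        return -v if self.negate else v


def saturate(x: float) -> float:
    """Clamp a result to the assumed saturation range."""
    return min(1.0, max(0.0, x))


def mad(a: Source, b: Source, c: Source, *, sat: bool = False,
        accumulator: Optional[float] = None) -> float:
    """Multiply-add with optional accumulate and saturate."""
    result = a.read() * b.read() + c.read()
    if accumulator is not None:
        # Accumulate mode: add onto a previously computed value.
        result += accumulator
    return saturate(result) if sat else result


def select(condition: bool, if_true: float, if_false: float) -> float:
    """Conditional select: both inputs are already computed."""
    return if_true if condition else if_false


def shade_branchy(x: float) -> float:
    """What a shader author might write, using control flow."""
    if x < 0.5:
        return x * 2.0
    return 1.0


def shade_flat(x: float) -> float:
    """The same function without control flow: compute, then select."""
    doubled = mad(Source(x), Source(2.0), Source(0.0))
    return select(x < 0.5, doubled, 1.0)
```

The point of `shade_flat` is that both sides of the conditional are evaluated unconditionally and the select only picks a result, which is the general shape code has to take on a unit without branches.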
Audience question [largely unclear], about how this relates to the nouveau driver and whether Tegra and Kepler are related.
Apart from the vertex shader being NV30, basically, it's not [unclear]. The vertex shader is based on that ISA; I believe even the bit fields are exactly the same. [unclear] I'm pretty sure they just bolted on [unclear] afterwards. Kepler is the desktop technology, not related to the older Tegra. The GPU of the K1 is just Kepler, but it has nothing to do with the old [unclear]. I should probably start saying AR20 rather than Tegra, because Tegra is a product line and not [unclear].
Audience question: I haven't been keeping up to date with what NVIDIA does and does not release. I think a while ago they started to slowly work with the community, and I was under the impression that Tegra was more open source than [unclear]. What is actually open source [unclear]?
Tegra has great support in the kernel; they do excellent work there. But the support for [unclear] is missing. I'm talking a bit to some NVIDIA people and trying to get some agreement to get some information out, but I don't think there's any chance that they will open the driver source or the documentation. Any other questions?
Audience question: You said the most enjoyable part of the project for you was the reverse-engineering part. Could you go through your workflow and the tools you use for that part of the project?
Sure, I should have spoken a bit about that, I guess. We have a bunch of tools, not anywhere near as sophisticated as the ones for [unclear] and things like that. We have a repository [unclear] under the Grate organisation that contains a wrapper library that you hook into OpenGL programs with LD_PRELOAD, and it will dump out the command streams [unclear]. Besides that, we have a disassembler for the vertex and fragment shaders, which takes GLSL source code, compiles it with the proprietary compiler and outputs the disassembly afterwards. The compiler is a bit crazy, or rather untraditional, in that its output binary is actually a command list that you submit to the hardware, so it's not just the instruction streams; you actually get the whole thing. I also have an envytools fork [unclear], so the documentation for registers is there. I'm planning to send that upstream at some point, because I've also added some functionality to the tools, for instance [unclear]. I think that's pretty much it for the tooling. The workflow is basically writing small OpenGL ES programs and looking at what the output is and how it changes when I change the sequence of commands. I could go into more detail about what exactly the command stream looks like, for instance.
Audience question: The X driver that you have on the to-do list, is that the xf86 [unclear]? Is there development happening on that at the same time?
No, nothing has happened on the X driver since Thierry's work, since he got hired [unclear]. There has been a little bit of talk from NVIDIA about the possibility of open sourcing their existing [unclear] driver, so that might happen, but I wouldn't set my hopes too high on that either.
Audience question: And what about libdrm, how does that relate to the Grate project?
Yes, so there are two different libdrm forks for Tegra. One is maintained by someone at NVIDIA for testing the kernel, and then there's another one which [unclear] Thierry [unclear], and I am the one doing the most on it now. That's the one in the Grate project, which I think is a cleaner approach [unclear]; the other one has a higher-level API that is a little bit too high level, I think. In the repository there are some tests for doing blits and things like that with the 2D unit, with some functions for [unclear]. So I think the DDX work, for someone who knows that kind of driver, should be fairly easy [unclear]; it's all known problems, really.
Audience question: Are there design decisions that are waiting on the reverse engineering of some of these open issues to complete? Is that something where people can actually help out, or do you already have a timeframe [unclear]?
No, there's no kind of project management at all, just some repos and code. So if someone wants to pick something up, feel free to do that. I would appreciate it if they let me know that they're working on it, so I'm not spending time on the same thing. Apart from that, there's basically just me doing something right now [unclear].
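The workflow described in the answer about tooling (run a small program, capture the command stream, change one thing, capture again, compare) can be sketched roughly as follows. This is a hypothetical illustration in Python, not the Grate tooling: it assumes a captured command stream has already been parsed into a list of (register offset, value) writes, and the register offsets in the usage note are made up.

```python
"""Sketch of comparing two captured command streams.

Assumption: each stream is a list of (register_offset, value) pairs in
submission order. This is a simplified stand-in for a real dump format.
"""
from typing import Dict, List, Optional, Tuple

RegisterWrite = Tuple[int, int]


def last_writes(stream: List[RegisterWrite]) -> Dict[int, int]:
    """Collapse a stream of writes into the final value of each register."""
    state: Dict[int, int] = {}
    for register, value in stream:
        state[register] = value
    return state


def diff_streams(
    before: List[RegisterWrite], after: List[RegisterWrite]
) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Return {register: (old, new)} for registers whose final value differs.

    A value of None means the register was not written in that stream.
    """
    old, new = last_writes(before), last_writes(after)
    changed = {}
    for register in sorted(old.keys() | new.keys()):
        if old.get(register) != new.get(register):
            changed[register] = (old.get(register), new.get(register))
    return changed
```

For example, with `baseline = [(0x100, 1), (0x104, 0)]` and `modified = [(0x100, 1), (0x104, 7), (0x108, 2)]`, calling `diff_streams(baseline, modified)` reports only registers 0x104 and 0x108, which narrows down where the state touched by the changed API call ends up.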