From Silicon to Compiler

Video in TIB AV-Portal: From Silicon to Compiler

Formal Metadata

From Silicon to Compiler
Reverse-Engineering the CoolRunner-II Bitstream Format
Title of Series
Part Number
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Programmable logic devices have historically been locked up behind proprietary vendor toolchains and undocumented firmware formats, preventing the creation of a third-party compiler or decompiler. While the vendor typically prohibits reverse engineering of their software in the license agreement, no such ban applies to the silicon. Given the choice between REing gigabytes of spaghetti code and looking at clean, regular die layout, the choice is clear. This talk describes my reverse engineering of the Xilinx XC2C32A, a 180nm 32-macrocell CPLD, at the silicon level and my progress toward a fully open-source toolchain (compiler, decompiler, and floorplanner) for the device. A live demonstration of firmware generated by my tools running on actual hardware is included.
[unclear] Thank you. [unclear] My talk is called From Silicon to Compiler, and it's pretty much that. We're
going to start off with what we're doing and why we're doing it, then a little bit of architectural background on programmable logic for those of you who have not done work with programmable logic before. Then I'll jump to a block diagram of the device. We'll start with a high-level overview of the silicon and then drill down into more interesting stuff, down to transistor-level and gate-level circuit analysis. At the end I do have a live demo of firmware produced by my tools running on live silicon. Sadly there are no cute cat pictures in this presentation, sorry. A little background about me: I pretty much like to build and break everything. I will do web stuff if they pay me enough, but I mostly like to live down in low-level embedded: firmware, low-level board design, RTL design, and now getting down to the transistor level. I just finished up my PhD a few weeks ago. During that time I designed and created what I believe is the first-ever college class on semiconductor reverse engineering. I tried quite hard to find another one that I could borrow notes or slides from; as far as I can tell, none exist. IOActive is also obviously a significant contributor to the conference; the guys walking around with the [unclear] on their shirts, say hi. I have only been with IOActive since January. Almost all of the work presented was done before I joined the company, but they've supported me continuing the work, so I'd like to thank them for that. So, as far as what we're actually trying to do and why: programmable logic is really everywhere these days. You may not know it, but a lot of stuff, especially high-end networking, A/V, all sorts of things, has programmable logic in it, because for ASICs these days the tape-out costs are so high, especially on leading-edge process nodes, that it usually is not cost-effective to make custom silicon unless you're making a huge number of them, say for a smartphone or something. For anything else you're probably better off running on an FPGA. The problem is they're black boxes: nobody really knows what
happens when you compile your bitstream onto a device. You know, OK, we've got these lookup tables, we've got these RAM blocks, but what's actually going on underneath? How do I know that the RTL I give the compiler is actually equivalent to the actual device's behavior? We don't know. Or how do I do development on a platform that is not x86 or x64 Windows or one of the very short list of Linux distros they support? You're out of luck. And let's say we think we found a compiler bug. With software we'd just pop the code into a disassembler and look at it, but there is no decompiler for the bitstream, so if you think it's generating bad code, sorry, you're screwed. And of course, reverse engineering: this is REcon. The vendors want people to think bitstream reversing is hard, and they advertise their closed-source proprietary format as being impossible to reverse. It's not. As far as the methodology here: I just decided earlier today to take a look at the size of the directory I had installed the Xilinx tools in. 18 gigabytes. [unclear] I don't have time to look through 18 gigabytes of code; there are much better things I can spend my time on. Plus the license agreement says I'm not supposed to reverse-engineer the software. [unclear]
[unclear] So this is our
target: the Xilinx XC2C32A. I chose it for a bunch of reasons. One is that it's very cheap: about a dollar fifteen each in single units as of a couple of weeks ago. I think the price just went up a little, but they're still cheap enough that you can afford to kill large numbers of them. You have your choice of a nice big QFP package that's easy to handle and has plenty of plastic around the pads, so you can decap it while keeping the bond wires intact, or you can [unclear]. It's a nice, friendly 180 nm process with four metal layers, so you can actually read out most of the upper metal layers optically. You don't have to bother going into the electron microscope except for the lowest layers, so that helps: it's a lot easier to use an optical microscope than an electron microscope, and you can shoot a lot of images quickly. The bitstream is also not that big, around 12 kilobits, so there's not really that much there to reverse. It's also a fairly simple architecture: you're not going to find block RAM, embedded cores, or stuff like that. It's just pure programmable logic. And the vendor tools are free, which always helps. So, the bitstream format for this thing at a high level: the output of the compiler is a JEDEC programming file, basically the equivalent of Intel hex, but instead of hex lines it's binary lines of ones and zeros. This is the exact data as it's written to the chip, not in the actual order (more on that later), but there is a one-to-one correspondence between these ones and zeros and the configuration that's on the device. And the nice thing is that, as you can see on the right side here, the bitstream generated by the Xilinx tools, if you're so inclined, does have comments. It turns out they're not very useful, because, OK, "block 0 ZIA": well, what is the ZIA? Look in the datasheet and there is not one reference to it. So that doesn't really help me much. So let's dig a little bit
into the architecture of what a PLD actually is. Any digital equation can be expressed in a form known as sum of products. It's a canonical representation of digital equations in which you take a series of terms, each of which ANDs together some inputs or their complements, and then you OR those terms together. [unclear] So you've got A and B and C, or C and not D, and so on. You can express any arbitrary digital equation this way. So let's look at how
that's done. So if we have a large, say 32-input, AND gate, and we have a mux on each input to the gate that lets us select between a constant 1 or the input to my circuit, I can effectively mask off any subset of these inputs and just leave the others at 1, which is the identity for the AND operation. So I can AND together any subset of the possible inputs to this AND gate. The same thing can be done for OR, obviously, with 0 being the identity for OR. What this means is you can create a gate that has a huge number of inputs and pick any subset of them at run time. So this leads to a natural structure for programmable logic: you make a grid of gates with a 2-to-1 mux on each input. You've got inputs coming in and outputs going down, and then you just have a series of these gates in sequence ANDing stuff together. This is actually a render from my tools: you've got the input coming in, and we select either X or X prime. Here we AND that with what's coming in from overhead and pass it on; here we skip the AND gate. This is more of a logical view: I'm not actually showing the actual layout of the chip, it's more of a schematic view of what the circuit logically does. I'm also rendering it as a cascaded sequence of gates. In practice the actual implementation in silicon is usually a tree, because nobody wants O(n) propagation delay when you could have O(log n) instead. So if
we take this and we want to build a full programmable device, what we do is we take a bunch of signals coming from our registers (that's the feedback), we take a bunch of signals from input pins, and then we feed these into the AND array. On the way we invert them, so we have two times however many inputs. We take M product terms out of that. The product terms then go to a programmable OR array, and the output of that gives us our outputs. Then we feed those either into an output pin or into a flip-flop, when doing state machines or something like that. So this is really all it takes to build a simple programmable logic device, or SPLD. The problem is SPLDs scale poorly: you end up having quadratic scaling with the number of inputs. Nobody wants a chip whose die size scales quadratically with the amount of work it can do; that's terrible. So how can we improve this? Well, it turns out we can make a grid of small ones of these. This did not show up as well as I had hoped on the projector, but we've got one SPLD block here, another one here, and a crossbar switch in the middle. So now we can create a bunch of outputs from one SPLD, a bunch from the other, and then just feed them all into one big routing fabric, pick out whichever signals I'm interested in passing on to this half of the chip or that half of the chip, and then feed them back in. This is the building block. So now let's look at the specific CPLD that we're targeting. There are 32 GPIO pins, which are full input/output, plus one input-only pin. I'm pretty sure this is because they intended to package the die in a 44-pin QFP package: with all the other stuff for JTAG, power, and 32 GPIOs they used 43 pins and had one left over, so they just threw in one extra signal. You can't drive it; it just serves as an extra input. The remainder of the device is two function blocks, which are basically SPLDs. Each one has 16 GPIO pins and 16 flip-flops, and there's a 65-signal global routing: 32 I/Os, 32 flip-flops, plus one input-only pin is 65. We pick 40 of those, and those feed the AND array, driving
a 56 by 16 OR array. This is about all that was documented. So, OK, we know the technology, at least: we know I have these inputs ANDed together and these outputs ORed together. Now how do I actually make the chip do what I want?
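The sum-of-products structure described above can be sketched in a few lines of Python. This is an illustrative model only, not code from the talk: the names `eval_product_term`, `eval_pla`, and `xor_via_pla` and the list-based encoding of the arrays are assumptions made for the example, and nothing here reflects the real bit layout of the XC2C32A.

```python
from typing import List, Optional

# Hypothetical encoding of one product term (one column of the AND array):
# per input, True = use X, False = use NOT X, None = input not connected.
ProductTerm = List[Optional[bool]]


def eval_product_term(term: ProductTerm, inputs: List[bool]) -> bool:
    """AND together the selected inputs; unused inputs act as constant 1."""
    result = True  # 1 is the identity for AND
    for select, x in zip(term, inputs):
        if select is None:
            continue  # masked off, contributes the identity
        result = result and (x if select else not x)
    return result


def eval_pla(and_array: List[ProductTerm],
             or_array: List[List[bool]],
             inputs: List[bool]) -> List[bool]:
    """Evaluate every product term, then OR the selected ones per output."""
    pterms = [eval_product_term(term, inputs) for term in and_array]
    outputs = []
    for row in or_array:
        out = False  # 0 is the identity for OR
        for use, p in zip(row, pterms):
            if use:
                out = out or p
        outputs.append(out)
    return outputs


# Example: XOR as a sum of products, (A AND NOT B) OR (NOT A AND B).
XOR_AND_ARRAY: List[ProductTerm] = [[True, False], [False, True]]
XOR_OR_ARRAY = [[True, True]]


def xor_via_pla(a: bool, b: bool) -> bool:
    return eval_pla(XOR_AND_ARRAY, XOR_OR_ARRAY, [a, b])[0]
```

Calling `xor_via_pla(True, False)` walks both arrays exactly as the block diagram does: two product terms are formed in the AND array and a single OR row combines them.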
But don't panic: I am not going to talk about decapping and imaging. This has been beaten to death in talks at REcon last year, and I think the year before that, and probably the year before that, so we're not going to cover it. There's a lot of this stuff up on Silicon Pr0n, and there are the lecture notes from the class I taught at RPI last year; there's a lot of material there. But this talk is about reverse engineering, not sample preparation. We're not going to be teaching you how that works; we're going to teach you how to actually reverse this specific device.
So here is the metal 4 overview of the device. We can see that there is roughly a left-right symmetry. It's not exact: there's something here that's not here, but up here it looks like the device is pretty symmetrical. So, first impression: we've probably got the two function blocks, left-right symmetric on the device. Let's
go down a little bit. After we've etched off all of the metal and poly wires, we're now looking at the implant layer. This is after a process known as Dash etching, which stains P-type doping brown [unclear]; mainly, though, it's useful in the electron microscope because it provides contrast. There are arrays of gates. You've got pairwise symmetry here and here; those are probably the function blocks. There's something down the middle; that's probably routing. Then there's a large memory array right here, pretty obviously the EEPROM where the bitstream is stored. Then if we actually follow the bond wires from the die out to the package, you can see these are the JTAG pins: TDI, TMS, TCK, I believe, top to bottom, and the TDO pin is right there. So we can conclude that the JTAG shift register probably runs left to right across this configuration area and somehow allows them to write this EEPROM here. There are also a few small arrays: one there, one there, I think one in the upper left, a total of 6. I have not yet figured out what they do. It wasn't necessary to reverse those portions of the device for what I needed, so that remains future work. So, taking a closer look at the function block: we know there are 16 macrocells. Each macrocell contains one flip-flop, stores the output of one term of your OR array, and has a little bit more glue logic [unclear]. So there is 16-way symmetry here, 16 copies; just by looking at this we can be pretty sure we're looking at the macrocells. Then we can see some structures up here: this looks symmetric, this looks symmetric, this looks symmetric but different. As it turns out, the AND array is not actually one solid block; the AND array is split in half. They've got 20 signals here and 20 signals here, and collectively that forms the 40. Then they AND the outputs of those two individual blocks together, which gives you your product terms, and then that goes into the OR array right here and out to the macrocells. So now let's take a quick look
at the configuration bit structure before we dive into detailed die analysis. The programming documentation does talk a little bit about how the device is put together. It turns out that even though the JED format is supposed to be something you can just feed to the programmer to program the device, that's not the case here. The bit ordering in the JED file is actually a virtual addressing in which they abstract away all of the quirks of the silicon. So for example, all the AND arrays and all the OR arrays use the same ordering in the JED file for each one, but it turns out half of the actual AND/OR arrays on the device are mirrored left to right. So you actually have to do address translation before you can take a bitstream generated by their tools and flash the chip. They do sort of document this: there's a big Excel file they publish for people who are making programming adapters, which is just a big grid where each cell says which bit from the JED file goes to this physical address. So it doesn't really tell you much about why. The actual structure is 48 rows by 260 columns. There's one extra row of configuration metadata (I'll get to that later) which stores the 7 lock bits. Some of them are always 1, and I'm not entirely sure what purpose they serve; the remainder are 1 for unlocked devices and 0 for locked. Then there are 2 done bits, which indicate that this bitstream is a legal firmware image and the chip has been fully flashed. It turns out also that only 258 of the 260 columns are usable; the remainder are what they call transfer bits, which, as far as I can tell, you just program as a constant 0. [unclear] There is some documentation on this from Xilinx, but it didn't really give enough detail to figure out why they are doing this. The other insight we get from this is that since it is an EEPROM, FF is going to be the blank state of the memory. So we expect most
of the memory on the device is going to be active low, so most things should be turned off when the data is high. [unclear] We don't know exactly what that does. So
now, looking at the actual die structure: you can see that the configuration memory is not actually one block. There's one array here, one here, one here, one here, and one hiding all the way over here. As it turns out, they're symmetric about the center of the die, and they go directly up to the corresponding logic. So it's pretty obvious which memory configures which part of the chip: the data just flows straight up during the boot process. So this configures the AND array, the OR array is configured by this, this configures these macrocells, and this configures these macrocells. We can confirm this if we look at metal 2: you can actually see the lines coming off the sense amplifiers of the EEPROM and going up vertically into the array. If you trace it out all the way, you can actually see it writing the individual configuration SRAM cells. There are a few bit lines, here for example, that are connected up on, I believe, metal 4 [unclear]. So now, taking a look at the main logic
array: if we actually look at where the SRAM cells are and count how many rows high each individual block is, we see that the AND array is 20 rows high, the OR array is 8 rows high, then we have another 20 rows, and each macrocell is 3 high. So we have a pretty good idea, left to right, which bits in the bitstream configure which logic, just by what's physically proximate to it. And we can make a pretty good guess that since we have a 2D SRAM array here and, in the configuration area, a 2D flash array, we probably have one address on this axis (there is some Gray coding going on, so it's a little more complex than that), but logically we should have the other going across the other axis.
So if we take a closer look at the AND array: we know that there are 56 product terms, and we know that there are 40 inputs and 40 complements, for a total of 80. If we actually count how many SRAM cells there are, we see it's 112 bits wide, which is 56 times 2, in two halves of 20 rows each. So the conclusion is that since we know there are 40 inputs coming in from the crossbar and we've got two blocks, each 20 rows high, each row probably corresponds to one input. And since the width of the array is double 56, it's probably 2 bits per product term: one selects X, one selects not-X, or maybe some sort of other code. It turned out it is one-hot coded. I just figured this out by experiment; it was pretty easy. Just try one, see if it works, and if not, try something else. There are only 4 possibilities. So the OR
array turns out to be 56 product terms by 16 outputs. If we count the configuration bits, it's still 112 bits wide, the same width coming up from the EEPROM, but it's only 8 rows high. There are only 56 inputs, since the OR array does not have an X and an X prime input. My conclusion is that we still have one bit selecting each particular input, but we have two actual rows of the OR array interleaved in one row of configuration. It took a little tinkering, but it was pretty simple to figure out the actual bit ordering.
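As a minimal sketch of the AND-array encoding described above, the following Python decodes one 112-bit configuration row into the selection made for each of the 56 product terms. It is not the author's tool: the function name `decode_and_row` is hypothetical, and the position of the "select X" bit relative to the "select not-X" bit within each pair is an assumption, since the talk only says the pair is one-hot and that configuration is expected to be active low (a blank bit reads as 1).

```python
PRODUCT_TERMS = 56
ROW_WIDTH = 2 * PRODUCT_TERMS  # 112 configuration bits per input row


def decode_and_row(row_bits):
    """Decode one AND-array row (one input signal) into per-term selections.

    Assumptions, not confirmed by the talk: bit 2*p selects X and bit
    2*p + 1 selects NOT X for product term p. Bits are active low, so a
    blank (all-ones) row connects the input to nothing.
    """
    if len(row_bits) != ROW_WIDTH:
        raise ValueError("expected %d bits, got %d" % (ROW_WIDTH, len(row_bits)))
    selections = []
    for p in range(PRODUCT_TERMS):
        use_true = row_bits[2 * p] == 0
        use_complement = row_bits[2 * p + 1] == 0
        if use_true and use_complement:
            selections.append("X & !X")  # legal to program, but always false
        elif use_true:
            selections.append("X")
        elif use_complement:
            selections.append("!X")
        else:
            selections.append("unused")
    return selections
```

The same bookkeeping explains the OR array: 56 product terms times 16 outputs is 896 bits, which is exactly 112 columns times 8 rows, consistent with two logical OR rows being interleaved in each configuration row.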
Moving on to the macrocells: there are 27 configuration bits per macrocell. Looking at the die and counting SRAM cells, there's a 9 by 3 grid. It turns out this 9 by 3 grid does not actually control one macrocell: the bottom two rows of this array control the bottom one, and this goes to the other one. [unclear] But it turns out that yes, it is 27, and yes, it is 9 by 3, but the 9 by 3 structures do not have a one-to-one correspondence with macrocells. This makes intuitive sense: the EEPROM is at the bottom, [unclear] we know one of these is the transfer bit, so the other nine bits go directly up to here. I've figured out a significant fraction of the functionality, not quite all of it; there's still some clocking stuff I'm unsure of right now. These also configure I/O buffering and stuff like that, so there are lines coming out from these to the side of the die, to the I/O buffers. Now, as an aside, the security bits. We know there are 9 done and lock bits somewhere on the device. We know that the physical address of these is in the right-hand macrocell memory, in the top row. There are 9 SRAM cells right here that don't appear to be hooked up to any actual logic array. I have not actually tried flipping these, and I don't know for certain that these are the lock bits, but it is pretty obvious, especially when we know that, of those 9 bits, four of them have to be held low to lock the device, and there's a four-input NOR gate right here. [unclear] And now we get to the global routing. We know that between the left and the right halves of the AND arrays there is something. We know that it is 20 configuration rows high [unclear] and we know it's 16 bits wide. We don't know anything more than that. The datasheet has about two sentences on the global routing; it gives no idea whatsoever how it works. What we do know is that, of those 65 signals coming in, 40 go to the left function block and 40 to the right; the two subsets may or may not have any relationship. And we know that since it's 16 bits wide, we
probably have 8 bits selecting what goes left and 8 bits selecting what goes right. But the question is: how do you make a 65-to-1 mux out of 8 bits? If you just had binary, you'd only need 7 bits; if it were one-hot, you'd need 65 bits. So what sort of strange, perverse coding are they using here? The datasheet is of exactly no help.
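The bit counts quoted so far can be cross-checked with a little arithmetic. The sketch below is my own sanity check rather than anything shown in the talk; the constant and function names are hypothetical, and the assumption that the column widths simply add up across two function blocks plus the shared routing block is mine.

```python
# Widths and heights quoted in the talk, in configuration bits.
AND_OR_COLUMNS = 112      # AND and OR arrays share the same 112 columns
MACROCELL_COLUMNS = 9     # each macrocell is a 9-wide, 3-high grid
ROUTING_COLUMNS = 16      # global routing: 8 bits left + 8 bits right
FUNCTION_BLOCKS = 2
TOTAL_COLUMNS = 260
AND_HALF_ROWS = 20
OR_ROWS = 8
MACROCELLS_PER_BLOCK = 16
MACROCELL_ROWS = 3
ROUTING_SIGNALS = 65


def usable_columns() -> int:
    """Columns accounted for by logic, assuming the widths simply add up."""
    return FUNCTION_BLOCKS * (AND_OR_COLUMNS + MACROCELL_COLUMNS) + ROUTING_COLUMNS


def total_rows() -> int:
    """Rows in the main array: two AND-array halves around the OR array."""
    return 2 * AND_HALF_ROWS + OR_ROWS


def binary_select_bits(choices: int) -> int:
    """Bits needed to pick one of `choices` signals with plain binary coding."""
    return (choices - 1).bit_length()


def one_hot_select_bits(choices: int) -> int:
    """Bits needed to pick one of `choices` signals with one-hot coding."""
    return choices
```

Under these assumptions `usable_columns()` comes out to 258, leaving 2 of the 260 columns for the transfer bits, and `total_rows()` gives 48, which also equals 16 macrocells times 3 rows. `binary_select_bits(65)` is 7 and `one_hot_select_bits(65)` is 65, so the 8 bits per side that the silicon provides match neither scheme, which is the puzzle posed above.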
So if we jump into the electron microscope and take a look at the implant layer, you see these short arrays here: those are P-channel, these are N-channel. [unclear] These are the individual SRAM cells. This is zoomed out; the original image is a lot higher resolution, but I can't really fit it on the slide. [unclear] What we can see is that there's some sort of symmetry going on, both horizontally and vertically. If we jump up to metal
4, we see that there are 6 small buses of 11 signals each, except the rightmost, which has 10. Five times 11 plus 10: [unclear] that's the 65 signals on the global routing. So I spent quite a while vectorizing this. My automated tools are not quite as well developed as I had hoped, so it did take me a while, but we now have a full vectorization of the entire global routing matrix, from implant all the way up to metal 4. The little gray boxes are not actual layout; they're the standard cell outlines. So there's an inverter here, another inverter here, another inverter here, and so on. We can see there are 6 identical blocks going left to right here, and there are 2 more blocks over there that are identical to each other at first glance but not identical to the rest. [unclear] Those big drivers on either side are pretty clearly the buffers driving the output: it's going to fan out to 112, so we need a fairly big buffer. There are, I believe, 20-some
fingers on this inverter, so that looks like a big driver. It's actually a three-stage inverter, because the low-level stuff in the programmable logic doesn't actually have the drive current to drive that much bigger a load. So they actually have a three-stage driver: inverting it, inverting it again to increase the current, and inverting it again to increase the current once more before actually driving the output. Of those 8 blocks, each one contains 2 SRAM cells. So it seems to make sense: we've got 8 blocks, we know that we have 8 bits controlling the left output and 8 bits controlling the right output, so we probably have one bit per block controlling what goes left and one controlling what goes right. Then there are the 6 identical blocks and the 6 groups of wiring up on metal 4: there's probably some sort of correspondence going on here. Again, we don't know what it is, but we know there's some relationship. So now let's take a look at metal 3. We can see there's power and ground routing here; we'll ignore all that for now. But there is a set of 1, 2, 3, 4, 5, 6 vias. What's more interesting is that there are only 6 vias per row, but they're not in the same place.
If we actually trace this out for all the rows, each one of those vias is under one of the metal 4 groups: there is exactly one via at each row/group intersection, but not in the same place. So now I finally realized what is going on. The routing matrix is not actually a full crossbar; it is a sparse crossbar, in which each of the 40 rows can only pick from a subset of the 65 signals, and it's a different subset for each one. The end result is that, using all the subsets, you can select any unique subset of 40 of the 65, but you don't actually need the area of a full 40-by-65 crossbar. So now let's take
a look at how the implementation works. We have a pretty good idea of how it is structured at a high level, but we don't actually know the implementation. I do apologize for some of the misalignment; this was a quick tracing, so not all of the vias and the wires line up exactly. This is not meant as something you could make a chip from; it is something I can figure out how it works from. You can see there's a big pass transistor here. We've got the signal coming in from the upper layer here and here, and then we've got the output of the mux here. There's the configuration bit, which goes through a two-input NOR gate and drives that. If we put this all together, what we see is that each row is indeed an 8:1 mux. There's one pass transistor for each of the six possible inputs for the row, and then there's a single discrete NMOS and a single discrete PMOS that let us drive a constant 1 or a constant 0 as well. It turns out that one of the constant drivers is active-high and all the rest are active-low, which means that a blank bitstream of all FF will put all the outputs in a well-defined state, so this makes sense. There's also one additional signal, which I've called OK. I have no idea exactly where it comes from, but I'm pretty sure that it's used during the boot process to basically tell the device "don't drive the outputs". It will consume less power when the chip is idle, and it also prevents bus fights between drivers that aren't fully configured: if half the device has been loaded from flash and half has not during the boot process, we don't want to be driving signals unnecessarily. The rows are, again, not identical, but it turns out we can actually do the routing using max flow. We create a source node feeding the nets we want to route, we create a directed graph with paths for each of the legal connections to each row, and we create a sink node with however many nets we want to drive out, and then max flow sorts out everything. So, as it
turns out, the structure is pretty much that. I do have a full schematic; this is not it. I've had to abbreviate it to make it fit on the slides. But we've got a single configuration bit and the extra gating signal, which eventually drive the NOR logic, because that's really how it's functioning. That goes to a single PMOS pulling high and an NMOS pulling low, and then we've got a bunch of copies of the pass-transistor stage. Those are the actual hex codes in the bitstream for selecting each one. So if we want that row to be set to a constant zero, we write this value, and so on. I do also have a table in the source of my tools that includes all the mux settings for the rows. So now we can fully control the routing matrix, but there's one last thing: we don't know the ordering of the inputs. We know how to select one of the inputs and route something; we don't know which input is actually hooked up to which thing on this bus. There were a couple of options I considered, but it turned out that one of the simplest was to make a few educated guesses about how things were structured. For example, all of the macrocells in function block 1: their flip-flops are probably in a contiguous order on the bus. We don't know where on the bus, but they're probably contiguous. All of the input pins for function block 2 are probably contiguous as well. So there are not really that many orderings for the global input pin and these four sets of inputs; we just have to figure out which one works. So I went to the campus focused ion beam, drilled a few holes in the insulator over some wires, and laid down some nice gigantic 20-micron-wide probe pads. Then I just started driving signals onto the bus. What I'm using is just a counter, going from a megahertz-range input and dividing down to, I believe, a 2 Hz LED, and I have one signal for each stage, so each frequency is on a specific flip-flop that I've constrained. It was
like: OK, here's function block 2, macrocell 5. So if we probe function block 2, macrocell 5 (I had put the 4 Hz signal on function block 2, macrocell 5) and what we see on the probe is a 4 Hz square wave, then yes, this guess is right. So it didn't take too long to figure out the actual bit ordering.
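The max-flow formulation of routing described earlier in this section can be sketched as follows. This is a minimal illustration, not the code from the speaker's tools: the connectivity table `ROW_CHOICES` is invented for the example (the real via pattern has 40 rows with six choices each, drawn from 65 signals), and the function name `route` is hypothetical. Because every edge in the source → net → row → sink graph has capacity 1, the max-flow problem reduces to bipartite matching, which is solved here with simple augmenting paths.

```python
from typing import Dict, List, Optional, Set

# Hypothetical connectivity: row index -> bus signals that row's mux can select.
# The real device has 40 rows with six vias each; this toy table is invented.
ROW_CHOICES: Dict[int, List[str]] = {
    0: ["A", "B"],
    1: ["A", "C"],
    2: ["B", "C"],
    3: ["C", "D"],
}


def route(nets: List[str],
          row_choices: Dict[int, List[str]]) -> Optional[Dict[str, int]]:
    """Assign each net to a distinct row that can select it, or return None.

    Equivalent to unit-capacity max flow on source -> net -> row -> sink.
    """
    row_owner: Dict[int, str] = {}  # row -> net currently routed through it

    def augment(net: str, visited: Set[int]) -> bool:
        for row, choices in row_choices.items():
            if net not in choices or row in visited:
                continue
            visited.add(row)
            # Take a free row, or move its current owner to another legal row.
            if row not in row_owner or augment(row_owner[row], visited):
                row_owner[row] = net
                return True
        return False

    for net in nets:
        if not augment(net, set()):
            return None  # total flow < number of nets: the design does not fit
    return {net: row for row, net in row_owner.items()}


if __name__ == "__main__":
    print(route(["A", "B", "C", "D"], ROW_CHOICES))
```

The point of the sparse crossbar shows up in `augment`: a net that lands on a contested row can push the earlier net onto one of its other legal rows, which is why each row only needs a handful of vias rather than all 65.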
So here's the basic summary of the actual layout. We've got gates on poly and metal 1; metal 2 is vertical routing; then power routing is on M3; then M4 is the input bus. It turns out the actual ordering is: the GPIOs for function block 1, the global input, the GPIOs for function block 2, then the flip-flops, left to right. So now we know pretty much enough to configure the PLA. Unfortunately there's more to it, because it turns out all of the product terms, or nearly all of them, are dual-purpose. They can be used as general-purpose logic that you can feed into your OR array, but each of the 16 macrocells also has three product terms that have separate dedicated connections to it, used for set, reset, clock enable, and so on; I'll cover those on the next slide. Four more have a special connection to the entire function block, and the last four, as far as I can tell from reading the datasheet, have no special purpose. The datasheet does tell us roughly what they do; it doesn't tell us which is which. So I will just briefly cover what these terms are. Per function block, we have a local clock that we can use if we don't want to waste global clock resources for the whole chip, but we also don't want to have a per-flip-flop clock, because that would use a lot of product terms that we really don't need. Then we have the control-term set and reset and a dedicated output enable. Then the per-macrocell product terms A, B, and C are very much a mess. Product term A is one of several legal sources for set/reset; there's also the control-term set/reset, which can be used instead of this one. There's product term B, which is one of several possible sources for the I/O buffer output enable. And there's product term C, which can be used for a couple of things: it can be a clock enable, or it can be used as a clock if you want a per-macrocell clock for just one flip-flop that you're clocking off one weird clock and nothing else. But it turns out that this is not quite a conventional CPLD structure: the output of the PLA is XORed with product term C,
or its complement, or constant 1, or constant 0. This is a Xilinx proprietary optimization that is, again, documented in the datasheet, but they don't tell you how to configure it in the bitstream. The intention here is to let you do an efficient XOR: you don't have to do "x and not y, or y and not x"; you just use the XOR. The question is: OK, there are 56 product terms; which one is this? Well, it turns out we can configure the PLA as much as we want, since we've already reversed all of that; we can configure the global routing as much as we want; and it's fairly easy to generate a bitstream that is known to use product term C. You'd think all we have to do is synthesize something with an XOR in it, but it turns out this XOR optimization is used even if you are not using the OR array. If we just have y = a AND b, it turns out that if we set the OR array output to constant 0 and then XOR that with a product term, we no longer have to go through the OR array, which saves a bit of time because we skip all of the OR-array propagation delay. The compiler, trying to optimize, does this by default if you're not using the OR array. Therefore, just by creating an equation like y = x in your source code, you can trivially produce a bitstream that uses product term C. So all we have to do is say: OK, let's make something that uses product term C, put it on all the outputs, and see which product terms show up. It turns out it is zero-based term 3n + 10. Unfortunately I have things called customers that kept me from working on this project as much as I would like, so I never actually figured out which product terms A and B are. I'm pretty sure that they're either just above or just below product term C; I don't know which. As for the control terms: if you remember, product term C is 3n + 10, so the lowest-numbered one is 10. None of the terms below that are actually being used for anything by product terms A, B, or C, so the control terms are probably among the first couple
terms, but again, I have not actually had time to figure out which is which. One more wrinkle is that there are some global configuration bits. Remember, we've got the structure with the function blocks on the left and the right and the global routing in the middle, which we've talked about already. That pretty much leaves a big area in the middle that looks like wasted space, but it's not wasted: there are 22 single configuration bits in the middle. We know three of them control the voltage thresholds for the I/O banks. The device has two I/O banks, but the original XC2C32 only had one. The XC2C32A is bitstream-compatible with the XC2C32 but has more configuration bits. The way they did this is they have one global bit: if this bit is set, then it will use the old-school XC2C32 configuration, and the newer ones are additional per-bank bits that are ANDed with the global bit. If you erase the device and you don't program the last couple of fuses, then you will get the default XC2C32 behavior, in which you've got one I/O voltage setting for everything. If you're compiling for the 32A, you set that bit to a constant 1, leave the other ones set the way you want, and now you get the ability to set the voltage per bank. Then there are other miscellaneous things; there are comments in the bitstream that are somewhat helpful for this. So I know there are bits to configure the global clocks, the global set/reset, and the global output enable. I do not actually know what the encodings of these bits are at this point, so there's still some work in progress going on. Now I will get to the actual toolchain I've been producing. I've called it libcrowbar, after the Flying Crowbar project on siliconpr0n. The goal is to, in general, produce open-source tools for programmable logic. It still needs a lot of refactoring and tweaking, and it does not do nearly everything that I want. It only supports the XC2C32A right now, but it can scale to other devices. The 64-macrocell device is pretty much just a
scaled-up version of the 32A. I've decapped one and looked at the top layer; it's just a 2-by-2 grid of function blocks instead of 1-by-2. The ROM for the routing fabric will have to be decoded separately, but again, it shouldn't take too long. I just have to sort that out and the mux configuration; everything else is pretty much exactly the same. The larger devices, the 128, 256, 384, and 512, add some additional features to the macrocell, so in addition to figuring out the global routing stuff, I will actually have to do some additional reversing of the macrocell logic. So that remains a work in progress, but that is a goal. The library is BSD-licensed; you can read the code from there. I do not recommend using it in its current state; it's more of something to look at to understand how the chip works. Maybe in a couple of months it will have something that is stable enough that you can actually use it. But I do have a quick usage example
here. We have a bunch of I/O pins: we select pin 6, 38, 37, whatever. Then we get the I/O buffers for each of these. We can now select the I/O standards, set output enables, say whether we're using termination, and so on; those are the bits in the macrocell that I was able to figure out. There are still quite a few, clocking in particular, that remain unknown at this point. There's quite a bit more going on in this; I can pull up the full source if anybody's interested in looking at it. But the end result is that I can go from an in-memory technology-mapped netlist, created like this as first-class objects; I can place and route it, produce a bitstream, flash it to the device, and it works. I can also go the other way around: I can go from a programmed device, read out the bitstream, and dump it to an ASCII schematic or convert it to synthesizable Verilog, as long as you're not using a feature I haven't yet reversed the functionality of. The remaining thing in the forward toolchain is that I would like to integrate with Yosys for synthesis: take the output of Yosys, do technology mapping on that, and feed it into libcrowbar. At that point I will have a full Verilog-to-bitstream toolchain. I'm not quite there yet, but it's coming close. The other tool I've made is a
viewer, if you've used PlanAhead from Xilinx. I called mine, very inventively, fcplan, for Flying Crowbar planner. It is a floorplanner and physical layout viewer. I currently only have rendering for the AND array and the global routing. For the OR array, I do have the functionality reversed; I have not written the code to actually draw it. But I can open up a bitstream and look at the individual settings in the AND array. For the macrocells, also, I've reversed some of the bits, not all of them, but I have not yet written code to actually render them as, you know, check boxes for "the Schmitt trigger is enabled", "the terminator is enabled", "the I/O standard is such-and-such", and so on. So those are on the wish list at this point. The configuration bits are known; you can access them from libcrowbar, you just cannot do so from the GUI.
Here's one more view of fcplan, with the product terms shrunk down to small columns. You can see the full routing matrix here. It doesn't show up as well on the projector as I would have liked, but you can see the actual mux bits in there. When you zoom in in the tool, you can actually see the signals coming in from the global routing and going out through these vias into the AND array. Before I get to the demo, I do want to thank John McMaster from siliconpr0n; he did most of the large-scale optical imaging for this. He also did the etching, which was very helpful. Then there are the people at RPI who run the materials science lab and the cleanroom, respectively; they were quite helpful when it came time to getting access to the electron microscope, the cleanroom, and so on. As far as I know, I am the only computer science person there to have run the electron microscope, but I did actually get access to the cleanroom, the SEM, and the FIB; I was trained on all of them, and they were quite helpful. And then the siliconpr0n team in general was quite handy when it came to just getting feedback and sanity checking: "does this look OK to you?", or "I can't quite make out what this connects to; do you have any ideas?", or just sharing process suggestions: "OK, what are some ways I can get this etch to be a little more even?", something like that. So now it's time to get to the demo.
I have my JTAG connector right here. This is my dev board; it has a small FTDI FT232 on there that I'm using for JTAG. What the demo will do is this: I'm going to create a bitstream using libcrowbar that will act as a NOT gate and a buffer, and then I'm going to bit-bang inputs going in there with the FTDI chip, and we should see one LED and then the other lighting up. As you can see here, these LEDs are going back and forth. If we scroll through the output of the tool here: we start out initializing; we connect to the JTAG chain and see that, OK, there is a device. One note: the CoolRunners are the only devices I've actually looked at in which the JTAG device ID includes the package as well as the device. What this means is that there's got to be a fuse or something somewhere on the die that says "I'm a VQFP" or "I'm a BGA". I've never seen that anywhere else; for every other FPGA, microcontroller, or anything I've used, the device ID was the same for different packages. But anyway, we confirm that, yes, the expected device is on the chain. We generate a netlist; we figure out the actual function block and macrocell locations of the I/O buffers we're using. Then it runs the fitter over the banks, macrocells, and function blocks, does global routing, repeats for the other function block as well, runs routing for that, and confirms: OK, fitting is complete. It took 620 microseconds. If any of you have used Xilinx's compilers, they're not that fast.
Then we finish generating the bitstream. That's kind of slow right now, a couple of milliseconds. I'm pretty sure I can optimize this; I might have had the debug build or the profiler turned on, so it should be a lot faster with optimizations. Then we have to configure the device. We verify: OK, yes, we are on the right device, and we've got the right number of fuses. We erase it, make sure it's blank, program it, and then we just bit-bang the inputs as 0 and 1, and it confirms that the outputs have the values we expect: one of them is X, one of them is X′.
Now let's see what happens if we actually want to reverse something. Let me bump up the font size. Can people read this? I'm guessing no.
So this is what happens when we do a dump instead. This is not the same bitstream; it's a different one, picked at random, that I had sitting around. We have a nice ASCII schematic showing all the inputs to the PLA coming in from the global routing. Each of these represents one connection; that's one configuration bit. We have the X and the X-bar outputs being created, then the 56 product terms, then the OR array. This table looks a lot nicer when you don't have the font quite so big. But I do actually have details on: OK, this output is floating, this one is configured as an input, this input, this input. The other one is down here somewhere, probably.
If I scroll up a bit: now on the output we have the I/O standard. It turns out there are about a dozen I/O standards you can pick from at synthesis time; in the bitstream there are only two, the high voltage and the low voltage. So there's the low range, around 1.5 V, and there's the high voltage. My guess as to what this does internally (I have not fully reverse-engineered all the I/O drivers yet) is that it selects one threshold or another, and in the output stage there are about four or five different sets of drivers, for fast and slow, different voltages, and so on. I haven't done cross-sections to measure the transistors or anything like that. Then we've got a bunch of stuff here for the global clock mux, the global set/reset mux, and so on. All of this remains unknown: I know which bits these are; I don't know what the functionality of those bits is. And now for the fun part: let me actually switch to the other test file. This one has a simple adder, which will make a lot more sense than the previous one. Here's the RTL, but it is pretty much the assembly language of Verilog: this is a direct analog of the actual PLA structure. I have not attempted to figure out any higher-level structure. This should be equivalent; I have not tested synthesizing it. I wrote this just about last night. There's a formatting bug in the emitted assignments, so right now the output as formatted won't synthesize, but it's a fairly trivial fix to actually get synthesis to work. Now, before we get to questions, I'm just going to jump over to fcplan and show you guys what the actual structure looks like in the floorplanner.
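The "assembly language of Verilog" output described above, a direct transcription of the AND and OR arrays, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the decompiler's actual code or output format: the function name `emit_verilog`, the `pterm_N` wire naming, and the dictionary layout for the decoded arrays are all hypothetical, and the half adder used as input is a made-up example.

```python
from typing import Dict, List

# Hypothetical decoded AND array: product-term index -> {input name: polarity},
# where True selects the input itself and False selects its complement.
PTERMS: Dict[int, Dict[str, bool]] = {
    0: {"a": True, "b": False},
    1: {"a": False, "b": True},
    2: {"a": True, "b": True},
}

# Hypothetical decoded OR array: output name -> product terms feeding it.
OR_TERMS: Dict[str, List[int]] = {"sum": [0, 1], "carry": [2]}


def emit_verilog(module: str, inputs: List[str],
                 pterms: Dict[int, Dict[str, bool]],
                 or_terms: Dict[str, List[int]]) -> str:
    """Emit flat sum-of-products Verilog mirroring the PLA structure."""
    ports = [f"    input wire {name}" for name in inputs]
    ports += [f"    output wire {name}" for name in or_terms]
    lines = [f"module {module}(", ",\n".join(ports), ");"]
    for idx in sorted(pterms):
        literals = [name if positive else f"~{name}"
                    for name, positive in pterms[idx].items()]
        expr = " & ".join(literals) if literals else "1'b1"
        lines.append(f"    wire pterm_{idx} = {expr};")
    for out, terms in or_terms.items():
        expr = " | ".join(f"pterm_{t}" for t in terms) if terms else "1'b0"
        lines.append(f"    assign {out} = {expr};")
    lines.append("endmodule")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_verilog("half_adder", ["a", "b"], PTERMS, OR_TERMS))
```

No attempt is made to recover higher-level structure here either; each product term becomes one wire and each OR-array column becomes one `assign`, which is what makes the output a direct analog of the hardware.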
The zoom on this thing is really sensitive; I'll go just one click here, and that's probably enough. As you can see, initially we're passing all the inputs through; the first couple of rows are coming off the global routing. Then down here we've got one signal coming in, and it's hooked up over here to X and over here to X′, it goes into the AND gate, then we go down there, and so on. So you actually have a full physical layout here at this point.
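The signal path traced here (input, true and complement lines, AND array, OR array) together with the XOR stage described earlier in the talk can be modelled in a few lines. This is a behavioral sketch only, with hypothetical names (`XorSel`, `eval_pterm`, `eval_macrocell`) and an invented data layout; it says nothing about the real bitstream encoding. The only detail taken from the talk is that product term C for macrocell n is zero-based term 3n + 10, so macrocell 0 uses term 10.

```python
from enum import Enum
from typing import Dict, List


class XorSel(Enum):
    """Second XOR input: the four choices named in the talk."""
    ZERO = 0
    ONE = 1
    PTC = 2
    NOT_PTC = 3


# A product term maps input name -> True (use X) or False (use X').
PTerm = Dict[str, bool]


def eval_pterm(term: PTerm, inputs: Dict[str, bool]) -> bool:
    """AND together the selected true/complement literals."""
    return all(inputs[name] == polarity for name, polarity in term.items())


def eval_macrocell(inputs: Dict[str, bool], pterms: Dict[int, PTerm],
                   or_terms: List[int], ptc: int, xor_sel: XorSel) -> bool:
    """OR the selected product terms, then XOR with the chosen second input."""
    or_out = any(eval_pterm(pterms[i], inputs) for i in or_terms)
    ptc_val = eval_pterm(pterms[ptc], inputs)
    xor_in = {
        XorSel.ZERO: False,
        XorSel.ONE: True,
        XorSel.PTC: ptc_val,
        XorSel.NOT_PTC: not ptc_val,
    }[xor_sel]
    return or_out != xor_in


# y = a AND b with the OR array bypassed: OR output is constant 0, XORed with
# product term C (term 10 for macrocell 0).
AND_PTERMS: Dict[int, PTerm] = {10: {"a": True, "b": True}}


def and_gate(a: bool, b: bool) -> bool:
    return eval_macrocell({"a": a, "b": b}, AND_PTERMS, [], 10, XorSel.PTC)


# y = a XOR b using one OR-array term and product term C.
XOR_PTERMS: Dict[int, PTerm] = {0: {"a": True}, 10: {"b": True}}


def xor_gate(a: bool, b: bool) -> bool:
    return eval_macrocell({"a": a, "b": b}, XOR_PTERMS, [0], 10, XorSel.PTC)
```

The `and_gate` case is the one the vendor compiler reportedly picks for single-term equations, which is what made product term C easy to locate; `xor_gate` shows why the XOR stage avoids spending two product terms on "x and not y, or y and not x".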
I'll switch back to the slides for just one or two more before we get to the end.
So, what remains as far as future work?
We still have to figure out the last of the special product terms. There are still about six or eight bits in each macrocell that I haven't yet figured out the functionality of. There are still some unknown global bits, which I mentioned. There is a little bit more I need to do to support larger devices. As for the toolchain, I want to do more work on the compiler. My long-term goal for this work is to integrate some of this reversing work and so on and create the IDA for hardware. I would like to be able to go from bitstreams for CoolRunner, bitstreams for Spartan-6, bitstreams for whatever, map each of them up to a device-dependent technology-mapped netlist, abstract that into something device-independent, and then do higher-level analysis on that: OK, this combination of XORs and muxes looks like a 32-bit adder, this combination of stuff looks like a 10:1 mux, and so on. I would like to eventually have something along those lines. I do know that subgraph isomorphism is NP-complete; I don't know if I can make a randomized approximate algorithm fast enough. That remains a topic for future work, but it is on the wish list, whether it is possible or not. At this point, are there any questions? Thank you.