From Silicon to Compiler
Formal Metadata
Title 
From Silicon to Compiler

Subtitle 
Reverse Engineering the CoolRunner-II Bitstream Format

Title of Series  
Part Number 
18

Number of Parts 
18

Author 

License 
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2015

Language 
English

Content Metadata
Subject Area  
Abstract 
Programmable logic devices have historically been locked up behind proprietary vendor toolchains and undocumented firmware formats, preventing the creation of a third-party compiler or decompiler. While the vendor typically prohibits reverse engineering of their software in the license agreement, no such ban applies to the silicon. Given the choice between REing gigabytes of spaghetti code and looking at clean, regular die layout, the choice is clear. This talk describes my reverse engineering of the Xilinx XC2C32A, a 180 nm 32-macrocell CPLD, at the silicon level and my progress toward a fully open-source toolchain (compiler, decompiler, and floorplanner) for the device. A live demonstration of firmware generated by my tools running on actual hardware is included.

00:01
Thank you. My talk is called "From Silicon to Compiler", and it's pretty much about that. I'm
00:24
going to start off with what we're doing and why we're doing this, then a little bit of architectural background on programmable logic for those of you who have not done work on programmable logic before. Then I'll jump to a block diagram of the device. We'll start with a high-level overview of the silicon, then drill down into more interesting stuff, down to transistor-level and gate-level circuit analysis. At the end I do have a live demo of firmware produced by my tools running on live silicon. Sadly, there are no cute cat pictures in this presentation, sorry.

A little background about me: I pretty much like to build and break everything. I will do web stuff if they pay me enough, but I mostly like to live down in the low-level layers: embedded firmware, low-level board design, RTL design, and now getting down to the transistor level. I just finished up my PhD a few weeks ago. During that time I designed and created what I believe is the first ever college class on semiconductor reverse engineering. I tried quite hard to find another one that I could borrow notes or slides from; as far as I can tell, none existed. I am also a significant contributor to Silicon Pr0n; if you see the guys walking around with the decapped chips on their shirts, say hi. I have only been with IOActive since January. Most of what I'm presenting was done before I joined the company, but they've supported me in continuing the work.

So, as far as what I'm trying to do and why: programmable logic is really everywhere these days. You may not know it, but a lot of gear, especially high-end networking, A/V, all sorts of stuff, has programmable logic in it. With ASICs these days the tape-out costs are so high, especially for leading-edge process nodes, that it usually is not cost-effective to make custom silicon unless you're making a huge number of them, for a smartphone or something. For anything else you're probably better off running on an FPGA. The problem is that they're black boxes. Nobody really knows what happens when you compile your bitstream onto a device. You know, OK, we've got these lookup tables, we've got these RAM blocks, but what's actually going on underneath? How do I know that the RTL I give the compiler is actually equivalent to the actual device's behavior? We don't know. Or how do I do development on a platform that is not x86 or x64 Windows, or one of the very short list of Linux distros they support? You're out of luck. And let's say we think we've found a compiler bug: oh, we'll just pop the code into a disassembler and look. No, there is no decompiler for the bitstream. So if you think it's generating bad code, sorry, you're screwed.

And of course, reverse engineering: this is REcon. The vendors want people to think bitstream reversing is hard, and they advertise their closed-source proprietary format as being impossible to reverse. It's not.

As far as the methodology here: I decided earlier today to take a look at the size of the directory where I had installed the Xilinx tools. 18 gigabytes. I don't know about you, but I don't have time to look through 18 gigabytes of code; there are much better things I can spend my time on. Plus, the license agreement says I'm not supposed to reverse engineer the software.
03:41
So this is our
03:47
target: the Xilinx XC2C32A. I chose it for a bunch of reasons. One is that it is very cheap: about a dollar fifteen each in single units as of a couple of weeks ago. I think the price just went up a bit, but they're still cheap enough that you can afford to kill large numbers of them. You have your choice of a nice big QFP package that's easy to handle and has plenty of plastic around the die, so you can decap it while keeping the bond wires intact, or you can go with one of the other packages. It's a nice, friendly 180 nm process with four metal layers, so you can actually read out most of the upper metal layers optically. You don't have to bother going into the electron microscope except for the lowest layers, so that helps. It's a lot easier to use an optical microscope than an electron microscope, and you can shoot a lot of images quickly. The bitstream is also not that big, around 12k, so there's not really that much there to reverse. It's also a fairly simple architecture: you're not going to find block RAM, embedded ARM cores, or stuff like that. It's just pure programmable logic. And the vendor tools are free, which always helps.

So, the bitstream format for this thing at a high level: the output of the compiler is a JEDEC programming file, basically the equivalent of Intel hex, but instead of being hex it's binary, lines of ones and zeros. This is the exact data as it's written to the chip, though not in the actual order; more on that later. But there is a one-to-one correspondence between these ones and zeros and the configuration bits on the device. And the nice thing is that, as you can see on the right side here, the bitstream generated by the Xilinx tools, if you're so inclined, does have comments. It turns out they're not very useful, because, OK, "block 0 ZIA": well, what is the ZIA? Look at the datasheet and there is not one reference to it. So that doesn't really help me much. So let's dig a little bit
05:42
into the architecture of what a CPLD actually is. Any digital equation can be expressed in a form known as sum of products. It's a canonical representation of digital equations in which you take a series of terms, each of which ANDs together a set of literals (either the input or its complement), and then you OR the terms together. So you've got A and B and C, or C and not D, and so on. You can express any arbitrary digital equation this way. So let's look at how that's implemented.
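To make the sum-of-products form concrete, here is a minimal sketch that evaluates one. The function f = (A AND B AND C) OR (C AND NOT D) and the names `TERMS`, `eval_sop` and `truth_table` are illustrative assumptions, not something taken from the talk or from any vendor tool.

```python
from itertools import product

# Each product term maps an input name to the polarity it requires:
# True = use the input as-is, False = use its complement.
# Hypothetical example function: f = (A AND B AND C) OR (C AND NOT D)
TERMS = [
    {"A": True, "B": True, "C": True},
    {"C": True, "D": False},
]


def eval_sop(terms, values):
    """OR together the product terms; each term ANDs its selected literals."""
    return any(
        all(values[name] == polarity for name, polarity in term.items())
        for term in terms
    )


def truth_table(terms, names):
    """Evaluate the function for every combination of the named inputs."""
    table = {}
    for bits in product([False, True], repeat=len(names)):
        table[bits] = eval_sop(terms, dict(zip(names, bits)))
    return table


# Keys are (A, B, C, D) tuples, values are the function output.
TABLE = truth_table(TERMS, ["A", "B", "C", "D"])
```

Any truth table can be written this way by adding one term per group of input combinations that should produce a 1, which is why this form maps so directly onto an AND array followed by an OR array.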
06:15
So if we have a large, say 32-input, AND gate, and we put a mux on the input to each leg of the gate that lets us select between a constant 1 and the input to my circuit, I can effectively mask any subset of these inputs and just leave the others at 1, which is the identity for the AND operation. So I can AND together any subset of the possible inputs to this gate. The same thing can be done for OR, obviously, using 0, the identity for OR. What this means is that you can create a gate that has a huge number of inputs, and I can pick any subset of them at run time. This leads to a natural structure for programmable logic: you make a grid of gates with a mux on each input. You've got inputs coming in and outputs going down, and then you just have a series of gates in sequence ANDing stuff together. This is actually a render from my tools. You've got the input coming in, and we select either X or X prime and AND that with the term coming in from overhead, passing it on down. Or here we skip the AND gate. This is more of a logical view: it's not showing the actual layout of the chip, it's more of a schematic view of what the circuit logically does. I'm also rendering it as a cascaded sequence of gates; in practice the actual implementation in silicon is usually a tree, because nobody wants O(n) propagation delay when you could have a log instead. So if
07:38
we take this and want to build a full programmable device, what we do is take a bunch of signals coming from our registers (that's the feedback), take a bunch of signals from input pins, and feed these into the AND array. We invert them, so we have two times however many inputs. We take M product terms out of that. The product terms then go to a programmable OR array, and the output of that gives us our outputs. Then we feed those either to an output pin or into a flip-flop, if we're building state machines or something like that.

So this is really all it takes to build a simple programmable logic device, or SPLD. The problem is that SPLDs scale poorly: you end up with quadratic scaling in the number of inputs. Nobody wants a device whose size scales quadratically with the amount of work it can do; that's terrible. So how can we improve this? Well, it turns out we can make a grid of small ones. This did not show up as well as I'd hoped on the projector, but we've got one SPLD block here, another one here, and a crossbar switch in the middle. So now we can create a bunch of outputs from one SPLD, a bunch from the other, and then just feed them all into one big routing fabric, pick out whichever signals I'm interested in passing on to this half of the chip or that half of the chip, and feed them into the SPLDs. That is the building block.

So now let's look at the specific CPLD that we're targeting. There are 32 GPIO pins, which are full input/output, plus one input-only pin. I'm pretty sure this is because they intended to package the die in a 44-pin QFP package: all the other stuff for JTAG and power, plus the 32 GPIOs, used 43 pins, so they had one left over. So they just threw one extra signal into the global routing; you can't drive it, it just serves as an extra input. The remainder of the device is two function blocks, which are basically SPLDs. Each one has 16 GPIO pins and 16 flip-flops. Then there's the 65-signal global routing: 32 I/Os, 32 flip-flops, plus one input-only pin is 65. We pick 40 of those; those feed the AND array, which drives a 56-by-16 OR array. This is about all that was documented. OK, we can make a technology-mapped netlist; we know, OK, I have these inputs ANDed together and these outputs ORed together. Now how do I actually make the chip do what I want?
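The AND-plane / OR-plane structure described above can be sketched as a tiny behavioral model. This is an illustrative assumption, not the device's real configuration format: the names `and_term`, `or_output` and `pla` are hypothetical, and the example uses 3 inputs and 2 product terms rather than the 40 inputs, 56 product terms and 16 outputs of each function block.

```python
def and_term(inputs, use_true, use_comp):
    """One product term. An input whose mux is not selected contributes a
    constant 1, the identity for AND, so it simply drops out of the term."""
    result = True
    for x, sel_true, sel_comp in zip(inputs, use_true, use_comp):
        if sel_true:
            result = result and x
        if sel_comp:
            result = result and (not x)
    return result


def or_output(terms, use_term):
    """One OR output. Unselected product terms contribute a constant 0,
    the identity for OR."""
    return any(term for term, sel in zip(terms, use_term) if sel)


def pla(inputs, and_plane, or_plane):
    """Evaluate every product term, then every OR output."""
    terms = [and_term(inputs, t, c) for t, c in and_plane]
    return [or_output(terms, row) for row in or_plane]


# Hypothetical configuration: out0 = (x0 AND NOT x1) OR (NOT x0 AND x1).
# The third input is left unselected everywhere, so it has no effect.
AND_PLANE = [
    ([1, 0, 0], [0, 1, 0]),  # term 0: x0 AND NOT x1
    ([0, 1, 0], [1, 0, 0]),  # term 1: NOT x0 AND x1
]
OR_PLANE = [[1, 1]]          # out0 uses both terms
```

Calling `pla([True, False, True], AND_PLANE, OR_PLANE)` evaluates the XOR of the first two inputs; the select lists play the role of the configuration bits that the rest of the talk goes looking for on the die.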
09:59
Don't panic: I am not going to talk about decapping and imaging. This has been beaten to death in talks at REcon last year, and I think the year before that, and probably the year before that, so we're not going to cover it. There's a lot of this stuff up on Silicon Pr0n, and there are the lecture notes from the class I taught at RPI last year; there's a lot of material there. This talk is about reverse engineering, not sample preparation. We're not going to be teaching you how that works; we're going to teach you how to actually reverse this specific device.
10:30
So here is the metal 4 overview of the device. We can see that there is roughly a left-right symmetry. It's not exact: there's something here that's not mirrored over here, but up here the device looks pretty symmetrical. So, first impression: we've probably got the two function blocks, left-right symmetric, on the device. Let's
10:49
go down a little bit. After we've etched off all of the metal and poly wires, we're now looking at the implant layer. This is after a process which stains P-type doping so that it shows up both optically and in the electron microscope. Mainly, though, it's useful because it provides contrast. You've got fairly large arrays here and here; those are probably the function blocks. There's something down the middle; that's probably the routing. Then there's a large memory array right here, pretty obviously the PROM where the bitstream is stored. Then, if we actually follow the bond wires from the die out to the package, you can see these are the JTAG pins: TDI, TMS, TCK, I believe, top to bottom; the TDO pin is right there. So we can conclude that the JTAG shift register probably runs left to right across this configuration area and somehow allows them to write this EEPROM here. There are also a few small arrays: one over there, one over there, I think one at the upper left, for a total of six. I have not yet figured out what they do. It wasn't necessary in order to reverse the portions of the device I needed, so that remains future work.

So, taking a closer look at the function block: we know there are 16 macrocells. Each macrocell contains one flip-flop, stores the output of one term of the OR array, and has a little bit more glue logic. There is 16-way symmetry here, 16 identical copies; just by looking at this we can be pretty sure we're looking at the macrocells. Then we can see some structures up here: this looks symmetric, this looks symmetric, this looks symmetric but different. As it turns out, the AND array is not actually one solid block: the AND array is split in half. You've got 20 signals here and 20 signals here; collectively that forms the 40. Then they AND the outputs of those two individual blocks together, and that gives you the product terms. That then goes into the OR array right here, and out to the macrocells. So now let's take a quick look
12:40
at the configuration bit structure before we dive into detailed die analysis. The programming documentation does talk a little bit about how the device is put together. It turns out that even though the JED format is supposed to be something you can just feed to the programmer and into the device, that's not the case here. The bit ordering in the JED file is actually a virtual addressing in which they abstract away all of the quirks of the silicon. So, for example, all the AND arrays and all the OR arrays have the same ordering in the JED file, but it turns out half of the actual AND/OR blocks on the device are mirrored left to right. So you actually have to do address translation before you can take a bitstream generated by their tools and flash the chip with it. They do sort of document this: there's a big Excel file they publish for people who are making programming adapters. It's just a big grid, and in each cell it says: OK, this bit from the JED file goes to this physical address. That doesn't really tell you much about the actual structure.

The actual structure is 48 rows by 260 columns. There's one extra row of configuration metadata; I'll get to that later. It stores the seven lock bits. Some of them are always 1 (I'm not entirely sure what purpose they serve); the remainder are 1 for unlocked devices and 0 for locked devices. Then there are two done bits, which indicate: OK, this bitstream is a legal firmware image and the chip has been fully flashed. It also turns out that only 258 of the 260 columns are usable. The remainder are what they call transfer bits, which, as far as I can tell, you just write as a constant 0, and which seem to indicate that the row has been programmed. There is some documentation on this from Xilinx, but it didn't really give enough detail to figure out what they're doing with it.

The other insight we get from this is that, since this is flash, FF is going to be the state of the memory when it is blank. Therefore we expect most of the memory on the device to be active low, so most things should be turned off when the data is high.
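The numbers quoted above can be put together in a few lines. This is a sketch under stated assumptions: the row and column counts come from the talk, while `mirror_column` is a purely hypothetical stand-in for the left-to-right mirroring, not the vendor's actual translation table, and the active-low convention is the expectation stated in the talk rather than a verified property of every bit.

```python
ROWS = 48             # data rows, as stated in the talk
META_ROWS = 1         # extra row holding the lock and done bits
COLUMNS = 260         # physical columns per row
USABLE_COLUMNS = 258  # the remaining two columns are the "transfer bits"


def total_bits():
    """Size of the whole configuration memory, metadata row included."""
    return (ROWS + META_ROWS) * COLUMNS


def blank_image():
    """Erased flash reads as all ones, so a blank device holds only 1s."""
    return [[1] * COLUMNS for _ in range(ROWS + META_ROWS)]


def is_enabled(bit):
    """Assumed active-low convention: a 0 turns a feature on, so that a
    blank (all-ones) device has everything turned off."""
    return bit == 0


def mirror_column(col, width):
    """Hypothetical translation for a block that is mirrored left to
    right: virtual column `col` lands at the opposite end of the block."""
    return width - 1 - col
```

The total comes out at 12740 bits, which is consistent with the "around 12k" bitstream size mentioned earlier.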
14:47
Now, looking at the actual die structure, you can see that the configuration memory is not actually one block: there's one array here, one here, one here, one here, and one hiding all the way over here. As it turns out, they're symmetric about the center of the die, and they line up directly with the corresponding logic. So it's pretty obvious that each one is the memory that configures that part of the chip; the data just flows straight up during the boot process. So this configures the AND/OR array, this configures the routing, this configures these macrocells, and this configures these macrocells. We can confirm this if we look at metal 2: you can actually see the bit lines coming off the sense amplifiers of the PROM and going up vertically into the array. If you trace them out all the way, you can see them going right into the individual configuration SRAM cells. There are a few bit lines, here and here for example, that are connected, I believe, on metal 4 rather than metal 2 to route around one wire. So now let's start with the main logic
15:38
array. If we actually look at where the SRAM cells are and count how many rows high each individual block is, we see that the AND array is 20 rows high, the OR array is 8 rows high, the other AND array is 20 rows high, and each macrocell is 3 high. So we have a pretty good idea, going left to right, which bits in the bitstream configure which logic, just by what's physically proximate to them. And we can make a pretty good guess that, since we have a 2D SRAM array here in the configuration area and a 2D flash array, we probably have bit position on the bottom axis and address on this axis. There's Gray code going on and it's a little more complex than that, but logically we should have one thing going up and the other going across.
16:18
So if we take a closer look at the AND array: we know there are 56 product terms. We know there are 40 inputs, each available true and inverted, for a total of 80. If we actually count how many SRAM cells there are, we'll see it's 112 bits wide, that's 56 times 2, in two blocks of 20 rows each. So the conclusion is that, since we know there are 40 inputs coming in from the crossbar and we've got two blocks that are each 20 rows high, each row probably corresponds to one input. And since we know the width of the array is double 56, it's probably 2 bits per product term: one selects X and one selects not-X, or maybe some sort of binary code. It turned out it is one-hot coded. I figured this out by experiment; it was pretty easy. Just try one, see if it works, and if not try something else. There are only four possibilities. So the OR
17:05
array turns out to have 56 AND terms and 16 outputs. If we count the configuration bits, it's still 112 bits wide, the same width coming up from the PROM, but it's only 8 rows high. There are only 56 inputs, since the OR array does not have an X and X-prime input. My conclusion is that we still have a one-hot select for each particular input, but we actually have two halves of the OR array interleaved in the one configuration array. After a little tinkering it was pretty simple to figure out the actual bit ordering. That brings us to the macrocells.
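The cell counts quoted for the AND and OR arrays can be checked against the logical dimensions with simple arithmetic, and the "only four possibilities" for a two-bit AND-array cell can be enumerated. This is a sketch under assumptions: the active-low polarity and the order of the two bits in `decode_and_cell` are guesses for illustration, since the talk only says the select is one-hot and was found by experiment.

```python
INPUTS = 40         # signals entering each function block from the routing
PRODUCT_TERMS = 56
OUTPUTS = 16
ARRAY_WIDTH = 112   # configuration cells per row, for both arrays
AND_ROWS = 20 + 20  # two half-arrays of 20 rows each
OR_ROWS = 8


def and_array_bits():
    """Configuration cells physically present in one AND array."""
    return AND_ROWS * ARRAY_WIDTH


def or_array_bits():
    """Configuration cells physically present in one OR array."""
    return OR_ROWS * ARRAY_WIDTH


def decode_and_cell(true_bit, comp_bit):
    """Decode one (input, product term) cell, assuming one active-low
    select bit per literal. Bit order and polarity are assumptions."""
    use_true = true_bit == 0
    use_comp = comp_bit == 0
    if use_true and use_comp:
        return "X AND X' (term is always 0)"
    if use_true:
        return "X"
    if use_comp:
        return "X'"
    return "unused (constant 1)"
```

Two bits for each of 40 inputs times 56 product terms accounts for every cell in the AND array, and one bit for each of 56 terms times 16 outputs accounts for every cell in the 8-row OR array, which fits the interleaving of two halves of the OR array per configuration row.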
17:34
There are 27 configuration bits per macrocell. Looking at the die, there are 16 macrocells and so 16 SRAM arrays, and each one is a 9-by-3 grid. It turns out this 9-by-3 grid does not actually control one macrocell: the bottom two rows of this RAM control the bottom one, this goes to the other one, and so on. So yes, it is 27 bits, and yes, it is 9 by 3, but no, the 9-by-3 structures do not have a one-to-one correspondence with the macrocells. This makes intuitive sense: the PROM is at the bottom, we know one of the bits here is the transfer bit, so the other nine bits go directly up to here. I've figured out a significant fraction of the functionality, but not quite all of it; there's still some clocking stuff that I'm unsure of right now. These also configure I/O buffering and stuff like that, so there are lines coming out from these to the side of the device, into the I/O buffers.

Now I'll jump to the security bits. We know there are nine done bits and lock bits somewhere on the device. We know that the physical address of these is at the right-hand end of the memory, in the top row. There are nine SRAM cells right here that don't appear to be hooked up to any actual logic. I have not actually tried flipping these, so I don't know for certain that these are the lock bits, but it is pretty obvious, especially when we know that, of those nine bits, four have to be held low to lock the device: there's a four-input NOR right here.

Now we get to the global routing. We know that between the left and the right halves of the AND arrays there is something. We know that it is 20 configuration rows high in each half; we know it's 16 bits wide. We don't know anything more than that. The datasheet has about two sentences on the global routing, which give no idea whatsoever how it works. What we do know is that, of those 65 signals coming in, 40 go to the left function block and 40 go to the right; the two subsets may or may not have any relationship. And we know, since it's 16 bits wide, that we probably have 8 bits selecting what goes left and 8 bits selecting what goes right. But the question is: how do you make a 65-to-1 mux out of 8 bits? If you just had a binary number you'd only need 7 bits; if it were one-hot you'd need 65 bits. So what sort of strange, perverse coding are they using here? The datasheet is of exactly no help.
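The puzzle posed above is easy to state numerically: neither of the two obvious encodings for a 65-way select fits in 8 bits. The sketch below works that out, and also adds up the block widths quoted so far against the 258 usable columns. The helper names are hypothetical, and the assumption that the usable columns consist of exactly these blocks is an inference from the numbers in the talk, not a documented layout.

```python
SIGNALS = 65       # signals on the global routing
OBSERVED_BITS = 8  # configuration bits per row, per side


def binary_select_bits(n):
    """Bits needed to hold a plain binary index in the range 0..n-1."""
    return (n - 1).bit_length()


def one_hot_select_bits(n):
    """Bits needed if every signal had its own enable bit."""
    return n


def usable_columns():
    """Two function blocks (AND/OR array plus macrocell column) and the
    routing in between. Widths are the ones quoted in the talk."""
    and_or_width = 112
    macrocell_width = 9
    routing_width = 16
    return 2 * (and_or_width + macrocell_width) + routing_width
```

A binary index would need 7 bits and a one-hot select 65, so 8 bits per side matches neither, which is what makes the encoding look strange at this point in the analysis. The column sum lands on 258, matching the usable width of the configuration memory.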
19:59
So if we jump into the electron microscope and take a look at the implant layer, this is what you see. These short arrays here are P-channel, and these are N-channel. These are the individual channels of the FETs. This is zoomed out; the original image is a lot higher resolution, but I can't really fit that on a slide. What we can see is that there's some sort of symmetry going on, both horizontally and vertically. If we jump up to metal
20:26
4, we see that there are six small buses of 11 signals each, except the rightmost, which has 10. Five times 11 plus 10: would anybody like to guess? The 65 signals of the global routing. I spent quite a while vectorizing this. My automated tools are not quite as well developed as I had hoped, so it did take me a while, but I now have a full vectorization of the entire global routing matrix, from implant all the way up to metal 4. The little gray boxes are not actual layout; they're the standard-cell outlines. So there's an inverter here, another inverter here, another inverter here, and so on. We can see there are six identical blocks going left to right here, and there are two more blocks over there that are identical to each other at first glance but not identical to the rest. Let's look at those. Those big drivers on either side are pretty clearly buffers: to drive a fanout of 112 we need a fairly big buffer. There are, I believe, 20-some
21:28
fingers on this inverter, so that looks like a big driver. It's actually a three-stage inverter, because this low-level stuff in the programmable logic doesn't actually have the drive current to drive that much bigger of a transistor. So they have a three-stage driver: inverting it, inverting it again to increase the current, inverting it again to increase the current once more, and then actually driving the output. Of those eight blocks, each one contains two SRAM cells. So it seems to make sense: we've got eight blocks, we know that we have 8 bits controlling the left output and 8 bits controlling the right output, so we probably have one bit per block controlling what goes left and one controlling what goes right. Then there are the six identical blocks and the six groups of wiring up on metal 4. There's probably some sort of correspondence going on here. Again, we don't know what it is, but we know there's some relationship. So now let's take a look at metal 3. We can see there's power and ground routing here; we'll ignore all that for now. But there is a set of one, two, three, four, five, six vias. What's more interesting is that there are only six vias per row, but they're not in the same place.
22:37
If we actually trace it out for all the rows, under each of our six groups there is exactly one via in each row, but not in the same place. So now I finally realized what is going on: the routing matrix is not actually a full crossbar. It is a sparse crossbar, in which each of the 40 rows can only pick from a subset of the 65 inputs, and it's a different subset for each one. The end result is that, using all the subsets, you can select any unique subset of 40 of the 65, but you don't actually need a full 65-by-40 crossbar. So now let's take
23:15
a look at how the implementation works. We have a pretty good idea of how it is structured at a high level, but we don't actually know the implementation yet. I do apologize for some of the misalignment; this was a quick tracing, so not all of the vias and the wires line up exactly. This is not meant as a manufacturable layout; it's just something I can figure out how it works from. You can see there's a big pass transistor here. We've got the signal coming in from the upper layer here and here, and then we've got the output of the mux here. There's the SRAM cell, and that goes through a two-input NOR gate and drives that. If we put this all together, what we see is that each row is indeed an 8:1 tristate mux. There's 1 pass transistor for each of the 6 possible inputs for that row, and then there's a single discrete NMOS and a single discrete PMOS that let us drive constant 1 or constant 0 as well. It turns out that one of the constant drivers is active high and all the rest of the signals are active low, which means that a blank bitstream of all FF will put all the outputs in a well-defined state. So this makes sense. There's also 1 additional signal, which I've called OE. I have no idea exactly where it comes from, but I'm pretty sure it's used during the boot process to basically tell the device to stop driving the outputs. It'll consume less power when the chip is idle, and it also prevents bus fights between drivers that aren't fully configured while the device is loading from flash during the boot process. You don't want to be driving signals before the device has loaded your firmware. The rows are again not identical, but it turns out we can actually do the routing using max flow: we create a source node feeding the nets we want to route, we create a directed graph with paths for each of the legal connections to each row, and we create a sink node with however many nets we want to drive out, and max flow sorts out everything.
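As a rough illustration of the max-flow formulation just described (a sketch, not the code from the speaker's tools): when every edge has capacity 1, max flow between nets and rows reduces to bipartite matching, which can be solved with simple augmenting paths. The function name `route_rows` and the `legal_rows` table are hypothetical; a real table would come from the via positions traced on the die.

```python
def route_rows(nets, legal_rows):
    """Assign each net to a distinct routing row.

    nets:       list of net names that must reach the function block
    legal_rows: dict net -> list of row indices whose mux can select that net
                (hypothetical table; the real one comes from the via layout)
    Returns a dict net -> row, or None if some net cannot be routed.
    """
    row_owner = {}  # row index -> net currently assigned to it

    def augment(net, seen):
        # Try to find a row for `net`, re-routing other nets if necessary.
        for row in legal_rows.get(net, []):
            if row in seen:
                continue
            seen.add(row)
            if row not in row_owner or augment(row_owner[row], seen):
                row_owner[row] = net
                return True
        return False

    for net in nets:
        if not augment(net, set()):
            return None  # max flow is smaller than the number of nets
    return {net: row for row, net in row_owner.items()}


# Toy example: net "b" can only use row 0, so "a" has to move to row 1.
example = route_rows(["a", "b", "c"], {"a": [0, 1], "b": [0], "c": [1, 2]})
```

The augmenting step is what makes a sparse crossbar workable: a net that lands on a contested row can be pushed to another of its legal rows instead of failing outright.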
25:10
As it turns out, the structure is pretty much that. I do have a full schematic; this is not it. I've had to crop the bottom part to make it fit on the slide. But we've got a single configuration bit, and we've got the NOR gate, which I've actually drawn as negative logic because that's really how it's functioning. Up here, that goes to a single PMOS pulling high and an NMOS pulling low. Then we've got the mux inputs, and those are the actual hex codes in the bitstream for selecting each of them. So if we want a row to select a given input, we write the corresponding value to it. I do also have a table in the source of my tools that includes all the mux settings. So now we can fully control the routing matrix, but there's one last thing: we don't know the ordering of the inputs. We know how to select one of the inputs and route something, but we don't know which input is actually hooked up to which thing on this bus. There were a couple of options I considered, but it turned out that the simplest was to make a few educated guesses about how things were structured. For example, for all of the macrocells in function block 1, their flip-flops are probably in a contiguous order in the bus. We don't know where in the bus they are, but they're probably contiguous. All of the input pins for function block 2 are probably contiguous as well. So let's figure, OK, the inputs for each function block are probably in order. There are not really that many orderings for the global input pin and these 4 sets of inputs, so let's just try them and see which one works. So I went to campus, hopped on the focused ion beam, drilled a few holes in the insulator over some wires, and laid down some nice gigantic 20 micron square probe pads. Then I just started driving signals onto them. The test pattern I'm using is just a counter going from a 2 megahertz input, dividing down to, I believe, a 2 hertz LED, and then I have 1 signal from that which sits on a specific flip-flop. I constrained it to be,
like, OK, here, function block 2, macrocell 5. So if we probe function block 2 macrocell 5: I have put this thing on function block 2 macrocell 5, it is 4 hertz, and when we probe, is the signal there a 4 hertz square wave? Yes? Then this guess is right. So it didn't take too long to figure out the actual bit ordering.
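The guess-and-probe approach above can be sketched as a small search. This is an illustration only: the group names and the sizes (16 I/O pins and 16 flip-flops per function block plus one input-only pin, for 65 signals) are assumptions made for the example, and the sketch ignores the direction of the signals within each group, which also has to be checked.

```python
from itertools import permutations

# Hypothetical signal groups on the 65-wire global routing bus.
GROUPS = {
    "fb1_pins": 16,
    "fb2_pins": 16,
    "fb1_ffs": 16,
    "fb2_ffs": 16,
    "global_input": 1,
}


def signal_at(order, index, groups=GROUPS):
    """Return (group, offset) for bus position `index` under a candidate ordering."""
    for name in order:
        if index < groups[name]:
            return name, index
        index -= groups[name]
    raise IndexError("bus position out of range")


def consistent_orderings(observations, groups=GROUPS):
    """Keep only the group orderings that agree with every probe observation.

    observations: list of (bus_position, (group, offset)) pairs, one per
                  signal identified by probing the die.
    """
    return [
        order
        for order in permutations(groups)
        if all(signal_at(order, pos, groups) == what for pos, what in observations)
    ]


all_orderings = consistent_orderings([])
# One probe result: bus position 54 carries flip-flop 5 of function block 2.
narrowed = consistent_orderings([(54, ("fb2_ffs", 5))])
```

With five contiguous groups there are only 120 candidate orderings, so each probe measurement eliminates a large share of them; a handful of measurements is enough to pin the ordering down.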
27:34
So here's the basic summary of the actual layout. We've got gates on metal 1 and poly, metal 2 is vertical routing, then power routing is in M3, and M4 is the inputs. It turns out the actual ordering is: the GPIOs of function block 1, the global input, the GPIOs of function block 2, then the flip-flops, left to right. So now we know pretty much enough to configure the PLA. Unfortunately there's more to it, because it turns out all of the product terms, or nearly all of them, are dual-purpose. They can be used as general-purpose logic that you can feed into your ORs, but each of the 16 macrocells also has 3 product terms that have separate dedicated connections to it. They are used for set, reset, clock, and enable; I'll cover some of this on the next slide. Four more have a special connection to the entire function block, and the last 4, as far as I can tell from reading the datasheet, have no special purpose. The datasheet does tell us roughly what they do, but it doesn't tell us which is which. So I will just briefly cover what these terms are. Per function block, we have a local clock that we can use, because we don't want to waste global clock resources for the whole chip, but we also don't want to have a per-flip-flop clock, because that would increase skew and use a lot of product terms that we really don't need. Then we have set and reset and a dedicated output enable. Then per macrocell there are product terms A, B, and C. It's very much of a mess. Product term A is 1 of several legal sources for set/reset; there's also the control term set/reset, which can be used instead. Then there's product term B, which is 1 of several possible sources for the I/O buffer output enable, and product term C, which can be used for a couple of things. It can be a clock enable, or it can be used as a clock if you want a per-macrocell clock for just 1 flip-flop that you're clocking off one weird clock and nothing else. But it has another use as well. It turns out that this is quite a conventional CPLD structure: the output of the PLA is XORed with product term C, or its
complement, or constant 1, or constant 0. This is an optimization that is again documented in the datasheet, but they don't tell you how to configure it in the bitstream. The intention here is to let you do an efficient XOR: you don't have to do (not x and y) or (not y and x); you just use the XOR. So the question is: OK, we have 56 product terms, which one of these is it? Well, it turns out we can configure the PLA as much as we want; we already know how to configure that, we reversed all of that. We can configure the global routing as much as we want. And it's fairly easy to generate a bitstream that is known to use product term C. All we have to do is synthesize something with an XOR in it, or, it turns out, this optimization kicks in even if you are not using the XOR at all. Say we just have y = a AND b. It turns out that if we set the OR input to the XOR to constant 0 and then XOR that with a product term, we no longer have to go through the OR array, and we shave about 500 picoseconds off the propagation delay by doing this. The compiler, trying to optimize, does this by default if you are not using the OR array. Therefore, just by creating an equation like y = x in your source code, you can trivially produce a bitstream that uses product term C. So all we have to do is say, OK, let's make something that uses product term C, spread that across all the outputs, and see which product terms get used. It turns out it is zero-based term 3n + 10. Unfortunately I have things called customers that kept me from working on this project as much as I would like, so I never actually figured out which product terms A and B are. I'm pretty sure that they're either just above or just below product term C; I don't know which. As for the control terms, if you remember, product term C is 3n + 10, so the lowest number is 10. None of the terms below that are actually being used for anything by product terms A, B, or C, so the control terms are probably among the first couple of terms.
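A small behavioral model may help make the XOR structure and the 3n + 10 indexing concrete. This is a sketch under assumptions: the function names and the mode strings are made up for the example, and only what the talk states (OR-array output XORed with product term C, its complement, constant 0, or constant 1; product term C at zero-based index 3n + 10 out of 56 terms) is modeled.

```python
NUM_PRODUCT_TERMS = 56
MACROCELLS_PER_BLOCK = 16


def ptc_index(macrocell):
    """Zero-based index of product term C for macrocell n, i.e. 3n + 10."""
    if not 0 <= macrocell < MACROCELLS_PER_BLOCK:
        raise ValueError("macrocell index out of range")
    return 3 * macrocell + 10


def macrocell_logic(or_terms, ptc, xor_mode):
    """Combinational output of one macrocell.

    or_terms: values of the product terms feeding this macrocell's OR gate
    ptc:      value of the macrocell's product term C
    xor_mode: hypothetical mode name selecting the second XOR input
    """
    or_out = any(or_terms)
    second = {
        "zero": False,
        "one": True,
        "ptc": bool(ptc),
        "not_ptc": not ptc,
    }[xor_mode]
    return or_out != second  # XOR of two booleans


# y = a AND b with the OR array bypassed: the OR side is constant 0 and the
# single product term arrives through product term C.
def and_via_ptc(a, b):
    return macrocell_logic([], a and b, "ptc")


# y = x XOR z using one product term on each side of the XOR gate.
def xor_via_ptc(x, z):
    return macrocell_logic([x], z, "ptc")
```

In this model a single-product-term equation never touches the OR gate, which is the shortcut the vendor compiler takes by default and the reason a trivial equation is enough to reveal where product term C lives.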
But again, I have not actually had time to figure out which is which. One more wrinkle is that there are some global configuration bits. If you remember, we've got this structure with the function blocks on the left and the right and the global routing between them, which we've covered already. That pretty much leaves a big area in the middle that looks like wasted space. But it's not wasted: there are 22 single configuration bits in the middle. We know 3 of them control the voltage settings for the I/O banks. The device has two I/O banks, but the original XC2C32 only had 1. The XC2C32A is bitstream-compatible with the XC2C32, but it has a few more configuration bits. The way they did this is that they have 1 global bit: if this bit is set, then it will use the old-school XC2C32 configuration. The newer ones are additional per-bank bits. If you erase the device and you don't program the last couple of fuses, then you will get the default XC2C32 behavior, in which you've got 1 I/O voltage setting for everything. If you're compiling for the 32A, you set that bit to a constant, set the other ones the way you want, and now you get the ability to set them per bank. Then there are other miscellaneous things; the comments in the bitstream were rather helpful for this. I know there are bits to configure the global clocking, the global set/reset, and the global output enable. I do not actually know what the encodings of these bits are at this point, so there's still some work in progress going on there. Now I will get to the actual toolchain I've been producing. I've called it libcrowbar, after the Flying Crowbar project. The goal is, in general, to produce open-source tools for programmable logic. It still needs a lot of refactoring and tweaking, and it does not do nearly everything that I want. It only supports the 32A right now. It can scale to other devices: the 64-cell device is pretty much just a scaled-up version of
the 32. If you decap it and look at the top layer, it's just a 2-by-2 grid of function blocks instead of 1-by-2. The global routing fabric will have to be decoded separately, and that will take some real work, but again it shouldn't take too long: I just have to sort out the vias and the mux configuration. Everything else is pretty much exactly the same. The larger devices, the 128, 256, 384, and 512, add some additional features to the macrocell, so in addition to figuring out the routing stuff, I will actually have to do some additional reversing of the macrocell logic. So that remains a work in progress, but it is a goal. The library is BSD-licensed; you can read the code from there. I do not recommend using it in its current state. It's more something to look at to understand how the chip works, and maybe in a couple of months I'll have something that is stable that you can actually use. But I do have a quick usage example
34:34
here. We have a bunch of I/O pins: we select pin 6, 38, 37, whatever. Then we get the I/O buffers for each of these. We can now select the I/O standards and set the output enables; we're not using termination right now. Those are the bits of the macrocell that I was able to figure out. There are still quite a few, for clocking, that remain unknown at this point. Then there's quite a bit more going on in this; I can pull up the full source if anybody's interested in looking at it. But the end result is that I can go from an in-memory technology-mapped netlist, created like this as first-class objects; I can place and route it, produce a bitstream, flash it to the device, and it works. I can also go the other way around: I can go from a programmed device, read out the bitstream, dump it to an ASCII-art schematic, and convert it to synthesizable Verilog, as long as you're not using a feature that I haven't yet reversed the functionality of. The remaining thing in the forward toolchain is that I would like to integrate with Yosys for synthesis: take the output of Yosys, do technology mapping on that, and feed it into libcrowbar. At that point I will have a full Verilog-to-bitstream toolchain. I'm not quite there yet, but I'm coming close. The other tool I've made is a
35:47
viewer. If you've used PlanAhead from Xilinx: I called mine, very inventively, cplan, for CPLD floorplanner. It is a floorplanner and physical layout viewer. I currently only have rendering for the AND array and the global routing. For the OR array, I do have the functionality reversed, but I have not written the code to actually draw it. But I can open up a bitstream and look at the individual settings in the AND array. For the macrocells, I've reversed some of the bits, not all of them, but I have not yet written code to actually render them as, you know, check boxes for "OK, the Schmitt trigger is enabled, the terminator is enabled," and so on. So those are not in the GUI at this point. The configuration bits that are known you can access from libcrowbar; you just cannot do so from the GUI. Here's 1 more view of
36:32
cplan, with the product terms shrunk down to small columns. You can see the full routing matrix here. This is the actual AND array; it doesn't show up as well on the projector as I would have liked, but you can see the actual mux bits in there. When you zoom in in the tool, you can actually see the signals coming in from the global routing, going out through these vias into the AND array. Before we get to the demo, I do want to thank John McMaster from siliconpr0n. He did most of the large-scale optical imagery for this, and he also did the etching, which was very helpful. Then Ray Dove and Bryant Colwill at RPI, who run the materials science lab and the cleanroom respectively: they were quite helpful when it came to getting access to the electron microscope, the cleanroom, and so on. As far as I know I am the only computer science student ever to use their equipment, but I did actually get access to the cleanroom, the SEM, and the FIB. I was trained on all of them, and they were quite helpful. And then the siliconpr0n team in general was quite handy when it came to just getting feedback and sanity checking: "does this look OK to you?", or "I can't quite make out what this connects to, do you have any ideas?", or just sharing process suggestions: "OK, what are some ways I can get this etch to be a little more even?", something like that. So now it's time to get to the demo.
38:03
I have my JTAG connector right here. This is my dev board; it has a small FTDI 232 on there that I'm using for JTAG. What this demo will do is: I'm going to create a bitstream using libcrowbar that will act as a NOT gate and a buffer, and then I'm going to bit-bang the inputs going there with the FTDI chip, and we should see one LED and then the other lighting up. As you can see down here, these LEDs are going back and forth. So if we scroll through the output of the tool here: we start out initializing, we connect to the JTAG chain, and see, OK, there is a device. One note: the CoolRunners are the only devices I've actually looked at in which the JTAG device ID includes the package as well as the device. What this means is that there has to be something somewhere on the die that says "OK, this is a QFP" or "this is a BGA". I've never seen that anywhere else: with every other FPGA, microcontroller, or anything I've used, the device ID was the same for different packages. But anyway, we confirm that, yes, the device is there. We generate a netlist, we figure out the actual function
39:34
block and macrocell locations of the I/O buffers we're using. Then it runs the fitter on the
39:40
I/O banks, macrocells, and function blocks,
39:43
does global routing, repeats for the other function block as well, runs routing for that, and confirms: OK, fitting is complete. It took 62.
39:51
620 microseconds. If any of you have used the vendor's compilers, they're not that fast. Then we finish generating the bitstream. That's kind of slow right now, a couple of milliseconds. I'm pretty sure I can optimize this. This is also a debug build, so it will be a lot faster with compiler optimizations turned on. Then we have to configure the
40:10
device. We verify that, OK, yes, we are on the right device and we've got the right number of fuses. We erase it, make sure it's blank, and then we just bit-bang the inputs as 0 and 1, and it confirms
40:20
that the values coming back from the outputs are what we expect: one of them is X, one of them is X prime. So now let's see what happens if we actually want to reverse something. Let me see if I can up the font size. Can people read this? I'm guessing no.
40:45
So this is what happens when we try to dump a bitstream to text. This is not the same bitstream; it's a different one I picked at random that I had sitting around. We have a nice ASCII-art schematic showing all the inputs to the PLA coming in from the global routing. Each of these represents 1 connection; that's 1 configuration bit. We have the X and the X-bar outputs being created, then the 56 product terms, then the OR array output that's going into the XOR. This table looks a lot nicer when the font isn't quite so big. But I do actually have details on how each output and each input is configured. It's down here somewhere, probably.
41:47
Let me scroll a bit. So now on the outputs we have the I/O standard. It turns out there are about a dozen I/O standards you can pick from at synthesis time, but in the bitstream there are only 2: the high voltage and the low voltage. So there's a low voltage, for 1.5 and so on, and there's a high voltage. My guess as to what this does internally, and I have not fully reverse-engineered all the I/O driver circuitry, but what I'm pretty sure it does is select one threshold or another, and in the output driver there are about 4 or 5 different sets of drivers that you need for fast and slow, different voltages, and so on. I haven't actually done any cross-sections or measured the transistors or anything like that; that wasn't necessary. Then we've got a bunch of stuff here for the global clock mux, the global set/reset mux, everything. All of this remains unknown: I know which bits these are, but I don't know what the functionality of those bits is. And now for the fun part. Let me actually switch to the other test file. This one has a simple adder, which will make a lot more sense than the other one. So here's RTL, but it's pretty much the assembly language of Verilog. This is a direct analog of the actual PLA structure; I have not attempted to figure out any higher-level structure. It should be equivalent, but I have not tested synthesizing it. I actually just found a bug last night: there's a spot here that is formatted incorrectly. So right now the output as formatted won't synthesize, but it's a fairly trivial fix to get synthesis to work. So now, before we get to questions, I'm just going to jump over to cplan and show you guys what the actual structure looks like in the floorplanner.
43:43
The zoom on this thing is really sensitive; I'll actually go just 1 click in, and that's probably enough. So here you've got, initially, we're bypassing all the inputs for the first couple of rows coming off the global routing, and then down here we've got 1 signal coming in, and it's hooked up over here to X and over here to X prime. That goes into the AND gate, then we go down there, and so on. So you actually have a full physical layout here at this point.
44:23
Let's go back to the slides for just 1 or 2 more before we get to Q&A.
44:36
So, what remains as far as future work?
44:46
We still have to figure out the last of the special product terms. There are still about 6 or 8 bits in each macrocell that I have yet to figure out the functionality of. There are still some unknowns in the global configuration bits that I mentioned. There is a little bit more I need to do to support larger devices. As for the toolchain, I want to do more work on the compiler. My long-term goal for this work is to integrate it with some other reversing projects and create the IDA for hardware. I would like to be able to go from bitstreams for CoolRunners, Spartan-6 bitstreams, bitstreams for everything, take those up to a device-dependent, technology-mapped netlist, abstract that to something device-independent, some kind of intermediate representation for hardware, and then do higher-level analysis on that: OK, this combination of XORs and muxes looks like an adder, this combination of stuff looks like a 10-to-1 mux, and so on. I would like to eventually have something along the lines of FLIRT here. I do know that subgraph isomorphism is NP-complete; I don't know if I can make a randomized approximate algorithm fast enough. That remains a topic for future work, but it is on the wish list, whether it is possible or not. So at this point: questions? Thank you again. Any questions?