
Black box reverse engineering for unknown/custom instruction sets


Formal Metadata

Title
Black box reverse engineering for unknown/custom instruction sets
Part Number
2
Number of Parts
20
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Have you ever come across a firmware image for which you couldn’t find a disassembler? This talk will cover reverse-engineering techniques for extracting an instruction encoding from a raw binary with an unknown/custom instruction set. The main focus is on static techniques and features of firmware images that you can use to your advantage, but some dynamic techniques will be covered as well.
Transcript: English (auto-generated)
So, we have a short announcement from Travis Goodspeed, who will also introduce the next talk.
Howdy, y'all. So, as is the tradition at Recon and a few other neighborly conferences, we have the International Journal of Proof of Concept or Get The Fuck Out. This is release number 12. We zero-index, so this is our 13th release, and I believe our third or our fourth at
Recon, which is always generous enough to print these for us and to ensure that the printing is good. They're by the registration desk. Just swing by and grab one, but don't ask for permission, and don't slow down, and dear God, don't mob them because their job is hard enough.
The next talk is by David Karn, a buddy of mine from way back. Oh, God, it has been that long. Yeah. So, he's going to be telling you how to reverse engineer a black box instruction set. By this, it's not like an instruction set that you don't know.
This is one that nobody knows, but for which you have an example binary and the ability to make changes and observe the results in that binary and nothing else. So, without further ado, David Karn. Thank you.
There. I'm on VGA right now. The HDMI wasn't detecting.
Nothing. All right. Well, let's see if I can get it not mirrored. Otherwise, I won't have my speaker notes, and that won't go so well.
There we go. All right.
All right. So, as Travis most kindly introduced, my name is David Karn. The talk I'm doing is reverse engineering instruction encodings from raw binaries, and this talk came about because I was doing some hardware reverse engineering on a custom core, and
a number of my software reverser friends were asking about how does one approach a problem like this. And so, I don't have any fancy, awesome software release for you today, although I will be putting the disassembler and assembler that I'm talking about here up online, but it's not that cool. The target isn't that cool.
And all the techniques that I'm going to be talking about are relatively well known, but a lot of people apparently haven't seen them before, so hence the talk. In fact, they're so well known that on Monday, I was telling some friends on IRC about the talk I was going to be doing, and they said, hey, didn't someone do something just like this at Recon 2012?
And much to my dismay, it turns out they did. So I'm going to begin with a citation to Chernov and Troshina, Recon 2012, Reverse Engineering of Binary Programs for Custom Virtual Machines. And while I'm on the subject of citations, I'll also mention FX of Phenoelit's Building Custom Disassemblers, which was presented at 27C3 for reverse engineering Step 7 stuff. But my focus, as opposed to those two, is going to be more on microcontrollers and low-level systems, things that are directly coupled to the hardware, and where they were built for custom reasons because they needed custom hardware functionality, as opposed to trying to deter analysis or to have a custom byte code.
And life's a little bit different down at those low levels. You find interesting things that you don't find in standard VMs. And in particular, this target that we're going to be looking at today is not really amenable to a lot of automated analysis techniques or wide statistical techniques. The plain text or image size that we have is just so small, and
the fact that it's a mix of code and data tends to really reduce the signal to noise ratio for any kind of bulk statistical or bulk guess and test automated method. So today's example, well, I'm gonna talk about a couple techniques. The first of which is cheating, because there's no point in reverse
engineering an entire custom core just to find out you've discovered the 8051. Second is using firmware structure to your advantage, and then I'll touch briefly on static techniques, followed by some dynamic techniques. And I've only got a 30 minute slot, so I'm gonna be moving really, really fast and covering this at a very high level. If you wanna know more details, please pigeonhole me after or
at lunch or something like that. And Chernov and Troshina covered the static techniques very well, so I'm going to just sort of cover one example of recovering code flow from that. So today's example is the ADF7242, and that's an RF transceiver IC. It's made by Analog Devices, and
the family includes sort of multiple variants for different frequency bands. And inside of it is a custom core that interacts directly with the RF hardware, and it's interesting for some reasons I'll talk about later. But first I should mention that reversing this has no particular importance in the security scheme of things at all. So I'm not claiming this is an important security finding by being
able to reverse engineer or break this. The only point of interest for going after this thing is I originally started on it because Mike Ryan was interested in using this for a better Zigbee sniffer. And it's interesting because it can be interfaced with a computer with only a low cost SPI cable,
like any FTDI cable that you might have laying around. And since you can execute firmware on the chip, you can do real time operations like real time selective jamming or real time channel hopping to follow a channel hopping transmitter without having to deal with and compensate for USB latency. And of course, finally, it's interesting because it exists.
And a binary out there that's in a custom instruction set is sort of reverser bait. So as part of this project, I created a disassembler and assembler for the ADF7242 and the similar family parts. And I'll post it later for anyone that wants it, if there is anyone. So the first hint we have from the data sheet is that the radio control and
packet management of the part are realized through the use of an 8-bit custom processor and embedded ROM. And this is about all the information we have about it. That there's a packet manager with a processor, that it addresses two memory spaces, which is a program RAM and ROM in one memory space, and a bunch of data for various uses in the other.
So that brings us to technique one, which is cheating. And as a rule of thumb, until proven otherwise, a custom core isn't custom. Most of the time, it usually just ends up being an 8051, a Tensilica Xtensa core, or a Synopsys ARC series core.
And the Xtensa and ARC cores are really commonly found because they allow a processor designer to sort of check a bunch of options and have a core created for them that they can synthesize into their product, and get a tool chain that knows how to use it. They support adding custom instructions as well. But they're based around a common core that Hex-Rays has a disassembler for, some of it, I believe. There's definitely disassemblers out there for some of those, and those serve as a starting point for 99% of what you'll need. And of course, you have your friend binwalk, which can sometimes identify an architecture for you if you're lucky. Strings is great; datasheets, press material, don't neglect this stuff before you dive right into the fun technical work.
don't neglect this stuff before you dive right into the fun technical work. And it is as an example of why strings is better than one might think. There was a DSP that I was looking at once upon a time that was effectively a black box. This chip, you just gave it a blob that the manufacturer said you had to. And they mentioned Extensa, which tells you it's an Extensa core, that it uses the vector DSP instruction set, as well as the RTOS it uses.
So don't neglect the simple stuff. But back to the sample that we have. We have a datasheet for the part, a loadable firmware module, and an app note describing what that loadable firmware module does. So that loadable firmware module extends the functionality of the 7242, so
it does things like implementing address filtering. So it can automatically say, hey, this packet coming in, is this one that's of interest to the processor? Does the CRC match? Is it of a frame type I want? Okay, then interrupt the processor, but not otherwise, so it's great for low power modes. And this loadable firmware module we have is only 1369 bytes.
So it's a very small sample in terms of having something to look at to figure out how it's behaving. And what we wanna know is, first of all, what kind of machine are we dealing with? Is it a stack machine? Is it a register machine? What are the data path sizes inside of it?
If it's a register machine, how many registers does it have, and how large are they? We'd like to know whether the instructions are register to register or memory to memory. We also like to know whether the memory layout is one unified address space that covers everything, or whether it's separate address spaces for those two blocks that I described before.
And we actually already have some hints from the diagrams that I'd shown before. And one example is that it showed an 8-bit data path coming from the program ROM. And that right away tells you it's probably not a processor that's using a weird width like, for example, the PIC series with its 14-bit instruction word.
Because if you're building custom silicon, and if we assume the data sheet is telling the truth, there's no reason to actually use an 8-bit path from the ROM when you can just use a 14-bit one just as easily. So before I go on to the next slide, does anyone here remember bank switching code?
Is anyone unfortunate enough to be still writing code for something that does bank switching or requires it? I guess there's a couple of us here, but I guess most everyone's lucky. Bank switching is when you swap a bank of memory in and out, because the processor address space isn't large enough to encompass the entire code that you want to run on it. And that shows up really, really well when you have a firmware file.
For example, in this image that I'm showing you right here, you can see regular structure at power-of-two boundaries. And that's a real clear indication that whatever target you're looking at is using banking of some kind. A good heuristic is to count the number of zeros or 0xFFs or repeating bytes right before power-of-two boundaries. And the lowest power of two that has that is probably the one that you're seeing banking occur at. But actually, these days, compilers have gotten so good at allocating code in these situations that I had to sort of beat the compiler over the head to make an image that would show up here. And so I recommend the heuristic method rather than doing it visually.
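As a rough sketch, that counting heuristic might look like this in Python; the 0x00/0xFF padding values come from the talk, while the starting bank size and the look-back limit are just guesses:

    import sys

    PAD_BYTES = (0x00, 0xFF)   # typical erased-flash / zero fill used as padding

    def pad_run_before(data, offset, limit=64):
        """Count how many padding bytes sit immediately before 'offset'."""
        n = 0
        while n < limit and offset - 1 - n >= 0 and data[offset - 1 - n] in PAD_BYTES:
            n += 1
        return n

    def bank_scores(data):
        """Score each power-of-two stride by the average padding run before its boundaries."""
        scores = {}
        size = 0x400                      # start at 1 KiB -- a guess for a small MCU image
        while size <= len(data) // 2:
            boundaries = range(size, len(data), size)
            scores[size] = sum(pad_run_before(data, b) for b in boundaries) / len(boundaries)
            size *= 2
        return scores

    if __name__ == "__main__":
        data = open(sys.argv[1], "rb").read()
        for size, score in sorted(bank_scores(data).items()):
            print(f"stride 0x{size:05x}: avg padding before boundary = {score:.1f} bytes")

The lowest stride with a noticeably high score is the likely bank size.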
But there's other structure in firmware files that we can use to our advantage. And that's that they have to have entry points. So this is a loadable module. It's not a piece of firmware that runs right on the microcontroller, so it might be a little bit different. But still, either the ROM that's gonna be talking to this loadable module, or
if it's just firmware, the raw firmware for the processor, the processor still needs to get to the part of the code that you want it to execute. So it's a very common feature, and you actually heard some of this being talked about in the last talk, to have a vector table. And the vector table tells the processor, hey, here's where you're going to execute in the case of a certain interrupt, or
in, for example, at reset. And it's very common to have this at the start or the end. And sometimes this table takes the form of a number of instructions that will perhaps jump to the code, or it might be a tightly packed table of addresses. And I did a brief survey of a handful of randomly selected embedded
architectures, some of which are common, some of which are less common. And as you can see, the vast majority allocate a contiguous vector table at the start or the end, with the exception of one which actually only ever boots from a bootloader, so it's a bit different. Some of them can relocate it after the start, but it still needs to be at a fixed point at power on, which is generally one of those two places.
Usually it's tightly packed, so it's not spread out over the firmware. It's usually tightly packed at one end or the other. And as I mentioned before, some of them represent it as addresses, others represent it as instructions. And if we go back to our sample, this is the first time I've shown you the actual sample we had.
Well, what do we see right at the start of the file? It's a pattern of sort of what I would call stride two, a stride two pattern right at the start. Two-byte chunks that are self-similar on a two-byte boundary, followed by a series of three-byte chunks that are similar on that pattern. And if we assume that all of this is a vector table, well, it'd be a bit of an odd one, because you don't normally see a vector table
that has different sized elements in it. It would be harder for the processor designer to implement. So let's think about what we know about vector tables. Well, if it's addresses, it probably doesn't make sense. We'll just look at the first part. Because if we look at those as big-endian encoded addresses,
well, they only differ by a single byte for each value, so that can't be branching to meaningful code. Or if it's a little-endian encoded address, then they're spaced by 0x100 or 256 bytes. And then it runs off the size of the module we just loaded. So that can't be a sensible explanation either. And that leaves instructions for the other option. They probably aren't absolute jumps,
because we'd still have the problem about how you encode the jump destination. But what if they're relative jumps? And if these are two-byte relative jumps to a three-byte instruction, perhaps an absolute jump, then that would actually make sense for the one-byte difference. Because relative jumps are usually added to the program counter in some way.
And that turns out to be exactly what the case is for this particular processor. It's a two-byte relative jump to a three-byte absolute jump that goes somewhere else in the program. And this is a trampoline pattern. It's very common in embedded systems, and it means the same thing as a trampoline does in any other system you've looked at: it's one jump through another to get to a final destination that the first may not be able to reach.
And if we look at the last few bytes on the absolute jump, well, we can start deciphering those as well. The lowest 12 bits all seem to differ. And they actually end up having address values that all point within the firmware module that we've loaded. They're all within the size bounds of what we're gonna load.
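A rough way to sanity-check that reading of the table is to script it; everything encoding-specific here (which byte holds the relative displacement, the 12-bit target field, the entry counts) is a hypothesis being tested, not a documented format:

    MODULE_SIZE = 1369          # size of the loadable firmware module, from the app note

    def check_vector_hypothesis(data, n_rel=8, n_abs=8):
        # Assumptions, not verified facts about the part: the second byte of each
        # two-byte entry is a displacement added to the PC, and the low 12 bits of
        # each three-byte entry are an absolute target address.
        ok = True
        for i in range(n_rel):
            off = i * 2
            target = off + 2 + data[off + 1]        # PC-relative from the next instruction
            if not 0 <= target < MODULE_SIZE:
                print(f"rel entry {i}: target 0x{target:04x} falls outside the module")
                ok = False
        base = n_rel * 2
        for i in range(n_abs):
            off = base + i * 3
            word = (data[off + 1] << 8) | data[off + 2]
            target = word & 0x0FFF                  # low 12 bits as the absolute address
            if not 0 <= target < MODULE_SIZE:
                print(f"abs entry {i}: target 0x{target:04x} falls outside the module")
                ok = False
        return ok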
So this makes sense. It's a good sign we're on the right path. But we can make this a bit easier. Rather than trying to reverse it or analyze it from the bit patterns, the simplest approach is just to make a histogram. Just take the last two byte values and plot them on a chart and see where they go. And you get sort of two main groupings in this distribution.
And you see they're spread out sort of around zero and around the 32k mark. And you look at something like this and you say, well, I really don't think that this processor has a 16-bit address space. There's no need for it to be using a full 64k of address space. What's probably more likely is that the top bit, the 15th bit, is actually a selector.
It differentiates between two different instructions that both take an absolute address. And so the hypothesis is that the 15th bit is part of the encoding, not the address. And if we follow that hypothesis, we see something that looks much more reasonable.
And I forgot to mention earlier, these histograms are of all byte patterns that start with a 0x0F byte. So this is basically based on every single 0x0F sequence in the file. And it's a very small file, so you don't see a lot of noise. Normally, you'd have a lot of random noise on the baseline. But this is such a small sample that you don't see it.
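A minimal sketch of that histogram; the big-endian byte order, the placeholder filename, and the bit-15 mask are the hypotheses under test, not known facts about the part:

    from collections import Counter

    def opcode_field_histogram(data, opcode=0x0F, mask=0xFFFF):
        # Histogram of the 16-bit value following every 'opcode' byte in the image.
        # Pass mask=0x7FFF to test the hypothesis that bit 15 belongs to the
        # encoding (jump vs. call) rather than to the address.
        hist = Counter()
        for i in range(len(data) - 2):
            if data[i] == opcode:
                hist[((data[i + 1] << 8) | data[i + 2]) & mask] += 1
        return hist

    raw    = open("firmware_module.bin", "rb").read()     # placeholder filename
    full   = opcode_field_histogram(raw)                   # two clusters: near 0 and near 32k
    masked = opcode_field_histogram(raw, mask=0x7FFF)      # collapses into one address-like cluster
    print(max(full), max(masked))                          # rough spans of the apparent address ranges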
So this is all things that match. And you see it collapses right down into a nice distribution. And that's all within the address space of the program ROM, or within the size of a potential program ROM, plus the loadable module. So we now have a pretty good guess at relative jump, absolute jump, and possibly what we think might be a call.
now possibly what we think might be a call. Because if you look at, basically, silicon costs money. Unlike a malware VM where they're trying to deter you from analyzing it, the silicon is all about how can I build it in the simplest and cheapest way that it'll work. And an absolute jump is really similar to an absolute call. The only difference is that the absolute call has to push the return address
onto the stack. So it makes sense just to have a single bit that says, hey, you're gonna have to push the return address when you execute this. And once we have a call, we can find ret and the value for return. And so functions normally look something like this. You've got a function, a return address, followed by another function. And indeed, this is one of the heuristics that was proposed by Chernov and
Troshina at Recon 2012, except they were going the other way around, using ret to find call, as opposed to vice versa. But unfortunately, a lot of embedded systems aren't quite that simple, or as simple as the picture I've shown. For example, on ARM, you have constant pools where you have read-only data
allocated, or in some cases read-write data, which I've seen out of one compiler, allocated right after the function body. And in some cases, you also have alignment bytes. And that might be because the processor requires it, because that's what the ABI calls for, or sometimes that's just what the compiler does for no particularly good reason.
So thankfully for us, the ADF7242 is pretty simple. And so this is a histogram of the byte value immediately before a call target. So you look where a call points, and then make a histogram of all of the byte values immediately before that. And there's one byte that sticks out as being by far more prevalent than
any other immediately before where a call may point. And that's byte value 0x0A, which is also the newline or return character. And if the Analog Devices guy that originally built this core is out there, it's particularly nice that you made the return instruction
the return character, it's very convenient. So while this works for the naive example here where you don't have any spacing between functions or padding, a heuristic that works quite well, and that I've used on other systems where it was a bit more complex, is to do sort of a weighted histogram: first you eliminate the padding, and you can figure out what that is because the padding is usually the same. And then weight the count of each byte based on how far back it was, so you have a ten-byte look-back window or something like that before the target. And then weight those in, and that usually ends up with ret in the top three or so. So that's worth trying as well.
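A sketch of that weighted look-back, assuming you already have a list of call targets from the decoded absolute calls; the padding byte, window size, and 1/distance weighting are just reasonable guesses:

    from collections import Counter

    def ret_candidates(data, call_targets, pad=0x00, window=10):
        # Weight bytes seen shortly before each call target; the return opcode
        # should float toward the top of the ranking.
        scores = Counter()
        for target in call_targets:
            distance = 0
            for off in range(target - 1, max(target - 1 - window, -1), -1):
                if data[off] == pad:                  # skip inter-function padding entirely
                    continue
                distance += 1
                scores[data[off]] += 1.0 / distance   # nearer bytes count for more
        return scores.most_common(5)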
But there's other function structure that we can use to our advantage. Functions generally need to save and restore callee-save state. And some embedded architectures save it for you, but most are pretty simple. And this is where, on this particular target, I stopped being able to use fancy histograms and analyses, because the sample size is just too small. Everything just gets lost in the noise floor and there's no point in looking anymore. But the saving of state is usually something like a push and a pop pair at the start and the end. And some manual analysis just looking at functions identifies a pair of byte values that either always occur together or never occur, at the start and the end of the function. So you have a pretty good clue that that is involved in saving state of
some kind. And I'm not gonna walk through the whole trial and error process of this entire core because A, I don't have the time, and B, you'd all be bored out of your minds. So I'm gonna sort of leave the static analysis process here for now, and just touch on a couple other points before I move on to some dynamic stuff.
Remember, we can cheat. In the loadable processor module app note that they had for this particular firmware file, it had a number of memory locations. And those memory locations were used to configure the address matching. So you knew that those memory locations would contain an address. And therefore, those addresses or memory locations had to
be loaded somewhere inside this so that they could be used. So we can go constant hunting, go hunting for the binary value of the constants of those memory locations. And that gives you an immediate, really quick jump to move immediate followed by move register-indirect, and you find those right away.
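That constant hunt is easy to script; the addresses below are placeholders rather than the real values from the app note, and both byte orders are tried since we don't know the endianness yet:

    # Addresses the app note says the firmware has to touch -- placeholder values,
    # not the real ones from the ADF7242 documentation.
    KNOWN_ADDRESSES = [0x012C, 0x0130, 0x0135]

    def find_constant(data, value):
        """Report every offset where a known 16-bit constant appears, in either byte order."""
        hits = []
        for order, needle in (("LE", bytes([value & 0xFF, value >> 8])),
                              ("BE", bytes([value >> 8, value & 0xFF]))):
            start = 0
            while (idx := data.find(needle, start)) != -1:
                hits.append((order, idx))
                start = idx + 1
        return hits

    raw = open("firmware_module.bin", "rb").read()     # placeholder filename
    for addr in KNOWN_ADDRESSES:
        print(hex(addr), find_constant(raw, addr))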
And nothing says that the processor has to use the same memory mapping as your external configuration interface does, which is over SPI in this case. But it's unlikely that they would do anything different, because if you do something drastically different, you need two separate sets of address decoders for the same bank of memory. And extra address decoders cost silicon, and silicon costs money.
And they're using a custom core in the first place, probably because they want it to be cheap. Or they need something very high performance that they couldn't do with something off the shelf. So this is the end of what I'll talk about for static methods. But using similar guess and check and validate mechanisms as I described before got us conditional jumps, well, conditional relative jumps and
always relative jumps, absolute call, absolute jump, return, as well as move immediate, register-to-register transfers, register-indirect memory read and write, as well as some ALU operations, which would be add, subtract, compare, bit test, bit clear. And all that came out through sort of guess-and-check style methodologies.
So remember, when you're doing this kind of thing, that code is sane. Values shouldn't be written to a register and then promptly overwritten. The ALU and registers shouldn't behave in one way for the vast majority of instructions, and then differently for one instruction alone.
There shouldn't be a huge number of distinct encodings for the same operation that you think is occurring. And values shouldn't be presumed to teleport between registers, or from registers to memory or vice versa. Ironically, the ADF7242 breaks three out of those four rules. So you can't take them as hard and fast rules,
because this core is very tightly coupled to some hardware around it. But those are some general rules of thumb you can use to figure out if you're on the right track. And in fact, while I was putting this deck together, I was reminded of an attempt to decipher Linear A. And Linear A is an as-yet-undeciphered ancient script that was used for writing down a spoken language. And a reviewer of one proposed translation remarked that the quality of a new translation could be judged by how many, or rather, how few, new gods hitherto unknown to history the translation proposes. And I think the same can be said for binary analysis,
except replace gods with NOPs. If you've got five or six different things that you think are probably a NOP because you can't figure out what else they might be, you're probably on the wrong path. So this is our particular test bench, and this is all you need. I've put the DigiKey part numbers up there if you want them, but you can find them both on Mouser or any other distributor. All you need is an FTDI cable and the little dev board you can get. So it's really easy to hook up to your computer. I've actually got it with me today. If anyone wants to see it after the talk, just come bug me. And as an aside, dumping the internal ROM on this was really, really easy. So this is the documented command set of the SPI interface that you use to
interface with the processor. And I sort of wonder what this undocumented area of the command set was, if anyone has any guesses about what that might get one access to. Anyway, suffice it to say, it was very easy to dump the ROM. So moving on to dynamic methods, and I'm gonna have to move really fast,
cuz I think everyone wants to go for lunch in six minutes, and I know I won't keep you through that. It's basically make yourself an oracle and then proceed to query it until you learn all sorts of interesting things. A processor has state, and this is an example of the state that I knew about at the point I stopped my static reversing of it. I had three registers of various sizes.
I, of course, knew the instruction pointer and the RAM. And when you execute some instruction on the processor, well, obviously something happens to the state, otherwise it wouldn't be a very useful instruction. So observing this would be trivial if we had a debugger, because you could just single step it and then dump all the values.
But in this case, we don't even know if a debugger exists. And if it does exist, we don't know how to use it. So the goal for a test bench to try and go after these things is to set up as much state as possible on the core, using the instructions we already know about, which is move immediate. Run an instruction that we don't know what it does, collect all the state that we can, and
then compare it against our model of what that instruction should have done. And for the ADF7242, this looks like just generating a test bench program, compiling it on the host with a little assembler, and uploading and running that test bench on the hardware. And that test bench sets up the state with the move immediate instructions, as I said. And then the test bench writes the state to that RAM window that I discussed very briefly, because I'm moving pretty quick in this 30 minutes. It writes that state, the host reads it back, and then compares it.
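The host-side loop for that might look roughly like this; assemble(), upload_and_run(), and read_ram_window() stand in for the little assembler and the SPI plumbing, the MOVI/DUMPSTATE mnemonics are invented for the sketch rather than taken from real tooling, and the register widths are guesses carried over from the static work:

    import random

    def probe_instruction(opcode_bytes, model, trials=4):
        # Set up known state with move-immediates, run one unknown instruction,
        # dump state to the RAM window, and diff hardware against our model.
        for _ in range(trials):
            regs = {"R1": random.randrange(1 << 10),
                    "R2": random.randrange(1 << 10),
                    "A":  random.randrange(1 << 16)}
            bench  = [f"MOVI {name}, {value}" for name, value in regs.items()]
            bench += [f".byte {b:#04x}" for b in opcode_bytes]   # the instruction under test
            bench += ["DUMPSTATE"]         # macro that stores every known register to the RAM window
            image = assemble(bench)        # host-side assembler built from what we know so far
            upload_and_run(image)          # push the test bench over SPI and start the core
            observed  = read_ram_window()  # read the dumped state back over SPI
            predicted = model(regs, opcode_bytes)
            if observed != predicted:
                print(f"{bytes(opcode_bytes).hex()}: predicted {predicted}, got {observed}")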
And one challenge that you have, especially when you only know a little bit about the processor, is that it can be very hard to retrieve some state without clobbering other state. For example, I had found that the processor had a condition flags register, because I saw some things working with it, moving nibbles there that were then tested and branched upon. And I knew how to read the condition flags register, but to do so would clobber a register. And similarly, saving that register would trash the condition flags register. So I couldn't get both at the same time.
And the obvious simple solution to that is just to make two test benches, one that gathers one part of the state, and one that gathers the other, and then put it back together in software, so you can see the entire state that changed. And when you run a test bench step like that, you then have to characterize the output. The first outcome is no effect, and that can mean the instruction is actually a NOP.
But really, when you see a whole bunch of things that have no effect, it probably means that there is some state that you don't know about. There's something that you haven't realized is in there yet that is actually being changed. Sometimes it's a constant effect, and the simplest example of that is a move immediate or a clear instruction. But it also could be unset or unchanged input state. If there's input state that you're not setting up ahead of time, and
it's always remaining constant across your test bench runs, then that would also appear to be a constant instruction. There are deterministic effects, for example, an ALU operation where you add two input registers that you control and can predict what the output value will be. And there are non-deterministic effects, which only appear non-deterministic
because hardware usually is deterministic. So unless you think, for a good reason, that there are rdrand or rdtsc equivalents down in your particular target, it probably means there's input state that you're not controlling yet. And finally there's the crash condition, and that means that you've either found a new piece of control flow that you don't know what it's doing, or you've found an instruction that's unimplemented and results in a processor hang, or you've found something weird. And I'll talk about one of the weird things in this chip that I came across in a little bit.
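Mechanically, that bucketing might look like this, assuming a probe() helper that returns the observed register changes for one randomized run, or raises if the core hangs:

    def characterize(opcode_bytes, probe, trials=8):
        # Bucket an unknown instruction by how its observed state diffs behave
        # across runs with randomized input state.
        diffs = []
        for _ in range(trials):
            try:
                diffs.append(probe(opcode_bytes))   # dict of register -> new value
            except TimeoutError:                    # assumed to signal a hang/crash
                return "crash / hang"
        if all(d == {} for d in diffs):
            return "no visible effect (maybe a NOP, maybe state we can't see yet)"
        if len({repr(d) for d in diffs}) == 1:
            return "constant effect (or it depends only on state we never vary)"
        # Anything else is either a deterministic function of the inputs or it
        # depends on input state we are not controlling yet.
        return "input-dependent; fit a model or go hunting for hidden state"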
And I strongly recommend building a validation test suite, and that is, model each instruction. And then what you do is you have the test suite run such that it compares the execution of the model versus the execution in the hardware. And you should run each test with multiple random input vectors and make sure that you're always predicting the correct output. And something that's really important to do is make sure that you're not trying to predict the entire state, just predict the differentials. So that as you discover a new state over time that you don't necessarily
know about at the beginning, then you can just rerun the entire test bench with that new state being set up to something random, and see if your predictions still hold. For example, that would allow you to detect the difference between an add and an adc. Initially, you might think they're both just plain adds. But if there's a carry bit that you don't know to set, and you find it later on, then rerunning the test bench will identify, hey, I missed something there, and I should go back and check it.
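A sketch of that differential check; run_on_hardware() and the per-instruction models are assumed helpers, register widths are glossed over, and the key point is that only the predicted changes are compared:

    import random

    def validate(models, run_on_hardware, state_names, trials=16):
        # models: {mnemonic: fn(state_dict) -> dict of *predicted changes*}.
        # run_on_hardware() is an assumed helper that sets up the given state,
        # executes the instruction, and returns only the state that changed.
        for mnemonic, model in models.items():
            for _ in range(trials):
                state = {name: random.randrange(1 << 16) for name in state_names}
                predicted = model(state)
                observed  = run_on_hardware(mnemonic, state)
                if any(observed.get(name) != value for name, value in predicted.items()):
                    print(f"{mnemonic}: model predicted {predicted}, hardware gave {observed}")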
So I'll talk a little bit about the weird and wonderful things I found there. I think I'm down to one minute, so I'm gonna get through this real quick. It ends up having two separate address spaces. Program memory is addressed with a 13-bit program counter, and the RAM for data storage has an 11-bit address bus. Internally, it has these registers, and these are the ones I know about so far. And I'm fairly certain those are most of them, but
you never know with this kind of system. So there's a couple of registers which I'm calling general purpose registers, though they're not really general purpose; I don't have a better name for them. There are two 10-bit registers and a 16-bit register. R1 is usually used as the value in load/store operations,
whereas R2 is usually used as the address. A and R1 are usually used as a pair for ALU ops. So I originally called it A for accumulator, but it turns out not to be that after a bunch more reverse engineering. And there are also two really weird registers, one of which is a packet pointer register, which is sort of a register-indirect access with a hardware base offset based on the current packet it's busy spitting out of the RF transmitter. And there's also a specialized loop counter register, which is only ever used by three instructions, which are set loop counter, read loop counter, and decrement and jump if not zero. So that particular register isn't touched by anything else. So it's sort of weird down at the instruction set level, but it gets weirder.
There's some directly coupled IO instructions. For example, there's an instruction that appears to allow turning on and off the power amplifier or setting it to a particular level. There's an instruction that directly enqueues some bits to be transmitted. And there are also blocking instructions to match those. For example, block the core until this particular hardware function
completes, like, for example, transmitting a byte out the RF port. And ironically, because of the way that that's built and encoded, you can actually block the core until any condition bit is set. So you can actually do block until not zero, which is not an operation you ever find in the disassembled binary,
because it's not terribly useful. Once the core is halted, waiting for it to become non-zero, nothing's ever going to make it non-zero. There's also specialized communications instructions, for example, bit reverse and CRC baked in. And you see a lot of these kind of weird instructions in these special purpose processors.
In other ones I've looked at, I've seen instructions for accelerating S-boxes for cryptography, Viterbi decode acceleration, bit shuffle and extract, stuff for accelerating forward error correction. So there's a lot of really interesting things you find down there. And so the state of the tools today for this particular core,
I have a disassembler that covers the vast majority of the instruction space. The ones that I don't know about never appear in the disassembly I have. So the best guess I have is that they're either unimplemented encoding space or they do something that I haven't found state for yet. I have a basic assembler, as well as the disassembler of course, and
a loader and IO library for playing with this thing. And I'm gonna post it soon at github.com slash davidkarn. It's not going to be while I'm at Recon, because I didn't bring my SSH keys. So thanks for coming out to listen, and I guess, any questions, or does everyone just want to leave for lunch?
Lunch it is.