GNU poke, an extensible editor for structured binary data

Formal Metadata

Title
GNU poke, an extensible editor for structured binary data
Number of Parts
44
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
GNU poke is a new interactive editor for binary data. Not limited to editing basic entities such as bits and bytes, it provides a full-fledged procedural, interactive programming language designed to describe data structures and to operate on them. Once a user has defined a structure for binary data (usually matching some file format) she can search, inspect, create, shuffle and modify abstract entities such as ELF relocations, MP3 tags, DWARF expressions, partition table entries, and so on, with primitives resembling simple editing of bits and bytes. The program comes with a library of already written descriptions (or "pickles" in poke parlance) for many binary formats. GNU poke is useful in many domains. It is very well suited to aid in the development of programs that operate on binary files, such as assemblers and linkers. This was in fact the primary inspiration that brought me to write it: easily injecting flaws into ELF files in order to reproduce toolchain bugs. Due to its flexibility, poke is also very useful for reverse engineering, where the real structure of the data being edited is discovered by experiment, interactively. It is also good for the fast development of prototypes for programs like linkers, compressors or filters, and it provides a convenient foundation to write other utilities such as diff and patch tools for binary files. This talk (unlike Gaul) is divided into four parts. First I will introduce the program and show what it does: from simple bits/bytes editing to user-defined structures. Then I will show some of the internals, and how poke is implemented. The third block will cover the way of using poke to describe user data, which is to say the art of writing "pickles".
The presentation ends with a status of the project, a call for hackers, and a hint at future work.
Transcript: English (auto-generated)
Hi, my name is Jose, and I came here today to talk to you about a program I have been working on by myself for the last couple of years. Finally I got it to do something useful, and I was so happy about it that I published it two or
three weeks ago. Well, this is it: it's called poke, and it is a sort of programmable editor for binary data, for which you can describe the structure and then edit the data in terms of the abstractions that you are defining. I know this is maybe not that
easy to grasp at first, and that's why I am going to do a little demo. So, first of all, poke is not finished. I mean, you can already use it to do useful things. Actually, it helps me a lot in my daily work, but this is work in progress. Work that you are, by the way, welcome to join. I will give you
pointers at the end of the talk, if you are interested in contributing. So, why write something like this? Well, this is an excerpt of one of the many, many, many scripts that I have to do my work. I work on the GNU
toolchain. I am a compiler hacker, mainly, so I work on GCC, on binutils, the linker, the assembler, and whatnot, and I very often find myself needing to vandalize ELF files, object files, libraries,
and executables, so I can reproduce bugs in, for example, the linker, right? So I very often need to edit binary files that have some structure, for example ELF files. For me it's very common, and in the past I used things like this. So, for example, using
objdump to get some information about the offset of the text section of an ELF file, parsing the output, somehow operating on it with a shell script, and then finally using the dd command to patch the object file or to get information from it.
Okay, this works. Yeah, sure it does, but it sucks. Why? I mean, look at it. It's crap, basically. It works, but it is fragile, and it breaks so often. Also, it is very specific, obviously. If I wanted to do
something slightly different, I would need to write another script. Not good. So, at some point, I was like, okay, this is it. I'm not gonna continue like this, because the number of scripts is increasing all the time, they are breaking all the time, and I am investing so much of my
work time maintaining my infrastructure scripts instead of doing real work. So then, back in 2017, during the summer, I was like, okay, enough. I'm gonna write myself a binary editor that should be generic. And I did not know what I was getting into, because
initially I was like, okay, something simple, it should work. It actually took a while, because initially it was like, okay, very easy: I want to be able to describe the structure of binary
data. For example, of ELF files, right? They have a header, they have relocations, they have this field, this other field, things like that. So then it was, okay, most of the data I want to describe is usually described already by some C library, some C header. So the way I want to describe the structure of the data should look
like C structs. It should not be that much different. But of course, I'm sure that you all know that C is actually not very good when it comes to describing physical layouts of data, because everything is undefined, right? The C compiler can introduce padding, can introduce alignment, can reorder bit fields which are less than
one octet, things like that. So it should be, okay, C structs plus something, right? Something extra. Then, after a month or a couple of months thinking about it, I found some existing stuff. One is
called DataScript, by Godmar Back, who is a professor at some American university, I forget which one, who wrote back in 2010, 2011, a paper about something he called DataScript, which is actually very similar to what I wanted. But it was not that satisfactory, for some reasons I will talk about later. And then there is an aberration, which is called
010 Editor, which is proprietary, which means that it is unusable and very bad for the freedom of everyone, and we're not going to talk about it anymore. So then I spent a long time thinking, okay, how can I have a description language that is
flexible enough and at the same time allows me to edit data in a transparent way, and so on. You will see it working now. And then, well, finally, I got something that makes sense and that I, in my general stupidity, am able to implement. So this is the program.
This is how it looks. Now, I just told you what poke does in a very abstract way. Probably you are still like, okay, what? So it's demo time. All right. I'm going to use poke very fast, because we don't have time, to poke at a relocation in an ELF file,
which basically corresponds to some real stuff that I have to do often. So, okay, this is so I can use poke without installing it. So this is poke. Oh, sorry. First we need an ELF file.
We create an ELF file, and this ELF file has a relocation. I'm using readelf,
which is part of binutils. This is not poke yet. Okay. So let's poke it. So I just opened it with poke, and what can I do? Okay. First I can take a look. This is the dump command, which basically tells me the bits and bytes and whatever is in the ELF file. Okay. Nothing very, very exciting yet,
but I'm always talking about structured binary data. So what is the structure of this binary data? This is an ELF file. How can you define in poke the structure of the data you want to edit? Well, using Poke with a capital P, which happens to be a full-fledged programming language where you can describe
data and operate on it. So of course I have already written a file for ELF, which is called elf.pk. The files containing Poke code, I call them pickles.
And basically you can see here that in the language, which is Poke, you can define structs, you can define types, you can define things like that. I will explain this later, but very fast for the demo: here there is a struct which is Elf64_Ehdr. This is the structure of an ELF header, right?
And you see here that you don't only specify the different fields, but you can actually also specify constraints. For example, this is a constraint, which is an arbitrary Poke expression that says that the ELF magic number should be like that. So let's poke it first.
I have to load the elf pickle. This basically passes this ELF description through poke. Now poke knows about those types. So again, dump. Well,
there will be an ELF header at the beginning of the file. How do I get it? Well, I map it. This weird thing with the #B is an offset, which is zero bytes. I will explain more about it later. And it gives me the value.
Of course I can put it in a variable. So ehdr is a struct variable that basically contains the ELF header that is at the beginning of the file. Of course, once I map a value and put it in a variable, I can access the different fields, and I can also update them. Okay.
What happens if I try to map an ELF header starting at the first byte of the file instead of at byte zero? Oops.
I get a constraint violation exception. Why? Because the constraints which are defined on this specific struct, which in this case is the ELF header, are not satisfied by the data at this offset in the file. So then I get a constraint violation exception. Okay.
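The mapping step in the demo can be approximated outside poke. The following is a minimal Python sketch, not Poke code and not how poke is implemented: it decodes an ELF64 header with the standard `struct` module and checks the magic number before accepting the data. The names `map_ehdr`, `make_sample_header` and `ConstraintViolation` are hypothetical, a little-endian file is assumed, and the sample bytes are made up so the sketch runs without a real file.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Fields that follow the 16-byte e_ident array in an ELF64 header.
EHDR_REST = struct.Struct("<HHIQQQIHHHHHH")  # little-endian is an assumption
EHDR_FIELDS = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


class ConstraintViolation(Exception):
    """Hypothetical stand-in for the exception shown in the demo."""


def map_ehdr(data: bytes, offset: int = 0) -> dict:
    """Decode an ELF64 header at `offset`, checking the magic number first."""
    if data[offset:offset + 4] != ELF_MAGIC:
        raise ConstraintViolation(f"no ELF magic at byte {offset}")
    values = EHDR_REST.unpack_from(data, offset + 16)
    return dict(zip(EHDR_FIELDS, values))


def make_sample_header(e_shoff: int = 64, e_shnum: int = 11) -> bytes:
    """Build a made-up 64-byte header so the sketch needs no real file."""
    e_ident = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9)  # class, data, version, padding
    rest = EHDR_REST.pack(1, 62, 1, 0, 0, e_shoff, 0, 64, 0, 0, 64, e_shnum, 10)
    return e_ident + rest


sample = make_sample_header()
ehdr = map_ehdr(sample, 0)        # succeeds: the magic sits at byte 0
print(ehdr["e_shoff"], ehdr["e_shnum"])

try:
    map_ehdr(sample, 1)           # fails: bytes 1..4 are not the magic
except ConstraintViolation as exc:
    print("constraint violated:", exc)
```

The check runs before any field is decoded, which mirrors the idea that a constraint is part of the type description rather than something the user tests afterwards.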
But I have a header. So what is our goal? To vandalize that relocation. How can I get to a relocation in an ELF file? Okay, I have the ELF header. I don't know how familiar you are with this format, but you have the ELF header, and in the ELF header you have a field which is called e_shoff, for section header offset,
which contains the offset in the file to the beginning of the section header table. Okay. The section header table, as its name implies, is basically a sequence of header entries, right? Of section headers. How many of them? Well,
that is also in the header; it is called e_shnum. So how can I get it from the file? I map. At what offset? Oh, sorry.
Um, what are the entities that I want to map here? Section headers. I have another struct definition here. So this basically maps this number of section headers at this specific offset in the file. Okay.
What happened? The ehdr. There you go.
So this is an array. I can also put it in a variable. The size of the array is that, in bytes. It contains, okay, let's put it in decimal. It contains 11 sections, right?
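As an illustration of mapping "this many section headers at this offset", here is a small Python sketch using the standard `struct` module. It is not Poke and makes no claim about poke internals; `map_section_headers` is a hypothetical name, the 64-byte Elf64 section header layout is taken from the ELF specification, little-endian encoding is assumed, and the file image is fabricated for the example.

```python
import struct

# One Elf64 section header: 64 bytes, little-endian assumed.
SHDR = struct.Struct("<IIQQQQIIQQ")
SHDR_FIELDS = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)


def map_section_headers(data: bytes, e_shoff: int, e_shnum: int) -> list:
    """Decode `e_shnum` consecutive section headers starting at `e_shoff`."""
    headers = []
    for i in range(e_shnum):
        values = SHDR.unpack_from(data, e_shoff + i * SHDR.size)
        headers.append(dict(zip(SHDR_FIELDS, values)))
    return headers


# Made-up image: 64 filler bytes standing in for the ELF header, followed
# by 11 section headers whose sh_name is simply their index.
image = bytes(64) + b"".join(
    SHDR.pack(i, 0, 0, 0, 0, 0, 0, 0, 0, 0) for i in range(11)
)
shdr = map_section_headers(image, 64, 11)
print(len(shdr), "sections,", len(shdr) * SHDR.size, "bytes")
```

The two arguments correspond to the `e_shoff` and `e_shnum` header fields mentioned in the talk; everything else in the sketch is scaffolding.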
So which section are we interested in? The one containing relocations, right? How can we identify that section? Well, by the section flags or by the name, for example. How can we get the name of an ELF section? For example, let's pick the first section, right?
Or the seventh section, for example. This is its name. But what is its name? Its name is another offset. It's not a real string; it is the offset of the string in the ELF file's string table, which gives you the string. Okay. You see that the format is not very nice, but this is the kind of stuff we usually have to work
with. In ELF, and in many formats, it is a pain in the ass. Every time you need to know, what is this string? Oh, it is in the string table. Okay, where is the string table? Well, usually it is pointed to by the header, blah, blah. So that's why in poke you can also define functions, like this one here.
This is a Poke function that, given an ELF header and an offset, looks for the string table and gives you the proper string. So for example, you can call it like this and, pop.
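The lookup that such a helper performs can be sketched in a few lines of Python. This is an illustration only: `get_section_name` is a hypothetical name, it takes the string table's file offset directly instead of an ELF header, and the table contents are made up. It relies on the ELF convention that names in a string table are NUL-terminated.

```python
def get_section_name(data: bytes, strtab_offset: int, name_offset: int) -> str:
    """Return the NUL-terminated string at `name_offset` inside the string
    table that starts at file offset `strtab_offset`."""
    start = strtab_offset + name_offset
    end = data.index(b"\x00", start)   # first NUL at or after the start
    return data[start:end].decode("ascii")


# Made-up image: 16 filler bytes, then a tiny string table.
strtab = b"\x00.text\x00.rela.text\x00"
image = bytes(16) + strtab

print(get_section_name(image, 16, 1))
print(get_section_name(image, 16, 7))
```

In a real ELF file the string table offset would come from the section header selected by `e_shstrndx`, and the name offset from a section header's `sh_name` field.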
This is the section we were looking for. This is not by chance, I mean, I have done this before the talk. So this is the section we are interested in. We know that we're interested in shdr[7]. Okay, fine. shdr[7]: it has a name, a type, flags. What do we want?
The section headers in ELF files have a pointer, which is another offset, yes, it's always like this, to where the contents of the section start in the file. So where it starts is in sh_offset.
So what do we want to map at sh_offset? In this case we want to map relocations, because we know that this section contains relocations. So there is a struct definition for relocations too. It's just five lines that you write to describe it. How many of them? Well,
here you see one of the peculiarities that you usually find in object files and object formats: ELF is not telling you how many elements you have in a section. It's telling you how much space the elements occupy in the section. Fortunately, poke allows you to map arrays
not only by number of elements, but also by size. And it does the right thing. So here you can pass this. It's a size. Okay. Here we have an array of one relocation, because we only have one
relocation. All right, so we can do, like, my relocation is in this array, and this is the relocation I want to vandalize. Let's do it. Let's put an addend of 666.
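Mapping by size rather than by element count, and then patching one field, can be imitated in Python as below. This is a sketch under assumptions, not poke's behaviour: the entry layout is the Elf64 relocation-with-addend record from the ELF specification, little-endian encoding is assumed, `map_relocs_by_size` and `set_addend` are hypothetical names, and rejecting a size that is not a whole number of entries is a choice made for this example.

```python
import struct

# r_offset, r_info, r_addend: 24 bytes per entry, little-endian assumed.
RELA = struct.Struct("<QQq")


def map_relocs_by_size(data, offset: int, size: int) -> list:
    """Decode as many relocation entries as fit exactly in `size` bytes."""
    if size % RELA.size:
        raise ValueError("size is not a whole number of relocation entries")
    count = size // RELA.size
    return [RELA.unpack_from(data, offset + i * RELA.size) for i in range(count)]


def set_addend(buf: bytearray, offset: int, index: int, addend: int) -> None:
    """Overwrite r_addend (the last 8 bytes) of entry `index` in place."""
    struct.pack_into("<q", buf, offset + index * RELA.size + 16, addend)


# Made-up section contents: 8 filler bytes, then a single relocation entry.
image = bytearray(8) + RELA.pack(0x10, 0x200000002, -4)
relocs = map_relocs_by_size(image, 8, 24)     # 24 bytes hold one entry
set_addend(image, 8, 0, 666)
patched = map_relocs_by_size(image, 8, 24)
print(len(relocs), patched[0][2])
```

The element count never appears in the caller's code; it is derived from the section size, which is the only quantity the section header provides.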
Done. We get out, we run readelf, and mission accomplished. All right, so this was the demo. Now, this is what you can do with poke. You may say,
okay, this was very stupid. Okay, maybe it was stupid, but you can do something slightly different, or completely different, with a completely different object format by writing a pickle of 50 lines. And this saves a lot of time, at least for people like me.
So you saw here that I was using a pickle, loading it, and using a sort of language. The language is called Poke, right? And now I'm going to tell you very quickly the different characteristics of it, but only the interesting ones: what makes it different from other programming languages. First, the language has support for values, like any other
language. You can specify integers in different numeration bases. You have strings, which are null-terminated. You have arrays. You don't have multi-dimensional arrays, but you can have arrays of arrays. And you have structs, right? Nothing special here.
But then let's see the first characteristic that makes Poke special. When I designed this program, one of the first problems I found was: should I make it byte-oriented or bit-oriented? Right?
I mean, option A: okay, I'm going to make it byte-oriented. Why? Because 99% of the formats around are byte-oriented, right? So when it comes to specifying offsets and things like sizes, they should be in bytes. Okay, fine. Cons of this approach: if you are one of the 1% who is unfortunate enough to have to implement deflate, for example,
or any other bit-oriented format, then this program is not for you. I'm sorry. I didn't want that. Option B: okay, I wanted to make it general, but if I make it bit-oriented, it's going to be a real, real pain for 99% of the users, because you can imagine,
you will get sick of multiplying by eight everywhere, right? So I was like, okay, bytes, bits, bits, bytes. I was getting crazy. But then, fortunately, from time to time I get together with some friends, we call ourselves the Rabbit Herd, and we do hacking weekends. And in one of those hacking weekends,
I told my friends, hey, look, I have this problem. So then we brainstormed and came up with an idea, which is united values. In Poke you have, like in any other normal programming language, pure magnitudes, like 23. 23 what? 23 nothing,
just 23. And then, only for offsets, which is what I call them in Poke because usually you edit files, though they are sizes too, you have values with named units, which I call offset values. So you can specify something like this: eight bits,
23 bytes, two kilobytes. That was the initial idea. Now, this has many advantages, but having a list of predefined units is limited.
So why not allow you to specify any arbitrary unit? So for example, this is an offset of eight units of eight bits each, eight bytes basically. And this is two units of three bits each. Okay. But then I thought, well, why am I stopping here? Why shouldn't you be able to specify offsets and sizes also in terms of your own
types? So for example, this is a value which is 23 packets, a packet being the type that you defined just before. Of course, this only works for data structures in Poke whose size is known at compile time, right?
Because that's not always the case, but it's useful enough, so you can operate in terms of packets. Of course, there are operations that make a little algebra of offsets. If you add an offset to another offset, what do you get? Another offset. If you multiply an offset by an integer, by a magnitude,
you get another offset. If you divide, you get a magnitude, right? It's like, if you divide meters by meters, what do you get? A pure magnitude. And you also have the remainder, the modulus, which is another offset, obviously. The offsets are beautiful also because they allow you
to think in terms of units, like when you are doing physics, for example. So for example, if you define this type Packet, which is a struct of an int and a long, like in the example in these slides,
right? How many bytes are in one packet? Well, you divide one packet by one byte. Oops. Yeah. Okay. 12 bytes.
And in 23 packets? 276. Fine. In Poke, when you write a united value, you can omit the magnitude if it is one. So if I write it like this, what does this look like?
It looks like when you are doing physics or whatever, and you are working with units in your maths or your physics, right? I think it's quite cute and nice, and it also allows you to operate and do conversions without having to use sizeof and things like that. Right?
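The little algebra of offsets described above can be modelled in Python to make the rules concrete. This is a sketch, not Poke syntax and not poke's implementation: the `Offset` class and the unit constants are hypothetical, a unit is represented by its size in bits, and results of addition and modulus are normalised to bits as a simplification. The packet unit assumes the slide's struct of a 32-bit int followed by a 64-bit long.

```python
from dataclasses import dataclass

BIT = 1
BYTE = 8
PACKET = 32 + 64   # assumed: a 32-bit int followed by a 64-bit long


@dataclass(frozen=True, eq=False)
class Offset:
    magnitude: int
    unit: int = BIT          # size of one unit, in bits

    @property
    def bits(self) -> int:
        return self.magnitude * self.unit

    def __eq__(self, other):
        return isinstance(other, Offset) and self.bits == other.bits

    def __hash__(self):
        return hash(self.bits)

    def __add__(self, other: "Offset") -> "Offset":      # offset + offset -> offset
        return Offset(self.bits + other.bits, BIT)

    def __mul__(self, factor: int) -> "Offset":          # offset * magnitude -> offset
        return Offset(self.magnitude * factor, self.unit)

    def __floordiv__(self, other: "Offset") -> int:      # offset / offset -> magnitude
        return self.bits // other.bits

    def __mod__(self, other: "Offset") -> "Offset":      # offset % offset -> offset
        return Offset(self.bits % other.bits, BIT)


one_byte = Offset(1, BYTE)
print(Offset(1, PACKET) // one_byte)     # bytes in one packet
print(Offset(23, PACKET) // one_byte)    # bytes in 23 packets
print((Offset(8, 8) + Offset(2, 3)).bits)
```

Dividing by "one byte" plays the role that `sizeof` plays in C: the question "how many bytes is this" becomes an ordinary division of two values that carry units.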
It's very nice. Also, it has a very nice side effect, which is that often in object formats and object files you have fields which give you an offset in the file, or the size of something else, in some specific unit.
For example, this is ELF. This is the size of the section pointed to by the section header. It happens that the size is in bytes. This is how you specify an offset type in Poke, right? So if you were using C, Python or whatever to edit your object file,
you would need to remember what the unit is every time before writing into it, right? And you would have to convert to bytes every time. So if you are working with relocations, for example, and you want to write ten relocations in this section from your C program or your Python program or whatever,
then you have to remember the unit and you have to do the conversion. In poke, you can just assign, because poke knows the unit of the destination of the assignment and also of the source. So it will do it for you.
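To show what "the destination knows its unit" buys you, here is a small Python sketch in which a field records its own unit and converts on assignment. It is only an analogy: `SectionHeader`, `assign_size` and `convert` are hypothetical names, the 24-byte relocation size is an assumption taken from the Elf64 layout, and raising on an inexact conversion is a choice made for this example rather than documented poke behaviour.

```python
BYTE = 8
RELA = 24 * BYTE   # assumed size of one Elf64 relocation entry, in bits


def convert(magnitude: int, src_unit: int, dst_unit: int) -> int:
    """Express `magnitude` units of `src_unit` bits in units of `dst_unit` bits."""
    bits = magnitude * src_unit
    if bits % dst_unit:
        raise ValueError("value is not a whole number of destination units")
    return bits // dst_unit


class SectionHeader:
    SIZE_UNIT = BYTE          # the stored size field is expressed in bytes

    def __init__(self) -> None:
        self.sh_size = 0      # raw value, in SIZE_UNIT units

    def assign_size(self, magnitude: int, unit: int) -> None:
        """Store a size given in any unit; the field does the conversion."""
        self.sh_size = convert(magnitude, unit, self.SIZE_UNIT)


section = SectionHeader()
section.assign_size(10, RELA)     # "ten relocations", no manual arithmetic
print(section.sh_size)
```

The caller states the quantity in the unit that is natural for the task, and the multiplication by the entry size happens in exactly one place.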
So this was the first thing in the language which is probably different from what you have seen until now. Of course, these were the values. Now, Poke has types. It has integral types, which are for signed integers and unsigned integers. Another thing that is different in Poke: in most other programming languages, normally you have integers which by default are, I don't know,
maybe 32 bits, right? Or maybe 64 bits, or whatever. In Poke, an integer can be of any number of bits from 1 to 64, and I actually plan to expand it to an unlimited number of bits. And I'm not talking about bit masks or anything like that.
I'm talking about proper values of seven bits or three bits, one bit, five bits, whatever. Okay. And you can operate with them accordingly. You can define those types using this syntax, which I think is pretty readable. Then you have offset types, and an offset is a proper value in Poke. You know, offsets are first-class citizens in Poke.
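For readers used to byte-sized integers, the following Python sketch shows what reading a value of an arbitrary width from 1 to 64 bits involves. It is an illustration, not Poke: `read_bits` is a hypothetical helper, the most-significant-bit-first numbering is an assumption, and two's complement is assumed for signed values.

```python
def read_bits(data: bytes, bit_offset: int, width: int, signed: bool = False) -> int:
    """Read a `width`-bit integer starting `bit_offset` bits into `data`.

    Bits are numbered from the most significant bit of the first byte
    (an assumption of this sketch).
    """
    if not 1 <= width <= 64:
        raise ValueError("width must be between 1 and 64 bits")
    shift = len(data) * 8 - bit_offset - width
    if bit_offset < 0 or shift < 0:
        raise ValueError("requested bits fall outside the data")
    value = (int.from_bytes(data, "big") >> shift) & ((1 << width) - 1)
    if signed and value >> (width - 1):   # sign bit set: two's complement
        value -= 1 << width
    return value


octet = bytes([0b10110100])
print(read_bits(octet, 0, 3))                # the top three bits, unsigned
print(read_bits(octet, 3, 5))                # the remaining five bits
print(read_bits(octet, 0, 3, signed=True))   # the same three bits, signed
```

The shifting and masking written out here is exactly the bookkeeping that a language with native 3-bit or 5-bit integer types takes off the user's hands.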
And then one string type, because there is only one kind of string. And of course you also have composite types, the struct types, which are what you use to define the structure of the data you want to use. Arrays are tricky in poke because, to be honest, when I started it, I always thought, okay, arrays will be easy, you know,
and it is the structs that are going to be painful to implement and design. And the structs were easy. Arrays were the complicated ones, surprisingly. Basically, in poke you have three kinds of array types in the language. You have unbounded arrays,
which is an array of an undefined number of integers, for example, like in this example. Then you have arrays bounded by number of elements, which can be constant, like two integers, or can be variable. You know, poke is a lexically scoped, block-oriented language. So it has closures and you can do all sorts of unspeakable things with
it. So it can be variable. Or, as we saw in the demo with the ELF file, you can also bound an array type by size. So, for example, this array can contain the same number of integers as that array: an array of this type and an array of that type both hold two, right?
But this one is bounded by size and that one is bounded by number of elements. And we will see in one of the next slides that this has an impact when you map it. And also, of course, you know, the queen of poke: the struct, all right, which is what you use to actually define your data structures.
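The three ways of bounding an array can be sketched in Python as three ways of reading elements out of a byte buffer. The function names are hypothetical, and the element type (a 32-bit little-endian signed integer) is an assumption chosen only to keep the example short.

```python
import struct

ELEM = struct.Struct("<i")  # assumed element type: 32-bit little-endian int


def map_unbounded(data: bytes, offset: int = 0) -> list:
    """Read elements until the data runs out."""
    out, pos = [], offset
    while pos + ELEM.size <= len(data):
        out.append(ELEM.unpack_from(data, pos)[0])
        pos += ELEM.size
    return out


def map_by_count(data: bytes, count: int, offset: int = 0) -> list:
    """Read exactly `count` elements."""
    return [ELEM.unpack_from(data, offset + i * ELEM.size)[0]
            for i in range(count)]


def map_by_size(data: bytes, size_bytes: int, offset: int = 0) -> list:
    """Read elements until exactly `size_bytes` bytes have been consumed."""
    out, pos, end = [], offset, offset + size_bytes
    while pos < end:
        if pos + ELEM.size > end:
            raise ValueError("elements do not fill the size bound exactly")
        out.append(ELEM.unpack_from(data, pos)[0])
        pos += ELEM.size
    return out


data = struct.pack("<4i", 1, 2, 3, 4)
print(map_by_count(data, 2), map_by_size(data, 8), map_unbounded(data))
```

With fixed-size elements, a bound of two elements and a bound of eight bytes select the same two integers; the two bounds only differ once element sizes vary.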
I will go very fast with this, and I'm very sorry, but I have no time. First, okay, a packet. This packet consists, in a file or in memory, of a byte, which is a magic number; of an unsigned integer of 32 bits, which is the length of what follows; and then an array of bytes, data, of that length. You see, you can use,
you know, fields that have been defined just before to define the data that comes after, which is the data. This would be the typical definition of a variable-length packet, for example. Also, although this is not implemented yet,
you can pass arguments to the structs, because in poke a struct is also a closure, you know; you can actually also define variables inside of it, and functions. You can pass arguments; sometimes it's useful. Also, it is very typical,
in object files, in object formats, that the structs have holes in them. You know, the structure of something has holes in it. Typical example: there are files which have two headers, one at the beginning of the file and one at the end; or you have a header and then you have an offset to whatever other data,
you know. Think about, for example, a poke struct which is an ext2 file system, which is a header or a superblock that points to a superblock, and then the superblock points to the different superblocks. It is sparse; there are holes in it, right? So in poke, you can specify (I'm sorry, I don't have a pointer), in poke
you can specify the offset of a field using what I call a label. And note that here you can put any expression that evaluates to an offset, right? And this is how I fix this problem of bits versus bytes. And this offset, for example, in this case is part of the struct itself.
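Both ideas, a field whose length comes from an earlier field and a field placed at an offset stored in the struct itself, can be sketched in Python. The function names, the little-endian byte order and the layout used in `decode_sparse` are assumptions of the example, not the layout on the slide.

```python
import struct


def decode_packet(data: bytes, offset: int = 0) -> dict:
    """Decode: 1-byte magic, 32-bit length, then `length` bytes of data."""
    magic = data[offset]
    (length,) = struct.unpack_from("<I", data, offset + 1)
    start = offset + 5  # no padding between the fields
    payload = data[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return {"magic": magic, "length": length, "data": payload}


def decode_sparse(data: bytes) -> dict:
    """Decode a struct with a hole: the first field says where `body` lives.

    Hypothetical layout: a 32-bit field `body_offset`, then unrelated
    bytes, then a one-byte `body` located at `body_offset`.
    """
    (body_offset,) = struct.unpack_from("<I", data, 0)
    return {"body_offset": body_offset, "body": data[body_offset]}


packet = bytes([0x7F]) + struct.pack("<I", 3) + b"abc" + b"trailing"
print(decode_packet(packet))

sparse = struct.pack("<I", 8) + b"\x00" * 4 + bytes([42])
print(decode_sparse(sparse))
```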
You see, it is very flexible. I know the syntax is very bad, but this is one of the syntax warts that I will fix as soon as I can get rid of the Bison parser I'm using at the moment and use a recursive descent parser written by hand, because I want to use the normal C syntax of a label,
you know, like a prefix and a colon, but because of syntax issues, you know, with LALR grammars, it's not a choice right now. Okay. Also, you can have pinned structs, which is exactly what we understand by unions in C.
I call them pinned because it's as if the fields inside are pinned, you know, to the same place; it's like, you know, a little tree, right? Why? Because in a struct in poke, usually, this field starts immediately after the first one, right? But if you define the struct to be pinned, the different fields start at the same offset in the IO space.
I call IO space the file that you are editing. So it's like a C struct, right? So this is basically also from ELF, and it's telling you that you have an st_info, which you can interpret either as an unsigned integer of 32 bits or as a 28-bit st_bind and a four-bit
st_type. All right. It's like a C union. And then you may ask, why did you not call it union? Because poke has union types too. What is a poke union? This is a concept I got from DataScript and I really love it.
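The two overlapping views of `st_info` can be sketched in Python as two readings of the same 32 bits. The field widths follow what is said in the talk; placing `st_bind` in the high bits and `st_type` in the low four bits is an assumption of the sketch, and the function names are hypothetical.

```python
def decode_st_info(raw: int) -> dict:
    """Return both views of the same 32-bit value.

    View 1: the whole value as one unsigned 32-bit integer.
    View 2: a 28-bit st_bind followed by a 4-bit st_type
            (bit placement is an assumption of this sketch).
    """
    if not 0 <= raw < 1 << 32:
        raise ValueError("st_info must fit in 32 bits")
    return {"st_info": raw, "st_bind": raw >> 4, "st_type": raw & 0xF}


def with_st_type(raw: int, st_type: int) -> int:
    """Write through the second view; the first view changes with it."""
    if not 0 <= st_type < 1 << 4:
        raise ValueError("st_type must fit in 4 bits")
    return (raw & ~0xF) | st_type


print(decode_st_info(0x12))
print(hex(with_st_type(0x12, 0x7)))
```

Because both views start at the same offset, a write through one view is visible through the other, which is the behaviour of a C union.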
Basically, you have seen that a poke struct definition is basically the specification of a decoding process. Because if you look at it from an abstract point of view, poke is nothing else than the normal process of decoding, computing with the data, and encoding back; basically abstracting you from the
encoding and the decoding, so you can focus on the computing. So when you write a poke struct, you are basically, in a sort of declarative way, teaching poke how to decode data, right? The unions give you conditionals.
So in a struct type you can use constraints, which is to say, for every field you can specify an arbitrary poke expression, which can contain calls to functions and whatnot, you know, anything you can imagine, even mappings, although that is very obscure and I am not sure I want to get into it very much yet, but
anyway, you can specify a constraint associated with the fields. So how do you get conditionals in poke in your data structures? With unions. So in unions you have different fields too, like in this one. So how does it work?
Poke will try to decode every alternative in the union, starting from the first one, and then the first alternative for which no constraint is violated is chosen. And this is recursive. You know, I mean, the constraints do not need to be immediately in the fields of the
union. The union can have a struct, which can have a struct, and so on. Any constraint that fails invalidates that alternative of the union, you know? So, for example, here you have an example which is from the tag format in MP3 files, you know, the artist, the name of the song and, you know, things like that.
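The selection rule just described (try the alternatives in order, keep the first one in which no constraint fails, and let a failure in a nested decoder reject the enclosing alternative) can be sketched in Python. All names here are hypothetical and the byte formats are invented for the illustration.

```python
class ConstraintError(Exception):
    """Raised when a field constraint does not hold."""


def check(condition: bool, message: str) -> None:
    if not condition:
        raise ConstraintError(message)


def decode_text(data: bytes) -> str:
    # Constraint on the first field: it must be the letter T.
    check(len(data) > 0 and data[0] == ord("T"), "tag must be 'T'")
    return data[1:].decode("latin-1")


def decode_wrapped(data: bytes) -> str:
    check(len(data) > 0 and data[0] == 0x01, "wrapper byte must be 1")
    # A constraint failing inside the nested decoder propagates upwards
    # and rejects this alternative as well.
    return decode_text(data[1:])


def decode_raw(data: bytes) -> bytes:
    return bytes(data)  # no constraints: always succeeds


ALTERNATIVES = [("wrapped", decode_wrapped),
                ("text", decode_text),
                ("raw", decode_raw)]


def decode_union(data: bytes, alternatives=ALTERNATIVES):
    """Return (name, value) of the first alternative with no failed constraint."""
    for name, decoder in alternatives:
        try:
            return name, decoder(data)
        except ConstraintError:
            continue
    raise ConstraintError("no alternative matched")


for sample in (b"\x01Thi", b"Thello", b"\x01Xhi"):
    print(decode_union(sample))
```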
It uses this format. So here, for example, you have an ID of this frame, which is four chars. The first one cannot be zero. Then you have a size. And then what comes next? It depends. If the first byte here equals T,
then what comes next depends on the value of size. If it is bigger than one, it is two fields: you know, this id string, which is zero, and then an array of size minus one. Otherwise there is an array of characters of size size, which is called frame data. If id[0] is T... no,
if id[0] is not T, then there comes an array of size chars, frame data. Now you may wonder, how is this different from this? Well, it is different because this happens only if id[0] is not T, and this
happens if id[0] is T and size is not bigger than one. I know it takes a little bit to get used to those unions. But when you do, you know, it's quite nice. Okay. Poke supports polymorphic types, so you can, you know,
write generic code. It is lexically scoped. You have variables and whatnot. And then mapping, and it's worth it to spend time on this. So I was saying, okay, map this, I mapped that. Okay. In poke you have variables, like this var a, like for example,
this is an array of three elements. I can access them, right? But if I open a file, I can also map three integers at this offset. So b is also an array of three integers and a is an array of three
integers, right? So what is the difference? Well, the difference is that a is not mapped and b is mapped. So if I do a[1] equals 10, okay, I change the value of the second element of a. But if I say b[1] equals 10,
I change it in the variable as well, but I also change it, you know, in the IO space. So it has a side effect. So b has an offset; a does not have an offset. Good. So for example, you know, okay, this is it, right? So the central idea of poke is that you should be able to work
with normal, non-mapped values and mapped values transparently. So you can write, and actually you can do it, I don't know, a function that sorts relocations, and you can sort an array of relocations in memory, like in a normal variable. But
if the variable is mapped in some file or some memory, it will also sort it, you know, in the backend. This sounds very simple, but this is what took me, you know, like months to actually get right, because it was schizophrenic, you know? Okay, what is mapped: the value or the type? No, it is the type. No, it is the variable. No, it is the value. No, it's the variable. No, no, no.
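The difference between the unmapped a and the mapped b, and the way one function can work on both, can be sketched in Python. This is not how poke implements mapping: `IntArray` and `sort_in_place` are hypothetical names, a `bytearray` stands in for the IO space, and 32-bit little-endian elements are an assumption.

```python
import struct


class IntArray:
    """Array of 32-bit ints that may or may not be mapped in an IO space."""

    def __init__(self, values, io=None, offset=None):
        self._values = list(values)
        self._io = io          # bytearray standing in for the IO space
        self._offset = offset  # only mapped values have an offset

    @classmethod
    def map(cls, io: bytearray, offset: int, count: int) -> "IntArray":
        values = struct.unpack_from(f"<{count}i", io, offset)
        return cls(values, io, offset)

    @property
    def mapped(self) -> bool:
        return self._io is not None

    def __len__(self) -> int:
        return len(self._values)

    def __getitem__(self, index: int) -> int:
        return self._values[index]

    def __setitem__(self, index: int, value: int) -> None:
        index = range(len(self._values))[index]  # normalise, bounds-check
        self._values[index] = value
        if self.mapped:
            # Side effect: the write also reaches the IO space.
            struct.pack_into("<i", self._io, self._offset + 4 * index, value)


def sort_in_place(array: IntArray) -> None:
    """Works the same on mapped and unmapped arrays."""
    ordered = sorted(array[i] for i in range(len(array)))
    for i, value in enumerate(ordered):
        array[i] = value


io_space = bytearray(struct.pack("<3i", 3, 1, 2))
a = IntArray([3, 1, 2])            # not mapped
b = IntArray.map(io_space, 0, 3)   # mapped at offset 0
a[1] = 10                          # changes only the variable
b[1] = 10                          # changes the variable and the IO space
sort_in_place(b)
print(struct.unpack_from("<3i", io_space))
```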
And I think I got it right. It is values which are mapped or not, and only structs, only complex values. So this is the mapping, what I have been telling you, right? You use the map operator, which is like that. Now, in poke you have functions,
right? It is, again, lexically scoped. It is nice. It supports optional arguments. It supports variable-length argument lists, as an array of any. Also, I am a huge fan of Algol 68, and one of the things I like most about Algol 68 is that if a function doesn't get arguments,
you can actually use it the same way as if it was sort of a variable, which I like very much. And this was about the language. So now, unfortunately, I wanted to tell you how it works internally, but it's going to be super fast. So this is the architecture of the application.
Here you have a command part, which is the readline, you know, and all the boring things. Then you have a compiler, right? Which actually compiles pickles (poke, sorry) into code for a virtual machine, which is the poke virtual machine. And it is the virtual machine, through its instructions, that accesses the IO space, in
this case, the file. So here, this is the structure of the compiler. At the right, you have the disassembly, in poke virtual machine instructions (it is a stack machine, because I love stack machines), of the expression that you see there. Please ignore the prologue and the epilogue. Those are the different passes and phases of the compiler.
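For readers who have not met stack machines, compiling an expression into stack instructions and running them can be sketched in a few lines of Python. The instruction names `push`, `add` and `mul` are invented for the illustration and are not the actual poke virtual machine instructions.

```python
def compile_expr(node) -> list:
    """Compile a nested tuple such as ("add", 1, ("mul", 2, 3)).

    Operands are emitted first and the operator last (postorder), which
    is the order a stack machine needs.
    """
    if isinstance(node, int):
        return [("push", node)]
    op, left, right = node
    return compile_expr(left) + compile_expr(right) + [(op,)]


def run(program: list) -> int:
    """Execute the instructions on an operand stack and return the result."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()


program = compile_expr(("add", 1, ("mul", 2, 3)))
print(program)
print(run(program))
```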
It does constant folding, some optimizations. I mean, it's not a toy. It's actually a very cute compiler. I invite you to look at it in the source code. I made myself a macro assembler to not go crazy while writing the runtime and the code generator. Actually, I wrote an assembler in AWK,
so I can write the runtime of poke and also, you know, the code generator routines, you know, in proper assembly, like this. Yeah. Then the IO subsystem basically abstracts what you are editing from a space of bytes into a space of IO objects,
which can be those integers or whatever. The important detail of this is that it doesn't have to be a file. All right. It could be a process whose memory you access using ptrace or whatever. It could be a file system, because you maybe want to edit your ext2 or whatever. Right. It doesn't matter.
Anything that can be addressed by bytes can be poked. Then, poke is extensible. Why? Because you can extend the application in poke. So for example, you saw, okay, this is a very simple syntax which I feel very proud of, but I don't have time to explain it to you. Anyway,
the dump command I have been using is basically written as one poke function, right? You will see that the arguments, you know, all have a default value. So for example, you can say dump, from the offset, size the size,
right? So you can pass arguments like that. And to define new commands (oops, I'm sorry), to define new commands, basically, you do it by writing a poke function. What happened here?
Yeah, there you go. A poke function here. All right, so that's all you have to do to extend it. Now, pickles are poke source files containing a collection of related goodies: type definitions, functions, whatnot, like ELF.
I have a test suite. It needs more tests, but it sort of works. This is reassuring, right? I hope it is. So what works? Most of what I have told you today is working in poke.
There is only one kind of IO device, which is files, and things like that, but we need more commands, and also there is a lot to do before our first release can be made. You know, supporting unions, for example, which is work in progress; support for sets, for enumerations, for bit masks, bitmaps, and things like that. More control structures in the language. I have a for loop, but you know, it is so cheap to add control,
you know, control sentences that, well, I don't know. And then after the first release, this is a list of big projects I want to do, you know, in general, but there is still a lot of work to do. So those are the links for the project. If you want, you know,
to contribute, I have a homepage, you know, a mailing list, and we also have an IRC channel on freenode. And if you think that poke will be useful for you and you want to have fun, please see the HACKING file, you know, in the source tree, because it contains a lot of
information that will be useful for you. All right, so done in time. Yes. Yes. We can have six minutes for questions.
How do you handle padding and bit endianness? Endianness: at the moment you can basically set the endianness to little, big or host.
I want to add a function that you can put in a struct constraint that will change the endianness, you know, at run time, when it is decoding, but that's not implemented yet. As for padding, there is no padding. I mean, in poke, if you have an integer which is, you know,
like seven bits, it's seven bits. And you know, in that sense it is bit-oriented. It doesn't pad, it doesn't align what you describe, what you poke. Yeah. Yeah. Okay. Did you take a look at Kaitai Struct already?
So do you know this project? Kaitai? Okay. K-A-I-T-A-I. Okay. Because it's an open source program as well. Yes. But it's meant completely the other way around: to define some binary structures, and it's meant to compile into several languages,
readers. So as part of this, you could take their definitions and convert them to yours as well. So I mean, yeah, we have a big repository of binary file definitions. Okay. No, but I will take a look, because I want to translate all that. Yeah. Very nice. Yes. But it's oriented to writing the encoders and decoders,
right? Decoders only, not encoders. Okay. Yeah. Is it possible to invoke any of these things as a library, for, say, if you needed to parse files and use
the pickle definitions that are available with poke? It will be. Yes. Actually, poke started as an editor, you know, that's the main idea. But right now, I really think that it will be a very nice foundation for writing prototypes of binary utilities. Like, one of the things that I'm going to write
very soon is a proper diff and patch for structured binary data, right? Based on poke descriptions. Yes. More questions? Just to make sure that I got it right,
would it be a tool where you can just define a binary schema once and then use it for writing and reading binary files? So for all that you do with your files, you could use this libpoke,
well, from the poke language in this case. Yes. I mean, it could be. Also, many people are asking me: hey, can I have poke-to-C? Can I have poke generate an encoder, like this other tool is doing, or a decoder? Yeah, why not? We can have a pickle written in poke itself. You know that, right?
See out. Yeah, sure. Yeah. That will be welcome. Actually, it will be useful. We can still have one question. Okay. Well, thank you again.