Dissecting media file formats with Kaitai Struct

Video in TIB AV-Portal: Dissecting media file formats with Kaitai Struct

Formal Metadata

Dissecting media file formats with Kaitai Struct
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year

Content Metadata

Subject Area
Media file formats grow progressively more complex every year, and supporting them all requires tremendous effort from all FOSS developers. It's a problem that concerns not only low-level library developers but higher-level software as well: for example, an audio sequencer or video editor developer will still need a solid understanding of the underlying media file format structure to be able to debug any problems with it (like non-standard chunks inserted by some proprietary software). We'd like to present Kaitai Struct, a new free/open source solution for file format dissection, visualization and parsing. It is a "write once, run everywhere" solution, where one needs to write a declarative file format spec once, and then compile it into a ready-made parsing library in a large variety of supported target languages. And our visualization tools make Kaitai Struct work like "Wireshark for media files".
Kaitai Struct started as an in-house tool in 2014 and was initially released to the public as an open-source project in March-April 2016, supporting only 2 target languages: Java and Ruby. Since then, we've collected 400+ stars on GitHub and hundreds of praising testimonials, got about a dozen contributors, implemented support for 8 languages, and got a handful of useful tools, like a console visualizer, a GUI visualizer, [WebIDE], etc.
Kaitai Struct is frequently compared to proprietary template-enabled hex editors (like 010 Editor, Synalyze It! or Hexinator), but goes one step further: it's not only about highlighting entities in a hex dump, but it can also automatically generate a working API from the spec, which considerably accelerates work on file formats and greatly reduces the human-factor errors that come with developing parsers by hand. One is guaranteed to get exactly the same parsing result both in the visualizer and using the compiled API. And, what's important, it's free and open source.
Some other comparable projects include BinPAC (but it's C++ only), Preon (which is Java-only), PADS (which targets only C & Haskell), and Construct (Python only). In comparison, Kaitai Struct offers cross-language support and includes visualization tools.
For media file dissection, we have a growing collection of well-known media file formats (including MP4 / QuickTime .mov, AVI, GIF, JPEG, PNG, TIFF, etc.), and other interesting file formats (like executables, byte-code, network protocols, etc.). We hope that open media software developers will find Kaitai Struct to be a helpful ally in their arsenal of tools to deal with the diverse world of modern file formats.
Hello everyone, thank you for joining me today. My name is Mikhail Yakshin, and I'd like to talk today about dissecting media file formats with a tool called Kaitai Struct. So what's the idea? Media file formats grow more and more complex every day. Media software developers have to deal with a multitude of different media file formats. Some of them are well documented but still pretty complex to process. Some of them are proprietary and undocumented and need to be reverse engineered. That is an even more complicated task and requires jumping quite a few hurdles. For example, one needs to do proper black-box reverse engineering for the result to be included in a free and open-source project without major legal problems. One needs to do lots of testing, making hypotheses and proving them right or wrong, making decisions, exploring the proprietary format step by step, and writing some kind of description or specification of the format.

Basically, the task that such a developer must undertake is going from the wire representation of a file format, in a stream, to memory, and sometimes going back from memory to a stream. That is, we have some kind of stream that needs to be dissected into objects laid out in memory, usually in some sort of tree or graph. A typical development workflow for such a process involves writing some parsing code in a certain programming language. Then you write some extra debugging code to ensure that it actually works, because you need to somehow prove that it works: you dump it to the screen, check some assertions, run it with a debugger or something like that. Then you basically debug it till you drop, because parsing binary formats is not exactly a trivial task. There are quite a few pitfalls to overcome: going over some boundary inside a structure, dealing with endianness, dealing with byte alignment, and a few other things like assertions, checks, very specific formats, special cases, conditional reading or writing, etc. As soon as you finish such a big task, you get some sort of parsing library that loads objects from a stream into memory. But what then? If you want to support some other programming language, you basically need to redo the whole process from the start, writing basically the same code in another programming language.

Consequently, almost every media format library out there has its own dumping tools. On this slide I've listed some of them. They're not there for some random reason; they are needed by the developers of these libraries to debug their work and to see if it really works. Needless to say, errors in file format libraries can be really devastating, really dangerous. Almost every such error, such as a buffer overflow, reading beyond some part of a structure, or interpreting something wrongly because of human error in writing the code, is almost always remotely exploitable. They frequently provide arbitrary code execution, especially if we are talking about buffer overflows in libraries written in languages such as C. They leak information. They can usually lead to denial-of-service errors. For example, in [unclear], in 2010, there were 22 vulnerabilities, and quite a few of them are very dangerous; in libtag [unclear], for example, there are 4 vulnerabilities, but they are dangerous as well.

If we go back to the start and see what file format specifications exist, we'll discover that there is no single universally accepted standard. Actually, if we take a look at the documents provided by file format authors, there are quite a few things invented to describe a file format: C structures, as we see here with ELF headers; intricate tables, as we see here with network protocols; and even more intricate tables, as we see here with some random page describing the Microsoft Word document format, that map bytes and bits and try to explain their values.

Network protocol engineers have something better to come to the rescue: they've got Wireshark, which is universally accepted as the tool of the trade. It allows one to dissect packets and see what's inside in a kind of tree format. Basically, you have a hex dump; you can point at any byte in the dump and see what value in the protocol, in the packet, it corresponds to, and vice versa. But what about the same thing for media files? It's a bit complicated. There are quite a few proprietary tools available, such as 010 Editor, Hexinator, or Synalyze It!, which some of you may be familiar with, but generally there is no universally accepted tool, or at least a tool that supports enough popular formats, to dissect with and to build upon.
Basically, we've tried to fill this gap, and actually go one step beyond it. So I'd like to present the Kaitai Struct project, which is a declarative file format specification language. All the words in this phrase are actually meaningful. The emphasis is on "declarative": it means that we do not specify how to read the format, but we specify what is inside the format. It's harder to implement in some cases, but it gives us quite a few advantages that I'd like to show a bit further in the presentation. We can compile the .ksy file that we've written with the file format specification into ready-made parsing libraries in quite a few target programming languages, which I'll demonstrate further. As well, we can visualize dumps of data using these file format specifications with several tools that were built around the Kaitai Struct project, such as the visualization tools and the Web IDE, which I'll demonstrate further as well. The .ksy format is YAML-based, and that's actually a good thing, because it's very easy to write your own tools. For example, it's quite a snap to write a tool that would embed one .ksy file into another .ksy file; it's generally a matter of writing a script of five or ten lines, and it's quite easy. Last but not least, it's free, as in "libre". We use GPLv3 for the compiler, and the generated code uses some runtime libraries that we supply as well, and they are MIT or Apache licensed. So even if the compiler is GPL, it's possible to use the output of the compiler in proprietary products as well. We support eight target languages right now: C++, C#, Java, JavaScript, Perl, PHP, Python, and Ruby. As a bonus, we support output to Graphviz, which I'll demonstrate further; it's quite an interesting side project as well. As for experimental features, right now we are building Swift support, and we're developing support for exporting .ksy specs as Wireshark dissectors, to be able to load the same declarative formats into the Wireshark interface and see them there, and quite a few other probably interesting target formats.
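The "embed one .ksy file into another" script mentioned above could look roughly like the following sketch. It is an illustration only, not a tool from the project: it assumes both specs have already been loaded into plain dictionaries (for example by a YAML library), and the function name `embed_spec` and the two toy specs are made up for this example.

```python
import copy

def embed_spec(outer, inner):
    """Return a copy of `outer` with `inner` added as a nested user-defined type."""
    result = copy.deepcopy(outer)
    type_name = inner["meta"]["id"]
    # Everything except the top-level "meta" block becomes the nested type's body.
    nested = {key: copy.deepcopy(value) for key, value in inner.items() if key != "meta"}
    result.setdefault("types", {})[type_name] = nested
    return result

# Two toy specs, written as the dictionaries a YAML loader would typically produce.
container = {
    "meta": {"id": "container", "endian": "le"},
    "seq": [{"id": "payload", "type": "chunk"}],
}
chunk = {
    "meta": {"id": "chunk"},
    "seq": [
        {"id": "len", "type": "u4"},
        {"id": "body", "size": "len"},
    ],
}

merged = embed_spec(container, chunk)
print(sorted(merged["types"]))
```

A real tool would also have to decide what to do with settings stored under the inner file's `meta` block (such as a different default endianness) and with name clashes; this sketch ignores both.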
So how does it look? The actual API generated by Kaitai Struct looks something like this. Here we have a demonstration with a GIF file. Generally, a Kaitai Struct file declares a tree of objects: here, for example, we have the header, the logical screen descriptor, the global color table, etc. Using it generally comes down to traversing this tree of objects from some starting point. For example, this code in Java starts with Gif.fromFile, which loads and parses some GIF data from a file, and then you just do file.something.something.something and extract the data right away. For example, this is a one-liner that shows the screen width and height, which are the dimensions of the visible fragment of the GIF image, right away in one line of code.
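For comparison, here is roughly what reading the same two values looks like when written by hand. This is a stdlib-only sketch, not code generated by Kaitai Struct; the function name `gif_screen_size` and the sample bytes are invented for the example. It relies only on the GIF layout itself: a 6-byte signature followed by little-endian 16-bit width and height.

```python
import struct

def gif_screen_size(data):
    """Return (width, height) from a GIF logical screen descriptor."""
    if len(data) < 10:
        raise ValueError("truncated GIF header")
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Width and height are little-endian unsigned 16-bit integers at offset 6.
    width, height = struct.unpack_from("<HH", data, 6)
    return width, height

# A hand-built header plus logical screen descriptor for a 320x200 image.
sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0, 0, 0)
print(gif_screen_size(sample))  # (320, 200)
```

Even this tiny reader already needs explicit bounds, signature, and endianness handling, which is the kind of boilerplate a generated parser is meant to take over.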
This is our Web IDE; probably it's much better to just demonstrate it right away. This is probably now the main workplace for a developer who wants to get his or her hands dirty with Kaitai Struct. Here we have, simultaneously, an editor to see and edit the .ksy file, in the upper left corner. In the upper right corner there is a hex dump of some loaded file; here we have a Microsoft AVI file and its corresponding format description. As you may have seen in editors like 010 Editor or Hexinator or other proprietary editors, it is possible to select any byte in the stream and go exactly to the value in the object tree, in the lower corner, to see what this byte corresponds to. As well, one can traverse, open, and close objects in the object tree and see how it looks. It's fully interactive: changing a single symbol in the .ksy code recompiles everything and tries to re-parse the file in whatever way you've just specified. So, for example, if you add some lines of code that add parsing of some new field, it would just appear right away; you don't need to do anything.

For those who want a more hardcore console style, there is also a console visualizer. Here we have a GPX file loaded into it. It doesn't look as flashy as the web one, but it works just as well. It doesn't feature any separate editor, of course; you're expected to have your own editor in a console, or whatever you want, in some other window. So it focuses just on visualization: you have the same tree, you have the same binary dump, and you can traverse it and see whether the file specification you've just entered matches what you expect to see or not.
This is how our .ksy files look. Basically it's YAML; it allows us to set up some fields and some field types. What is important is that it's declarative and not imperative. On the left we see the declarative specification; on the right we see what it usually compiles to, some kind of imperative code. Note that we do not have things like while loops, we do not have things like direct ifs, any conditional controls, any jumps, basically any code flow that is inherent in imperative implementations. We just describe the file structures: if there are repetitions, we describe them as repetitions; if there is conditional parsing, we describe it as conditional parsing. And that opens up quite a few interesting possibilities.
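The split between a declarative description and imperative reading code can be illustrated with a toy example. Note the assumptions: Kaitai Struct compiles a spec into source code, whereas this sketch interprets a small field list at run time purely to show the idea; the spec layout, the type-name mapping, and the little-endian choice are all made up for the illustration.

```python
import struct

# Declarative part: field names and types only, no control flow.
SPEC = [
    {"id": "magic", "size": 4},
    {"id": "version", "type": "u2"},
    {"id": "num_items", "type": "u4"},
]

# Assumed mapping from type names to struct formats (little-endian).
FORMATS = {"u1": "<B", "u2": "<H", "u4": "<I"}

def parse(spec, data):
    """Generic imperative reader driven entirely by the declarative spec."""
    result, pos = {}, 0
    for field in spec:
        if "type" in field:
            fmt = FORMATS[field["type"]]
            (value,) = struct.unpack_from(fmt, data, pos)
            pos += struct.calcsize(fmt)
        else:
            size = field["size"]
            value = data[pos:pos + size]
            if len(value) != size:
                raise ValueError("unexpected end of data")
            pos += size
        result[field["id"]] = value
    return result

record = parse(SPEC, b"DEMO" + struct.pack("<HI", 2, 7))
print(record)  # {'magic': b'DEMO', 'version': 2, 'num_items': 7}
```

All loops and offset arithmetic live in `parse`; the description itself stays a plain list of fields, which is what makes it possible to feed the same description to other tools such as visualizers.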
We have quite a few built-in data types, such as integers, floats, unaligned bit integers and bit fields, strings, raw byte arrays, and enums, and of course we allow user-defined data types. We have sequential parsing, parsing fields one by one in sequence. We have out-of-order parsing, something called instances, so you can seek in a file to parse other parts of the file by some index or offset. We have calculated attributes, to ease representation of something that we've got from the file in some more usable form. We have checking for magic signatures, such as the fixed contents of, for example, headers. We have conditional parsing. We have type switching on a condition, something like switch. We have repetitions until the end of the stream, repetitions for a predefined number of iterations, or until some condition is met.
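Three of the features just listed (a magic signature check, conditional parsing, and repetition until the end of the stream) correspond to imperative code along the following lines. This is a hand-written sketch over an invented toy format ("DEMO", an optional title, then 16-bit values), not output of the Kaitai Struct compiler.

```python
import io
import struct

def parse_demo(data):
    stream = io.BytesIO(data)
    # Magic signature: fixed contents expected at the start.
    if stream.read(4) != b"DEMO":
        raise ValueError("bad magic signature")
    # Conditional parsing: the title is present only if the flag byte is non-zero.
    (has_title,) = struct.unpack("<B", stream.read(1))
    title = None
    if has_title:
        (title_len,) = struct.unpack("<B", stream.read(1))
        title = stream.read(title_len).decode("ascii")
    # Repetition until end of stream: read 16-bit values until nothing is left.
    values = []
    while True:
        chunk = stream.read(2)
        if not chunk:
            break
        if len(chunk) < 2:
            raise ValueError("truncated record")
        values.append(struct.unpack("<H", chunk)[0])
    return {"title": title, "values": values}

sample = b"DEMO" + b"\x01" + b"\x02hi" + struct.pack("<HHH", 10, 20, 30)
print(parse_demo(sample))  # {'title': 'hi', 'values': [10, 20, 30]}
```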
We also have a powerful expression language that can be used almost everywhere. That's a good thing because it actually compiles into direct expression code in the target languages. For example, this one shows how we can parse an attribute named foo_len, an unsigned integer four bytes long, in the first place, and then we parse as many elements as we need, calculating the number of elements as foo_len minus four, divided by six. This is how it compiles to C++. This is how it compiles to Python; you can see that the code is quite different. And see how it compiles to JavaScript: another difference is that, for example, JavaScript doesn't have integer division, so we emulate it with Math.floor.
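The slide code is not part of this transcript, so the following is only a hand-written Python approximation of the example described: read foo_len, then read (foo_len - 4) / 6 elements. The little-endian byte order and the treatment of each element as six raw bytes are assumptions; the talk only gives the expression itself.

```python
import struct

def parse_foo(data):
    # foo_len: unsigned 4-byte integer (little-endian assumed here).
    (foo_len,) = struct.unpack_from("<I", data, 0)
    # Python spells integer division as //, so the expression maps directly.
    count = (foo_len - 4) // 6
    elements = [data[4 + i * 6 : 4 + (i + 1) * 6] for i in range(count)]
    if any(len(element) != 6 for element in elements):
        raise ValueError("unexpected end of data")
    return foo_len, elements

sample = struct.pack("<I", 16) + bytes(range(12))
foo_len, elements = parse_foo(sample)
print(len(elements))  # 2
```

In a language without an integer division operator, the same count has to be written with an explicit floor, which is the JavaScript difference mentioned above.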
And there's another thing that I wanted to demonstrate: interesting stuff about the Graphviz visualization. Basically, we compile the spec to Graphviz, and this is what we get. It's a human-readable diagram that one can pass to colleagues or to other people, to just take a look at the format and implement it, for example, in some other programs. We've got a growing repository of formats, including tons of formats; right now it's quite interesting. You can find it on our GitHub page and see for yourself if anything interests you. There are quite a few image file formats, video file formats, audio file formats, archives, documents, executables, file systems, etc. So thanks for your attention; I'd like to see if any questions arise.
The question was about handling incorrect values in complex expression language expressions. Basically, there is no internal checking in Kaitai Struct: it just passes the expression through, as I showed you, and at runtime it will probably raise some sort of exception or error, and this would be specific to the particular target language that you compile this code to. Please. [Music]

Yes. Sorry, the question was about parsing bitstreams with more complex contexts, like [unclear] codes, etc. So, since version 0.6 we have support for reading bit strings. It's slowly growing. Probably it's not optimal to parse everything that way per se: for simple operations like unpacking something or uncompressing something, it's probably more efficient to use some special processing on the whole byte string. But it also can be done without any major problems.

Could you repeat it a bit louder? Right, the idea of repeating the question. The question was about reading something from a stream, not from a whole file on disk. Right now the API allows parsing from two sources: from a file on disk or from an arbitrary array in memory. If you can organize the parsing in some way that would be, for example, chunk-based, that would parse one chunk and then stop, it's no problem to go with a byte string that you would somehow buffer in memory, add to this buffer again and again, and call the parser again. So I guess that would be okay.

The question was about adding annotations to the YAML file to have a more human-readable representation of whatever is going on there. There are quite a few possibilities to do so. We allow adding annotations to fields, to be passed as doc strings into the target code; so, for example, if you load it into an IDE, you just see the comment for the field. In the Web IDE there are several syntaxes that allow you to mark up some formatting for the representation, for example choosing hex representation, binary representation, decimal representation, etc. And last but not least, you can do calculated values, which allow you to represent something in a more human-readable way as well. Thanks.

The bit flags are parsed using the bit parsing primitives, usually, and you usually get them as separate fields that you can basically access anywhere you want.
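As an illustration of what "separate fields" for bit flags means, here is a hand-written sketch that splits the packed flags byte of the GIF logical screen descriptor into its four parts. The bit layout follows the GIF format; the function name and the dictionary keys are invented for this example and are not the names used by any Kaitai Struct spec.

```python
def unpack_gif_screen_flags(flags):
    """Split the packed byte of a GIF logical screen descriptor into fields."""
    return {
        "has_color_table": bool(flags & 0b1000_0000),  # bit 7
        "color_resolution": (flags >> 4) & 0b111,      # bits 6..4
        "is_sorted": bool(flags & 0b0000_1000),        # bit 3
        "color_table_size": flags & 0b111,             # bits 2..0
    }

fields = unpack_gif_screen_flags(0xF7)
print(fields)
```

With declarative bit-sized fields, this masking and shifting is generated rather than written by hand, and each flag shows up as its own attribute on the parsed object.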