Dissecting media file formats with Kaitai Struct

FOSDEM VZW

Yakshin, Mikhail (GreyCat)

Formal Metadata

Title

Dissecting media file formats with Kaitai Struct

Title of Series

FOSDEM 2017

Number of Parts

611

Author

Yakshin, Mikhail (GreyCat)

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/41981 (DOI)

Publisher

FOSDEM VZW

Release Date

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Media file formats grow more complex every year, and supporting them all requires tremendous effort from FOSS developers. The problem concerns not only low-level library developers but higher-level software as well: for example, an audio sequencer or video editor developer still needs a solid understanding of the underlying media file format structure to be able to debug problems with it (such as non-standard chunks inserted by some proprietary software).
We'd like to present Kaitai Struct, a new free/open source solution for file format dissection, visualization and parsing. It is a "write once, run everywhere" solution: one specifies a declarative file format spec once and then compiles it into a ready-made parsing library in a large variety of supported target languages. Our visualization tools make Kaitai Struct work like "Wireshark for media files".
Kaitai Struct started as an in-house tool in 2014 and was first released to the public as an open-source project in March-April 2016, supporting only 2 target languages: Java and Ruby. Since then, we've collected 400+ stars on GitHub and hundreds of positive testimonials, gained about a dozen contributors, implemented support for 8 languages, and built a handful of useful tools, such as a console visualizer, a GUI visualizer, and a WebIDE.
Kaitai Struct is frequently compared to proprietary template-enabled hex editors (like 010 Editor, Synalyze It! or Hexinator), but goes one step further: it not only highlights entities in a hex dump, but can also automatically generate a working API from the spec, which considerably accelerates work on file formats and greatly reduces the human error that comes with developing parsers by hand. One is guaranteed to get exactly the same parsing result both in the visualizer and when using the compiled API. And, importantly, it's free and open source.
Some other comparable projects include BinPAC (but it's C++ only), Preon (Java only), PADS (which targets only C and Haskell), and Construct (Python only). In comparison, Kaitai Struct offers cross-language support and includes visualization tools.
For media file dissection, we have a growing collection of well-known media file formats (including MP4 / QuickTime .mov, AVI, GIF, JPEG, PNG, TIFF, etc.) and other interesting file formats (like executables, bytecode, network protocols, etc.). We hope that open media software developers will find Kaitai Struct a helpful ally in their arsenal of tools for dealing with the diverse world of modern file formats.
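
To make the "declarative spec, then parser" idea in the abstract concrete, here is a minimal Python sketch. It is not Kaitai Struct's spec syntax or its generated API: the `GIF_HEADER_SPEC` list and the `parse` helper are hypothetical names invented for illustration, and the sketch interprets the spec at run time with the standard `struct` module rather than compiling it to a library. It only assumes the well-known layout of the first 13 bytes of a GIF file (signature, version, and logical screen descriptor, little-endian).

```python
import struct

# Hypothetical declarative description of the first 13 bytes of a GIF file:
# (field name, struct format code). This is an illustration only, not the
# Kaitai Struct spec format.
GIF_HEADER_SPEC = [
    ("magic", "3s"),              # ASCII "GIF"
    ("version", "3s"),            # ASCII "87a" or "89a"
    ("width", "H"),               # logical screen width, unsigned 16-bit
    ("height", "H"),              # logical screen height, unsigned 16-bit
    ("flags", "B"),               # packed color table flags
    ("bg_color_index", "B"),
    ("pixel_aspect_ratio", "B"),
]


def parse(spec, data, byte_order="<"):
    """Read the fields listed in `spec` from `data` and return them as a dict."""
    # An explicit byte order ("<" = little-endian) also disables padding.
    fmt = byte_order + "".join(code for _, code in spec)
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError(f"need {size} bytes, got {len(data)}")
    values = struct.unpack(fmt, data[:size])
    return dict(zip((name for name, _ in spec), values))


# Build a synthetic 13-byte header instead of reading a real file.
sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
header = parse(GIF_HEADER_SPEC, sample)
print(header["magic"], header["version"], header["width"], header["height"])
```

The point of the design is that the format lives in data (`GIF_HEADER_SPEC`) rather than in hand-written parsing code, so the same description could drive both a parser and a visualizer. Generating equivalent parsers for several target languages from one spec, as the abstract describes, goes well beyond this sketch.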