Implementing Domain Specific Languages with LLVM

Video in TIB AV-Portal: Implementing Domain Specific Languages with LLVM

Formal Metadata

Implementing Domain Specific Languages with LLVM
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year

Content Metadata

Subject Area
FOSDEM (Free and Open Source Development European Meeting) is a European event centered around Free and Open Source software development. It is aimed at developers and all interested in the Free and Open Source news in the world. Its goals are to enable developers to meet and to promote the awareness and use of free and open source software.
Programming language Slide rule Standard deviation Graph (mathematics) Problemorientierte Programmiersprache Firewall (computing) Multiplication sign Maxima and minima Computer programming Rule of inference IP address Formal language Compiler Latent heat Befehlsprozessor Kernel (computing) Term (mathematics) Calculation Interpreter (computing) Statement (computer science) Problemorientierte Programmiersprache Pattern language Physical system
Integrated development environment Compiler Continuum hypothesis Combinational logic Right angle Quicksort Representation (politics) Cartesian coordinate system Mereology Mathematical optimization
Point (geometry) Slide rule Machine code Parsing Computer file Code Formal language Intermediate language Mathematical optimization Compilation album Computer architecture Scripting language Programming language Compiler construction Standard deviation Bit Cartesian coordinate system Parsing Compiler Compiler Universe (mathematics) Interpreter (computing) Object (grammar) Quicksort Representation (politics) Mathematical optimization Writing Library (computing)
Bytecode Functional (mathematics) Implementation Link (knot theory) Just-in-Time-Compiler Code Code Bit System call Formal language Machine code Compiler Intermediate language Read-only memory Function (mathematics) Operator (mathematics) Statement (computer science) Interpreter (computing)
Bytecode Run time (program lifecycle phase) Just-in-Time-Compiler Code Volume (thermodynamics) Bit Cartesian coordinate system Code Parsing Formal language Abstract syntax tree Compiler Fluid statics Linker (computing) Linker (computing) Interpreter (computing) Object (grammar) Implementation
Code Multiplication sign Execution unit Source code Set (mathematics) Fluid statics Preprocessor Different (Kate Ryan album) Semiconductor memory Linker (computing) Single-precision floating-point format Core dump Social class Compiler construction File format Block (periodic table) Structural load Bit Control flow Sequence Mikroarchitektur Virtual machine Message passing Compilation album Website Quicksort Procedural programming Representation (politics) Block (periodic table) Data structure Slide rule Dataflow Game controller Functional (mathematics) Computer file Link (knot theory) Virtual machine Branch (computer science) Intermediate language Latent heat Population density Representation (politics) Maß <Mathematik> Compilation album Mathematical optimization Form (programming) Condition number Module (mathematics) Operations research Execution unit Code Stack (abstract data type) Line (geometry) Binary file Compiler Population density Function (mathematics) Statement (computer science) Social class Object (grammar) Library (computing)
Point (geometry) Dataflow Functional (mathematics) Game controller Implementation Divisor Link (knot theory) Multiplication sign Stack (abstract data type) Latent heat Programmschleife Operator (mathematics) Representation (politics) Integer Error message Hydraulic jump Computer architecture Exception handling Condition number Operations research Arm Block (periodic table) Structural load Data storage device Bit Stack (abstract data type) Control flow System call Sequence Frame problem Category of being Pointer (computer programming) Befehlsprozessor Function (mathematics) Spacetime
Dataflow Slide rule Functional (mathematics) Game controller Control flow Infinity Fluid statics Programmschleife Steady state (chemistry) Semiconductor memory Single-precision floating-point format Oval Social class Form (programming) Block (periodic table) Software developer Debugger Constructor (object-oriented programming) Mathematical analysis Bit Multilateration Control flow Variable (mathematics) System call Type theory Message passing Process (computing) Loop (music) Pointer (computer programming) Resource allocation Function (mathematics) Block (periodic table) Resultant Spacetime
Type theory Functional (mathematics) Oval Bit Branch (computer science) Integer Computer programming Local ring Software bug
Standard deviation Functional (mathematics) Logical constant Block (periodic table) Ferry Corsten Structural load Multiplication sign Bit Parameter (computer programming) Computer programming Element (mathematics) Revision control Radical (chemistry) Pointer (computer programming) Befehlsprozessor Oval Semiconductor memory String (computer science) Integer Quicksort Address space Resultant Row (database) Asynchronous Transfer Mode
Aliasing Operations research Validity (statistics) Length Code Mathematical analysis Bit Element (mathematics) Type theory Pointer (computer programming) Mathematics Array data structure Casting (performing arts) Pointer (computer programming) Type theory Oval Different (Kate Ryan album) Data structure Integer Data structure Mathematical optimization Physical system
Source code Compiler construction Slide rule Standard deviation Code Structural load Source code Virtual machine Compiler Bit Code Compiler Software bug Programmer (hardware) Interpreter (computing) Video game Mathematical optimization Compilation album Address space
Slide rule Dataflow Game controller Parsing Code Multiplication sign Cellular automaton 1 (number) Compiler Instance (computer science) Computer programming Formal language Revision control Operator (mathematics) QR code Square number Statement (computer science) Source code Operations research Cellular automaton Expression Computer program Operator (mathematics) Instance (computer science) Control flow Variable (mathematics) Formal language Parsing Compiler Performance appraisal Statement (computer science) Website Problemorientierte Programmiersprache Whiteboard Electric current
Web page Ocean current Regulärer Ausdruck <Textverarbeitung> Code Range (statistics) Tape drive Cellular automaton Video game Turing-Maschine Computer programming Formal language Number Boundary value problem Data structure Game theory Initial value problem Cellular automaton Expression Data storage device Range (statistics) Abstract syntax tree Compiler Statement (computer science) Video game Iteration Game theory
Shift operator Computer file Compiler Bit Mereology Mereology Computer programming Compiler Subject indexing Pointer (computer programming) Operator (mathematics) Interpreter (computing) Representation (politics) Social class Social class
Module (mathematics) Execution unit Functional (mathematics) Module (mathematics) Inheritance (object-oriented programming) Electric generator Debugger Variable (mathematics) System call Formal language Type theory Type theory Function (mathematics) Single-precision floating-point format Compilation album Social class Integer Block (periodic table) Social class Data type
Building Constructor (object-oriented programming) Structural load Code Software bug Formal language Type theory Flag Library (computing) Social class Logical constant Link (knot theory) Electric generator Clique-width Constructor (object-oriented programming) Bit Sequence Formal language Data management Message passing Compilation album Normal (geometry) Block (periodic table) Row (database) Data buffer Slide rule Implementation Functional (mathematics) Module (mathematics) Inheritance (object-oriented programming) Regulärer Ausdruck <Textverarbeitung> Clique-width Revision control Data buffer Mathematical optimization Data type Module (mathematics) Default (computer science) Execution unit Just-in-Time-Compiler Run time (program lifecycle phase) Cellular automaton Expression Boilerplate (text) Code Multilateration Line (geometry) Compiler Word Function (mathematics) Interpreter (computing) Social class Mathematical optimization
Point (geometry) Slide rule Functional (mathematics) Context awareness Implementation Module (mathematics) Parsing Run time (program lifecycle phase) Thread (computing) Structural load Code Insertion loss Parameter (computer programming) Computer programming Different (Kate Ryan album) Spacetime Integer Library (computing) Module (mathematics) Multiplication Link (knot theory) Clique-width Block (periodic table) Run time (program lifecycle phase) Cellular automaton Content (media) Code Bit Formal language Compiler Skeleton (computer programming) Type theory Pointer (computer programming) Oval Function (mathematics) Factory (trading post) Problemorientierte Programmiersprache Block (periodic table) Library (computing) Spacetime Data buffer
Asynchronous Transfer Mode Bit Parameter (computer programming) Element (mathematics) Element (mathematics) Architecture Type theory Loop (music) Pointer (computer programming) Spacetime Integer Quicksort Representation (politics) Data structure
Asynchronous Transfer Mode Code Structural load Data storage device Parameter (computer programming) Element (mathematics) Element (mathematics) Front and back ends Number Architecture Subject indexing Pointer (computer programming) Array data structure Pointer (computer programming) Bit rate Semiconductor memory Statement (computer science) Spacetime Data structure Representation (politics) Data structure Asynchronous Transfer Mode
Dataflow Game controller Regulärer Ausdruck <Textverarbeitung> Dataflow Code Range (statistics) Ordinary differential equation Range (statistics) Bit Control flow Formal language Steady state (chemistry) Read-only memory Operator (mathematics) Statement (computer science) Statement (computer science) Vertex (graph theory) Block (periodic table) Freeware Resultant
Point (geometry) Logical constant Complex (psychology) Slide rule Dataflow Functional (mathematics) Regulärer Ausdruck <Textverarbeitung> Just-in-Time-Compiler Euclidean vector State diagram Code Line (geometry) Multiplication sign Source code Range (statistics) Control flow Parsing Insertion loss Branch (computer science) Automaton Revision control Pointer (computer programming) Sign (mathematics) Interpreter (computing) Flag Error message Module (mathematics) Boolean algebra Twin prime Block (periodic table) Expression Code Range (statistics) Bit Maxima and minima Line (geometry) Statistics Parsing Pointer (computer programming) Loop (music) Error message Personal digital assistant Function (mathematics) Statement (computer science) Interpreter (computing) Block (periodic table) Mathematical optimization Resultant
Complex (psychology) Euclidean vector Code Line (geometry) Multiplication sign Video game Code Parsing Compiler Line (geometry) Statistics Computer programming Formal language Compiler Iteration Personal digital assistant Interpreter (computing) Interpreter (computing) Statement (computer science) Game theory Mathematical optimization
Code Mountain pass Multiplication sign Source code 1 (number) Set (mathematics) Formal language Virtual memory Information Aerodynamics Arc (geometry) Block (periodic table) Interior (topology) Electronic mailing list Bit Mereology Control flow Variable (mathematics) Formal language Parsing Message passing Data management Fluid statics Order (biology) Website Pattern language Block (periodic table) Physical system Web page Functional (mathematics) Game controller Inheritance (object-oriented programming) Module (mathematics) Clique-width Characteristic polynomial Mathematical analysis Revision control Latent heat Cache (computing) Operator (mathematics) Loop (music) Mathematical optimization Default (computer science) Module (mathematics) Operations research Standard deviation Information Run time (program lifecycle phase) Mathematical analysis Code Counting Line (geometry) Compiler Database normalization Loop (music) Personal digital assistant Function (mathematics) Revision control Statement (computer science) Object (grammar) Mathematical optimization Library (computing)
Dynamical system Run time (program lifecycle phase) Concurrency (computer science) Projective plane Code Bit Formal language Compiler Data model Cache (computing) Mechanism design Object model Speicherbereinigung Software framework Block (periodic table) Compilation album Mathematical optimization Speicherbereinigung
Email Slide rule Functional (mathematics) System call Computer file Code Codierung <Programmierung> Multiplication sign Formal language Latent heat Operator (mathematics) Codierung <Programmierung> Message passing Compilation album Mathematical optimization Social class Email Cartesian coordinate system System call Compiler Type theory Message passing Function (mathematics) Free variables and bound variables Pattern language Quicksort Library (computing)
so I have my talk title was implementing domain-specific languages with LLVM so I
thought maybe I should start by explaining what a domain-specific language is and this is kind of a fuzzy term I mean you have really very
specific things the one that I haven't put on the slide which I should have done because we're actually looking at implementing it with LLVM in FreeBSD right now is firewall rules so you write firewall rules in this language that is just designed for things like pattern matching on IP addresses and your kernel will probably interpret them maybe have an ad hoc JIT compiler and you know these things crop up everywhere and maybe not where you'd think of them things like Emacs Lisp JavaScript they're obviously embedded languages and they're obviously programming languages but firewall rules they don't necessarily look like a programming language but actually they are and actually even get some quite big speed improvements by having a decent compiler attached to them and if you're looking at little embedded rotors then they come with maybe a 200 megahertz CPU and you want to connect them to Gigabit Ethernet and if you want to do even very simple firewall rules on every packet that comes in then in an interpreter you are spending a lot of your time just in switch statements doing crazy things and some of them are really specific to a particular use like the graphviz language for doing graph layouts that's maybe not one that you want to do a compiler for but it is a specific language and the unix unix standard systems come with two calculators that have their own little embedded languages a max is demonstrating this rule that
there are two sorts of applications there are integrated development environments and there are things that aren't integrated development environments yet I'm not sure we're in this continuum emacs's but it's it's definitely getting to the formless stage and so what is LVM and how many people were here for Chris latinus keynote last year okay so you people can sleep for the next five minutes all of the most most people I guess if you think of LVM you'll think of clang and you'll think of olv and that's that GCC replacement right and
this is maybe the most user visible Veche in a part of LLVM it is in combination with clang you can take C
and C++ and Objective C code and you can compile it to native code and that's well it's something we all need to do but it's not actually a very interesting thing you take standard programming language you create object code we've kind of been doing that for the last 40 years or so so we know how to do that but the useful thing about our VM is that it really has this very modular architecture so we have libraries for representing our VMs intermediate language which I'll talk about a lot in the next few slides we have languages for doing up to libraries for doing optimizations libraries for writing out assembly libraries for manipulating object code files and they're all all roughly independent there are things like the support library that all of the others use but mostly you can just pick and choose the bits you want so it's it's sort of Lego for compiler writers and this is great because lots of bits of writing a compiler are very tedious fortunately the people who write
compilers disagree about which bits are tedious so all of the bits eventually get written but if you have an application I mean how many people in this room have written a compiler for anything other than a university assignment okay so maybe more than I was expecting but if you have a lang an application that has some embedded scripting facility typically you right a quick and dirty interpreter and then you'll get bored with it and think well okay it works moving on now going to do the bitch that I actually find interesting and the point of this talk is to say really if you have an interpreter and you want to turn that into a compiler LLVM does all the bits that you don't want to do so a bit later I have a worked example that I'll go through and I put together a toy language in wrote an interpreter for it and a compiler and writing the parser talked me about today because I hate parsers writing the interpreter took an afternoon and write and the compiler took an hour so our VM really really is easy to use because it's so modular and it's designed to be extensible so the basic way you use our VM is you will
create our VMs intermediate representation and then you will pass it
off to LLVM and you will say do stuff with this and typically what you'll do is you'll have some support functions so if you have a bytecode interpreter you'll typically have a switch statement and for each byte code you will call out to a c function that handles that one of the easiest ways of using our VM if you have something like that is you just take all of those support functions you compile them with clang into bit code for your compiler you just take each of those byte codes and insert a call to this thing and then you say LLVM i've written some really really ugly ugly horrible slow code just inline everything and make it fast and it does and this is how apples GLSL and their OpenCL implementations work all of the non-trivial operations in GLSL are just written in C and they're the same C functions that they use in their interpreter and they just compile them 12 vmi I using clang and then just emit calls to all of these functions very very simply in their compiling and just spitting them out so this is the typical
thing that you'll start with and I guess
if you have an exist in scripting language in your application or some embedded language you'll have a parser and it will create an abstract syntax tree and you'll then pass that over to an interpreter and maybe there'll be a bytecode stage in the middle but maybe not and the way you
do this with a VM is rather than passing it to the interpreter you transform it into our VM IR and if it's a simple language then you can just pass it straight out to the just-in-time compiler and there's a poly black oh can we turn the volume down on me at all or no okay but you can also do more complicated things so you can have runtime support codes that you write in C or C++ you can compile that with clang you can then optimize that you can link that together with the code that you've generated from your new compiler it was the AL VM linker in the LLVM linker just combines bit code and once you've got the combined bit code you can then do things like inline the C stuff in the code from your language all the other way around and that's actually what we're going to do in the works example later and then you pass that up to the native linker and you can link that with other object code so you can also turn your embedded scripting language into a static compiler with LLVM quite easily so what is OPM's intermediate
representation I've talked about this sort of waving my hands a lot so far that it's LLVM used to stand for low-level virtual machine but it's actually not very low-level or a virtual machine anymore so they they decided that this doesn't stand for anything now but the intermediate representation is a single static assignment unlimited register machine so how many people here don't know what since that single static assignment means okay so SSA form is
something that compiler writers like because it makes a whole load of optimizations useful it basically means each of your registers in LLVM is only able to be assigned to once so in a function rather than having a variable where you assign a value to it and then you do something else and then you assign another value to it each of those assignments would create a new virtual variable and new LVM register our VM is actually not single static assignment on memory so you can totally cheat and not care that it's single static assignment and just write stuff out to memory read it back and then there's a pass that will undo that and create nice SSA form so although it is single static assignment you don't need to actually care about that which is quite nice there are three representations of this intermediate language there's a set of C++ classes and that's generally how you'll use vir in your own code there is a really dense bit code which is a binary representation and that's what you typically use for passing the IR between different tools if you're doing link time optimization your compiler will emit object code but in this bit code format and then your linker will optimize its more and combine it and there's also a human readable LVM assembly format and that's what I'll put on the slides because you can't read the bit code that would just be evil so when you want to start using this stuff you you need to understand more or less how the intermediate representation works and the basic unit of a VM IR is the module so a module is a compilation unit so in C or C++ that's what you get when you take a source code file you run it through the preprocessor and then you have a single preprocessor source file and that will then be turned into an LVN module by your compiler and where you draw the line between what goes into a single module is really language specific but it's basically a load of stuff that is compiled at the same time and optimizations will do things like interprocedural and now assists only on things in the same LLVM module and modules contain functions and I guess most people know what a function is it's a function in the C sense so it's a procedure not the function in the Pascal sense functions contain basic blocks and a basic block is a sequence of instructions with no flow control in it so it's a sequence of instructions which you executed one after the other and any flow control happens at the end of a basic block so if you have an if statement you'll get one basic block with the condition in it and then another basic block with the if body and you'll have a branch either going to that basic block or going to after that one and then yeah it all joins together and they sit blocks contain instructions and the whole LLVM instruction set is documented on the LVM website I'm not going to go through all of that because it would be tedious and take ages but it's it's more or less like a idealized form of any CPU architecture that you're likely to come across and so some of the things are abstracted a bit so you have this our core instruction which does exactly the same thing as the C library a lack of
function it allocates a bit of stack space and on a real architecture you allocate a bit of stack space by you know bumping a frame pointer or something you but this is a really implementation specific thing so in the IR it's abstracted out slightly you have some things that really do map directly to CPU instructions so add and subtract and multiply and divide and actually some of these come in different variants like signed multiply sign divide and they'll have floating points and arithmetic floating point integer variants you have some flow control instructions and these are really like CPU flow control there's no loops in LVM ir it is just you have jump if true jump if false whatever this kind of thing you have a return and you have a call instruction invoke is a bit special invoke is basically a call but
it's designed to interoperate with exception handling so a call always returns immediately after the call so you can put a call instruction in the middle of a basic block and that's fine and invoke instruction has to go at the end of a basic block because it will return to a different basic block depending on whether it returns normally or whether it throws an exception and I'm not going to talk at all about the exception representation of error in LLVM because that would be an entire our talk just by itself and most people would be asleep by the end of it oh and yeah it also provides some intrinsic functions and intrinsic functions are things that should map to a short sequence of CPU instructions but aren't quite instructions so atomic operations fall into this category and they will be on x86 they might be a lock prefixed ad or they might be on arm link load followed by an ad followed by o store conditional but lbm makes you not have to care about this kind of stuff unless you're writing a back-end so one of the things that kind of
confuses people when they start working with LLVM is that registers are values in LLVM and so you have this LLVM value class which most of the instructions inherit from and when you can create an instruction you get something back which
is the result of that instruction as well as representing that instruction so you can pass that to other functions I know some things don't return any value so a call to a function that returns void is just an instruction that then is a null value or it's technically a value of type void which means you can't do anything with it but yeah this this fact that instructions and registers are basically the same thing kind of confuses people a bit and I'll show you an example in a couple of slides of some LLVM IR and pointed some things in that basic blocks I'm not really going to talk about fly instructions very much they're an artifact of this single static assignment form so I said you can't reuse a variable but imagine you have a for loop which loops from say one to a thousand and a single static assignment form you can't have a variable being assigned to more than one so how would you use the loop index variable in this and the way you do that is you have the special v value which says this takes one or more different values depending on what the previous basic block was and I'll actually show you an example of using that a bit later functions have to start with an entry basic block this is just the first basic block that's entered and it's kind of special because it doesn't necessarily have any predecessors every other basic block should be reachable or it will be removed later if you have local variables you can allocate space for them in the entry block and as I say there's this memory to register promotion pass which will construct nice single static assignment form from this and most of the front ends do this even things like clang that are written by LLVM developers because constructing single static assignment form really requires you to know about flow control and do all the flow control analysis and if you're lazy you don't want to do the flow control analysis you want to just let LVM do the flow control analysis for you because you know that's its job you're just writing a front end and front end should be simple so you can just create a a locker for every local variable and let LVM do the promotion for you so this is hello world in our VM
and this is not just printing hello world this is a program that will if you set compile this and you type a dot out Fred it will say hello Fred but otherwise it'll say hello world so at the start we have this branch instruction so well first of all we're comparing this value of Arg C which is just the same as it is in C because this is just a main function with zero actually should be comparing it with one never mind there's a bug in the program but not a bug in the IR and then this will be returning in this register one registers that start with a percent local registers registers that start with an at our global registers but that's yeah just a little detail so this is returning something which is an i1 it says actually doesn't say but the type when you use it isn't i1 and I means integer one means one bit so it's basically a boolean value and then you branch based on that so if this is true
then you go to this label world otherwise you go to the label name and labels are just text strings they can be anything that could be useful yeah yeah can you see this little dot maybe okay the people in the front row can see it so if you go into the world block you have this get element pointer instruction which I'll talk a little bit about later this is basically how LVM represents any of the complex addressing
modes on our CPU so this is taking this global register which is declared somewhere up here this string hello world which is a 12 character I eight which is a char basically an 8-bit integer note that it has an explicit null terminator because values narrow VM they're just like blobs of memory so there's no implicit null termination on strings because there are no strings it's just an array of characters it gets a pointer to the first element and I'll explain that a bit more later and then just calls the C standard put string function and exits if you go to the other version then it does the same sort of thing this time getting the address of the first element in this array or actually the second element the first one is the the program name and then it loads this so this is loading a pointer to a pointer to intake so the result of this will be basically a C string a pointer to intake and then it calls printer with this is the second argument and hello % s is the first argument so this is a really simple LVM program but it demonstrates basically
all of the LLVM that you need to care about LLVM is strongly typed and it's
kind of annoying in a way but it's useful for optimizations it's annoying for front-end writers so you end up
having to have a lot of explicit cast in LVM and these get element pointer things so if you have an array a raisin pointers in LLVM are distinct types arrays of different lengths are distinct types arrays of the same size but different element types are distinct types and this is useful for validation but it does mean you just end up writing lots and lots of cast instructions which the optimizer will strip away structures actually changed a bit without vm3 before LLVM 3 structures that had the same layout were the same now structures that have different names but the same layout has taken to be different but you also still have anonymous structures which will just be merged together and the reason for this change was to allow you to do strict aliasing analysis which I guess most of you have written C and C++ code how many people have had to add ethno strict aliasing to their compiled okay so not many of you that's either very impressive or your build systems doing it automatically and you aren't aware of it so strict aliasing is this great idea that someone understands as committee had which said if we make everybody really really read
the standard in detail then we can make life easier for compiler writers and compiler writers really liked it because there are suddenly a load of optimization opportunities and you can make C codes really fast and everyone else hates it because all of these little tricks that you used to do that the standard didn't really allow but all compilers actually accept it anyway because well that's how the machine works now don't work and your compiler is now suddenly getting into a bit way or in undefined behavior and compiler writers love undefined behavior because it means you can do anything you want and programmers really hate undefined behavior because they compile it on one compiler and they get some behavior and they think that's what's expected to happen and then they upgrade their compiler and suddenly their code doesn't work and they say your compiler it has bugs in it and you say no no undefined behavior haha so I wanted to give you a little example you can grab the full
source code for this tiny compiler an interpreter at the first address the second address is just the syntax highlighted bit that I'm actually going to talk about on the next few slides I have some code but to fit things on the
slide I had to omit a few details so the full version and the full version differs in one very important way and that it actually has comments in it is the second address and if you have a phone that understands these QR code things it will send you to the compiler CC dot HTML thing so maybe you know people following at home can do that and it'll be easier for people who can't see the slides or a board with listening to me and just want to read the code
so everyone had who wanted to do that had got it hadn't they okay so rather than take an existing language which maybe you'd be familiar with or maybe you wouldn't I thought I would make up a language for this talk so this is a really simple domain-specific language that is designed for implementing cellular that cellular automata unfortunately I picked cellular automata which I can't say very quickly so I'm now going to spend the next five or six site slides saying so there they're everywhere so a cellular automaton is a basically a grid containing values and a program runs on each square in the grid and it computes some value based on the current value and based on the values of the neighbors so because I was really lazy writing the parser this language doesn't have named variables it just has ten local registers a zero to a nine and ten global registers G zero to G 9 and when the program starts running all of these are 0 the local registers are reset to zero for every instance of the for every square the global ones are not it also has a special register v which is initially the value of the cell in the old grid and then your program runs and it sets the value of V there are a few simple things so we have some simple arithmetic things and again because I'm lazy when it comes to writing parsers the operator comes first and then there's a register and then there's an expression so for example plus a1 one would increment register a1 we have a couple of more interesting things there is a neighbors statement which then is the only flow control statement in this language and this evaluates the statements in the body once for every neighbor that the program has that in the cell has so if you aren't in a corner it will evaluate this 3 times if you're on it on an edge it'll evaluate it five times and if you're on it in the middle it'll evaluate at nine times and we have a select expression
which takes the register value and it will return a different value depending on what the register value is whether it's a named a specified value if it's in a specified range or if it's not in
any of the specified values it'll be zero so just a few quick examples of this this is a simple program which will flash every cell in the grid so whatever the initial values are after the next iteration it'll be the opposite of this so this says take the value V which is the current value if it's zero set it to one implicitly if it's set to anything else set it to zero and then assign that to V so very simple program a slightly more complicated one this neighbors expression stores the value in the current neighbor in a zero so we're not using a zero there but this just execute this statement once for every neighbor that this has so when you run this you'll get a grid which has three in the corners and five in the edges Nate and all the cells in the middle so this counts the number of neighbors you have by adding one to a zero for every neighbor and then just assigns the value in a 1 to V and a more interesting example I get how many people have not heard of Conway's Game of Life okay so this is this is probably the most famous say there automata program and I wrote an article a couple of years ago where I implemented this and OpenCL and this was about a page of code page and a half of code so this language is quite dense for this kind of thing but this again it counts the number of neighbors that are set to one stores nadan a zero then for the current value if the current value in this cell is zero and it has three neighbors then it's happy and healthy and it stays alive absolutely then it breathe they breed and you get a new value here so it sets it to one if this current value is one and you have two or three neighbors then again it's happy and healthy and it stays alive if it has only one or zero neighbors it gets lonely and dies and if it has more than three neighbors it starves to death and so this this is actually interesting because Conway's Game of Life is turing-complete you can implement a Turing machine in Conway's Game of Life and you can represent tapes with the cells and have them feeding in but this language itself is not you're incomplete so only the fact that you're repeatedly executing this program makes it you're incomplete so this language is right on that boundary which I don't know makes it interesting theoretically maybe not so much practically and this compiler turns it into a abstract syntax tree representation which is a simple structure which just starts with an
operation ID and has two children and what those are depends on what the operation is but typically most of the things just have two values associated with them and we do a little bit of cheating so registers and literals are encoded into pointers and this makes the representation a bit denser and it also means the interpreter can actually be quite fast because loading a register value is just do a shift and then load it from that index so the compiler is just one C++ file and maybe some of you have looked at this C++ DEP DEP C++ file already use a few LLVM classes which I'll go through quickly now and some
parts of this are written in C and are compiled using clang and are then linked into the program so I've kind of gone
through what some of these things are already these are the LVM classes and LVM module is a module the function is well yeah you get the idea global variables a class that represents well yeah obviously I guess the most useful class for people right in front ends is this ir builder class which is just a helper class which will generate LVM instructions for you with just a single function call for each one the type of class i won't deal with very
much because this language only has one type which is a 16-bit integer so we only touch on that very briefly constant
expression is the class that's used for building well obviously constant expressions these last two will come to a little bit later Pass manager builder is the one that we use for building and constructing a sequence of LVM optimization passes which you then want to run an execution engine is the bit that actually does the execution so the
first thing we did in the interpreter there is a function that looks a lot like this that then calls the interpreter again for each cell oh this says 50 when it should say width and height this is not a bug in the version that you can see in the on line that is a bug on the slides so where's the button that works this should say width and this should say height sorry so this this function is just written in C and compiled with clang two-bit code and it has a four word definition of this cell function so what will happen at the end of all of this is that we will emit an actual definition of this cell function and then the LLVM optimizer will inline it here so we don't need to write all of this boilerplate code we don't need to generate lv miR for it we can just copy and paste from the interpreter and it works now wants to do it this way around because all the other examples do it the other way around and have you calling helper functions from your ir and then aligning them but we can do it both ways we can omit something in our LVM ir that we then call from something else and you could modify the implementation of this really easily to for example use live dispatch and run all of these in parallel or maybe just run every row in parallel or something like that and to generate bit code from this we just add this emit LLVM flag decline by default it emits native code but if you add this flag it will emit a VM bit code so this is stuff that you'll see in pretty much any compiler for a small language using our VM we're actually cheating a bit when we construct the LVM module rather than constructing one the normal way with constructors and create functions and stuff we grab a memory buffer containing
the contents of the bit code that we compiled with clang the little runtime library and then just say parse that so we're actually taking a copy of the code that we compiled with clang and rather than linking in with it we're just using this is the skeleton for our new program does that make sense to everyone okay I see slightly confused faces but not everywhere so then we rather than creating a new function we just get the cell function that we forward declared and because this is just a stub function it has no implementation it doesn't have a basic block so we do need to create the entry basic block and this is just standard this a factory function and it goes in the function that we just loaded the name doesn't really matter it's just the first one that's added will automatically be the entry one a lot of these things take this C parameter which isn't on the slides this is an LLVM context and this is what I love um uses for re-entrance e so you can have multiple threads with different lvm contexts but you can also just call get global context and then you get a shared one if you don't care about having a multi-threaded compiler and to be fair most people don't care about having a multi-threaded compiler here we're setting the linkage of the function so this is private linkage which means that it won't be exported outside of this module and that just means that the inliner is more likely to inline it because it's only called in one place if we inline it we get rid of every single need to call it so that it's useful this is just creating a cached copy of the type that we use for registers and then we set the ir builder's entry point insert points to the entry block and then we're ready to start actually creating bit code so the
first thing we need to do for this little language is allocate some space for the registers and if we go back to the runtime a second this cell function this is the one that we're going to be generating where we'll take the global registers by pointer as an argument but it will need to allocate the local registers itself so we create each of the a registers just with a single a locker so this takes the type here so this will create enough space on the stack for a 16-bit integer and this
returns an LVN value which represents a pointer to a 16-bit integer on the stack and we just do this in a loop at the start we store I've missed off the bit where we created a locker for the V as well but nevermind we store the first argument in V and then for the next argument we do this same thing we well yep we then loop over these and store the value 0 so this is a constant integer of this type value 0 in each of the a registers and create one of these for each of the global registers so I've
sort of brushed over the get thing a bit because it's slightly confusing it stands for get element pointer which
yeah it does exactly what it says from the earlier example the we wanted to get a pointer to the first element in an array of care Etta's so this is just like getting this is a rate of pointer decay and see basically so this is a get element pointer with the value 0 in this example we're getting the element pointer in the
array for each index these do get really horribly complicated later on because
they take an arbitrary number of arguments and so they can peer into structures at any interested structures and structures containing a raise of structures containing arrays of structures the really important thing to remember about these is that they don't dereference the pointer so whenever you actually see one of these in code it'll be a get element pointer followed by either a load or a store and the back end will then generate some complex addressing mode instruction but the get element pointer itself does not actually touch the memory arithmetic statements
in this are really easy so we load the old value do an ad or a multiplier or a
divided or whatever and then store the new value and all of the arithmetic instructions are just done by this little switch statement which computes the result so implementing all of the arithmetic operations that this toy language supports which I think six of them there's also a min and a max they're all basically like this and again we're not caring here that LLVM is static single assignment because we're just loading it from the stack doing the operation and then storing it flow control is a little bit more complicated so I'm going to ignore the neighbors instruction if you want to look at that in the code feel free but I'll go through the range instruction so this is creating a fine ode which has one
predecessor for each of the range statements that we have in this expression and then for each one in the block I've had to omit the loop because it didn't quite fit on the slide we first of all get the minimum and maximum values for it specified for the range in as constants because they are constants in the source code and then we have a little bit of boolean logic so we say if the read the register value is signed greater than or equal to the minimum value or signs less than or equal to sorry and signs less than or equal to the maximum value this will then just be a boolean value saying if we've matched this particular case then we create a couple of new basic blocks we have one for calculating the result expression and one for falling through to the next case so this is basically just like a switch statement in C but with a little bit more syntactic sugar it then creates a conditional branch so if we have matched this then we jump to this basic block if we haven't matched it then we fall through and try the next one and then we'll be back here again then it sets the insert point in the range result block that we've just created and then we do something else to actually emit the expression for the result but this is just actually calling back to the compiled statement function and then for the final we just to set its value we say if we've are coming from this block then it's this value and then the fly node will have one value for each incoming block and so this will be in the final thing and then yeah I'm I won't go through this in too much detail because you kind of need to understand the IR but I just want to give you an example that this is a fairly complex flow control operation and I can more or less fit the code for generating it on the slide and if you want to look in more detail the version that you can download has a comment basically on every line which explains how it really works I'm running a bit short on time so I'll skip over it a bit generating code is really really simple you create an execution engine from this module this false flag said is just say whether we should force it to use the interpreter we don't want to force it to use the interpreter because we probably already have a fast interpreter we wanted to use the JIT compiler and this is just a pointer to an error value and if this fails this error value will be set to something or we can print it otherwise we take the execution engine we say give me a pointer to the function with this name and then we get a function pointer back and then you can call that from C code and just a few statistics about
this little toy example the parser was
about four hundred lines of code so that was most of the complexity in this the interpreter was about two hundred lines of code and it's not the greatest interpreter in the world but it's it's yeah it's okay and the compiler was only very slightly more code than the interpreter so it used to be the if you wanted to write a compiler for your language that was ten or a hundred times more effort than writing an interpreter now actually writing the compiler is is really not much more effort than writing an interpreter and in this case well the interpreter is for the examples I ran
the interpreter is only maybe seven or eight times slower than the compiler the reason for that is actually the interpreter is fairly fast for this language there isn't very much optimization you can do on a language where programs are typically only two or three statements long for languages where you're writing quite complex code even if you know I say complex even something that's a dozen a hundred lines long there's more that the optimization can work with so we can we can do a few things to improve on that that I haven't
done here maybe we could improve the quality of the IR we generate for this example not so much because the IR is actually pretty simple because the source language is simple because I needs to explain the language and the compiler fermentation in one 45-minute talk can LVM optimized a bit better what what else can we do well optimizers like having lots of information to work with so maybe rather than having these width and height variables maybe you could emit an in a version that would just work on 16 by 16 blocks and then the compiler could optimize the neighbors statement a bit better because it would know how many neighbors you had maybe you could do a special case for the sides and the corners and that would be maybe adding a dozen lines to the compiler but it will give you a bit of a speed-up I used in the example code the past manager builder and I just said add the standard set of LVM optimizations and no one has really done much effort on working out which ones they should be they're just ones that someone thought made sense at the time and they maybe got tweaked a bit and they don't do very badly for C and for C++ they you know they work about as well as the GCC ones but for languages that have different characteristics maybe altering the order of the passes of adding some extra ones maybe switching them around a bit can give you better results and on the LLVM website there is a list of passes page and this will give you about 200 different passes you can do maybe look at some of these see whether they do things that particularly match your source language or don't and you can write new ones there's a tutorial online for writing a new pass it's pretty
simple you have typically you'll subclass one of these three either modules or function or loop passes depending on the scope of your optimization and there are lots of the passes that exist now our analysis passes so they don't transform the IR they just collect some information about it like is this a loop what else can we do is this value aliased and there are some passes that already exist that are specific to source languages I'm going to mainly talk about object that objective-c because that's mostly what I work on these magic reference counting optimizations a parser LLVM now they use our vm's flow control analysis and they determine where the reference count manipulation operations are redundant or not and if they're redundant they remove them so it's it's pretty language specific and it can also be library specific so if you have a common pattern in some library that you use a lot that could be
made faster you know it's really easy to write an LVN optimization that is specific to your particular framework and just add that to your compilation thing so for example qts signals and slots mechanism someone's looking at speeding that up by taking advantage of the fact that we can just plug extra optimizations in that are only relevant to Qt and for the new step objective-c runtime there are a few that do things like inline caching yeah and if you're
writing your own compiler at OPM does the compile a bit that some of the runtime stuff maybe you can reuse from other projects Lib dispatch gives you cheap from currency the Boehm garbage collector isn't the best in the world but it is you know off the shelf easy to reuse the new step objective-c runtime gives you a nice dynamic object model that's really easy to work with and just one very final thing before they kick me
off the stage a lot of clang which maybe you've come across as a C compiler is also following this same approach with the rest of LLVM so it is also really modular so you can use Lib clang to pass header files you can get the type encodings for those and the names and you can do FFI without any you know and any need for custom header file passes or for defining how things map from C to your language and we do this in pragmatic small talk where you just send a message to this C pseudo class and it knows that that's supposed to be a C function call and so it will check in your headers find a function that has a name matching that message and it will just work by magic and now I'm going to stop speaking before they shout at me
you have to go to the microphone and then they can hear what you're saying we've got time for maybe one or two questions but that's it the earlier question was are these slides online the earlier question was are these slides online which you just answered no are the slides you just presented online I want to annotate them before I put them online have you considered having a application specific language like this for writing compilers yeah actually one of the things that is interesting is if you look at the album optimization passes a lot of them are doing a sort of pattern matching that is really ad hoc that really should be done by some pattern matching language so one thing I'd like to do is have an optimization past language it seems like your code was mostly using the the the api's for LLVM to put to build up something so if you had a language where there were operators that did precisely that it might be easier to write yeah I mean another thing that would be interesting is taking something like clang and just extending it so you write C code with placeholders and then let you fill in the gaps sort of like person yeah and people are running to their next talk I think people are running to their next stop always have the last question


  772 ms - page object


AV-Portal 3.21.3 (19e43a18c8aa08bcbdf3e35b975c18acb737c630)