
Implementing Domain Specific Languages with LLVM


Formal Metadata

Title: Implementing Domain Specific Languages with LLVM
Number of Parts: 84
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2012

Content Metadata

Abstract
FOSDEM (Free and Open source Software Developers' European Meeting) is a European event centered around Free and Open Source software development. It is aimed at developers and at anyone interested in Free and Open Source news. Its goals are to enable developers to meet and to promote the awareness and use of free and open source software.
Transcript: English (auto-generated)
So my talk title was implementing domain-specific languages with LLVM. So I thought maybe I should start by explaining what a domain-specific language is. And this is kind of a fuzzy term. I mean, you have really very specific things. The one that I haven't put on the slide, which I should have done because we're actually looking at implementing it with LLVM in FreeBSD right now,
is firewall rules. So you write firewall rules in this language that is just designed for things like pattern matching on IP addresses. And your kernel will probably interpret them, maybe have an ad hoc JIT compiler.
And these things crop up everywhere, and maybe not where you'd think of them. Things like Emacs Lisp and JavaScript are obviously embedded languages, and they're obviously programming languages. But firewall rules don't necessarily look like a programming language. But actually they are. And actually you can get some quite big speed improvements by having a decent compiler attached to them.
If you're looking at little embedded routers, then they come with maybe a 200 MHz CPU and you want to connect them to gigabit ethernet. And if you want to do even very simple firewall rules on every packet that comes in,
then in an interpreter you're spending a lot of your time just in switch statements doing crazy things. And some of them are really specific to a particular use, like the Graphviz language for doing graph layout. That's maybe not one that you'd want to do a compiler for, but it is a domain-specific language.
Standard Unix systems come with two calculators that have their own little embedded languages. Emacs is demonstrating the rule that there are two sorts of applications: there are integrated development environments, and there are things that aren't integrated development environments yet.
I'm not sure where in this continuum Emacs is, but it's definitely getting to the former stage. So what is LLVM? How many people were here for Chris Lattner's keynote last year? Okay, so you people can sleep for the next five minutes.
LLVM, most people, I guess, if you think of LLVM, you'll think of Clang and you'll think of, oh, LLVM, that's that GCC replacement, right? And this is maybe the most user-visible part of LLVM: in combination with Clang, you can take C, C++ and Objective-C code and compile it to native code.
And that's, well, it's something we all need to do, but it's not actually a very interesting thing. You take standard programming language, you create object code. We've kind of been doing that for the last 40 years or so, so we know how to do that.
But the useful thing about LLVM is that it really has this very modular architecture. So we have libraries for representing LLVM's intermediate language, which I'll talk about a lot in the next few slides. We have libraries for doing optimizations, libraries for writing out assembly, libraries for manipulating object code files.
And they're all roughly independent. There are things like the support library that all of the others use. But mostly you can just pick and choose the bits you want. So it's sort of Lego for compiler writers.
And this is great because lots of bits of writing a compiler are very tedious. Fortunately, the people who write compilers disagree about which bits are tedious, so all of the bits eventually get written. But if you have an application, I mean, how many people in this room have written a compiler for anything other than a university assignment?
Okay, so maybe more than I was expecting. But if you have an application that has some embedded scripting facility, typically you will write a quick and dirty interpreter, and then you'll get bored with it and think, well, okay, it works, moving on now.
Going to do the bits that I actually find interesting. And the point of this talk is to say, really, if you have an interpreter and you want to turn that into a compiler, LLVM does all the bits that you don't want to do. So a bit later I have a worked example that I'll go through,
and I put together a toy language and wrote an interpreter and a compiler for it. And writing the parser took me about a day because I hate parsers. Writing the interpreter took an afternoon, and writing the compiler took an hour. So LLVM really, really is easy to use because it's so modular and it's designed to be extensible.
So the basic way you use LLVM is you will create LLVM's intermediate representation, and then you will pass it off to LLVM and you will say, do stuff with this. And typically what you'll do is you'll have some support functions. So if you have a bytecode interpreter, you'll typically have a switch statement,
and for each bytecode you'll call out to a C function that handles that. And one of the easiest ways of using LLVM if you have something like that is you just take all of those support functions, you compile them with Clang into bitcode. For your compiler, you just take each of those bytecodes and insert a call to this thing,
and then you say, LLVM, I've written some really, really ugly, horrible, slow code. Just inline everything and make it fast, and it does. And this is how Apple's GLSL and their OpenCL implementations work. All of the non-trivial operations in GLSL are just written in C,
and they're the same C functions that they use in their interpreter, and they just compile them to LLVM IR using Clang, and then just emit calls to all of these functions very, very simply in their compiler and just spit them out.
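As a rough sketch of that pattern (the opcode handler names here are hypothetical, and the C++ API shown is the LLVM 3.x-era one, which has moved around in later releases), the compiler loop really can be this dumb:

```cpp
#include "llvm/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Support/IRBuilder.h"
#include "llvm/Transforms/IPO.h"

using namespace llvm;

// Hypothetical: maps an opcode to the name of the C support function that
// was compiled to bitcode with Clang (e.g. "op_add", "op_dup", ...).
const char *handlerName(unsigned char opcode);

// Emit one call per bytecode; the result is ugly, slow-looking IR.
void compileBytecode(Module &M, IRBuilder<> &B, Value *vmState,
                     const unsigned char *prog, size_t length) {
  for (size_t i = 0; i < length; i++)
    B.CreateCall(M.getFunction(handlerName(prog[i])), vmState);
}

// Then: "LLVM, I've written some really ugly code, inline everything."
void makeItFast(Module &M) {
  PassManager PM;
  PM.add(createFunctionInliningPass());
  PM.run(M);
}
```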
So this is the typical thing that you'll start with, and I guess if you have an existing scripting language in your application or some embedded language, you'll have a parser, and it'll create an abstract syntax tree, and you'll then pass that over to an interpreter, and maybe there'll be a bytecode stage in the middle, but maybe not.
And the way you do this with LLVM is rather than passing it to the interpreter, you transform it into LLVM IR, and if it's a simple language, then you can just pass it straight out to the just-in-time compiler.
But you can also do more complicated things. So you can have runtime support code that you write in C or C++. You can compile that with Clang. You can then optimize that. You can link that together with the code that you've generated from your new compiler.
You do this with the LLVM linker, and the LLVM linker just combines bitcode, and once you've got the combined bitcode, you can then do things like inline the C stuff into the code from your language, or the other way around, and that's actually what we're going to do in the worked example later.
And then you pass that off to the native linker, and you can link that with other object code. So you can also turn your embedded scripting language into a static compiler with LLVM quite easily. So what is LLVM's intermediate representation? I've talked about this sort of waving my hands a lot so far,
but LLVM used to stand for low-level virtual machine, but it's actually not very low-level or a virtual machine anymore, so they decided that it doesn't stand for anything now. But the intermediate representation is a static single assignment, unlimited-register machine.
So how many people here don't know what static single assignment means? Okay, so SSA form is something that compiler writers like because it makes a whole load of optimizations easier. It basically means each of your registers in LLVM
is only able to be assigned to once. So in a function, rather than having a variable where you assign a value to it and then you do something else and then you assign another value to it, each of those assignments would create a new virtual variable, a new LLVM register.
LLVM is actually not static single assignment on memory, so you can totally cheat and not care that it's static single assignment and just write stuff out to memory, read it back, and then there's a pass that will undo that and create nice SSA form. So although it is static single assignment, you don't need to actually care about that, which is quite nice.
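A minimal sketch of that cheat, against the same-era API: the front end emits loads and stores to a stack slot for every variable, and the memory-to-register promotion pass rebuilds SSA form afterwards.

```cpp
#include "llvm/Function.h"
#include "llvm/PassManager.h"
#include "llvm/Support/IRBuilder.h"
#include "llvm/Transforms/Scalar.h"

using namespace llvm;

// 'x = 1; x = x + 2;' emitted naively, with no SSA bookkeeping at all.
void emitWithoutSSA(IRBuilder<> &B) {
  Value *x = B.CreateAlloca(B.getInt32Ty(), 0, "x"); // stack slot for x
  B.CreateStore(B.getInt32(1), x);                   // first assignment
  Value *tmp = B.CreateLoad(x, "tmp");
  B.CreateStore(B.CreateAdd(tmp, B.getInt32(2)), x); // second assignment
}

// The mem2reg pass then turns those stack slots into clean SSA registers.
void promoteToSSA(Function &F) {
  FunctionPassManager FPM(F.getParent());
  FPM.add(createPromoteMemoryToRegisterPass());
  FPM.run(F);
}
```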
There are three representations of this intermediate language. There's a set of C++ classes, and that's generally how you'll use the IR in your own code. There is a really dense bitcode, which is a binary representation, and that's what you typically use for passing the IR between different tools.
If you're doing link-time optimization, your compiler will emit object code, but in this bitcode format, and then your linker will optimize it some more and combine it. And there's also a human-readable LLVM assembly format, and that's what I'll put on the slides
because you can't read the bitcode. That would just be evil. So when you want to start using this stuff, you need to understand more or less how the intermediate representation works. And the basic unit of LLVM IR is the module.
So a module is a compilation unit, so in C or C++, that's what you get when you take a source file, you run it through the preprocessor, and then you have a single preprocess source file, and that will then be turned into an LLVM module by your compiler. And where you draw the line between what goes into a single module
is really language specific, but it's basically a load of stuff that is compiled at the same time and optimizations will do things like interprocedural analysis only on things in the same LLVM module.
Modules contain functions, and I guess most people know what a function is. It's a function in the C sense, so it's a procedure, not a function in the Pascal sense. Functions contain basic blocks, and a basic block is a sequence of instructions with no flow control in it. So it's a sequence of instructions which are executed one after the other,
and any flow control happens at the end of a basic block. So if you have an if statement, you'll get one basic block with the condition in it, and then another basic block with the if body, and you'll have a branch either going to that basic block or going to after that one,
and then, yeah, it all joins together. And basic blocks contain instructions, and the whole LLVM instruction set is documented on the LLVM website. I'm not going to go through all of that because it would be tedious and take ages, but it's more or less like an idealized form of any CPU architecture that you're likely to come across.
So some of the things are abstracted a bit, so you have this alloca instruction, which does exactly the same thing as the C library alloca function. It allocates a bit of stack space, and on a real architecture you allocate a bit of stack space by, you know, bumping a frame pointer or something,
but this is a really implementation-specific thing, so in the IR it's abstracted out slightly. You have some things that really do map directly to CPU instructions, so add and subtract and multiply and divide, and actually some of these come in different variants
like signed multiply, signed divide, and they'll have floating-point and integer variants. You have some flow control instructions, and these are really like CPU flow control. There are no loops in LLVM IR.
It is just you have jump if true, jump if false, whatever, this kind of thing. You have a return and you have a call instruction. Invoke is a bit special. Invoke is basically a call, but it's designed to interoperate with exception handling, so a call always returns immediately after the call,
so you can put a call instruction in the middle of a basic block, and that's fine. An invoke instruction has to go at the end of a basic block because it will return to a different basic block depending on whether it returns normally or whether it throws an exception, and I'm not going to talk at all about the exception representation in LLVM
because that would be an entire hour talk just by itself, and most people would be asleep by the end of it. Oh, and yeah, it also provides some intrinsic functions, and intrinsic functions are things that should map to a short sequence of CPU instructions
but aren't quite instructions, so atomic operations fall into this category. On x86 they might be a lock-prefixed add, or they might be a load-linked,
followed by an add, followed by a store-conditional, but LLVM makes you not have to care about this kind of stuff unless you're writing a backend. So one of the things that kind of confuses people when they start working with LLVM is that registers are values in LLVM,
and so you have this LLVM value class which most of the instructions inherit from, and when you create an instruction, you get something back which is the result of that instruction as well as representing that instruction, so you can pass that to other functions.
Some things don't return any value, so a call to a function that returns void is just an instruction that then is a null value, or it's technically a value of type void, which means you can't do anything with it. But yeah, this fact that instructions and registers
are basically the same thing kind of confuses people a bit, and I'll show you an example in a couple of slides of some LLVM IR and point at some things in that. Basic blocks. I'm not really going to talk about phi instructions very much. They're an artifact of this static single assignment form.
So I said you can't reuse a variable, but imagine you have a for loop which loops from, say, one to a thousand. In static single assignment form, you can't have a variable being assigned to more than once, so how would you use the loop index variable in this? And the way you do that is you have this special phi value,
which says: this takes one of several different values depending on what the previous basic block was, and I'll actually show you an example of using that a bit later.
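In the meantime, here is roughly what that one-to-a-thousand loop looks like when built with the C++ API (a sketch using the 3.x-era names):

```cpp
#include "llvm/BasicBlock.h"
#include "llvm/Function.h"
#include "llvm/Support/IRBuilder.h"

using namespace llvm;

// Sketch: 'for (i = 1; i <= 1000; i++) ...' in SSA form. The phi merges the
// initial value (arriving from the entry block) with the incremented value
// (arriving from the previous trip around the loop).
void buildLoop(Function *F) {
  LLVMContext &C = F->getContext();
  BasicBlock *entry = BasicBlock::Create(C, "entry", F);
  BasicBlock *loop  = BasicBlock::Create(C, "loop", F);
  BasicBlock *done  = BasicBlock::Create(C, "done", F);

  IRBuilder<> B(entry);
  B.CreateBr(loop);                          // fall into the loop

  B.SetInsertPoint(loop);
  PHINode *i = B.CreatePHI(B.getInt32Ty(), 2, "i");
  i->addIncoming(B.getInt32(1), entry);      // first time through: i = 1
  // ...the loop body, using i, goes here...
  Value *next = B.CreateAdd(i, B.getInt32(1), "next");
  i->addIncoming(next, loop);                // later trips: i = next
  Value *again = B.CreateICmpSLE(next, B.getInt32(1000), "again");
  B.CreateCondBr(again, loop, done);

  B.SetInsertPoint(done);
  B.CreateRetVoid();
}
```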
Functions have to start with an entry basic block. This is just the first basic block that's entered, and it's kind of special because it doesn't necessarily have any predecessors. Every other basic block should be reachable, or it will be removed later. If you have local variables, you can allocate space for them in the entry block, and then, as I say, there's this memory-to-register promotion pass
which will construct nice static single assignment form from this, and most of the frontends do this, even things like Clang that are written by LLVM developers, because constructing static single assignment form really requires you to know about flow control and do all the flow control analysis,
and if you're lazy, you don't want to do the flow control analysis. You want to just let LLVM do the flow control analysis for you, because that's its job. You're just writing a frontend, and frontends should be simple, so you can just create an alloca for every local variable and let LLVM do the promotion for you.
So this is Hello World in LLVM, and this is not just printing Hello World. This is a program that will, if you compile this and you type a.out Fred, it will say Hello Fred, but otherwise it'll say Hello World.
So at the start, we have this branch instruction. So, well, first of all, we're comparing this value of argc (which is just the same as it is in C, because this is just a main function) with zero. Actually it should be comparing it with one; never mind, there's a bug in the program, but not a bug in the IR.
And then this puts its result in register %1. Registers that start with a percent are local registers. Registers that start with an at are global registers, but that's, yeah, just a little detail. So this is returning something which is an i1. It says, actually it doesn't say,
but the type when you use it is an i1, and i means integer, one means one bit. So it's basically a boolean value. And then you branch based on that. So if this is true, then you go to this label world. Otherwise, you go to the label name, and labels are just text strings.
They can be anything. So if you go into the world block, you have this get element pointer instruction, which I'll talk a little bit about later.
This is basically how LLVM represents any of the complex addressing modes on a CPU. So this is taking this global register, which is declared somewhere up here, this string, hello world, which is a 12-element array of i8, and i8 is a char, basically an eight-bit integer.
Note that it has an explicit null terminator because values in LLVM, they're just like blobs of memory. So there's no implicit null termination on strings because there are no strings. It's just an array of characters. It gets a pointer to the first element,
and I'll explain that a bit more later. And then it just calls the C standard puts function and exits. If you go to the other version, then it does the same sort of thing, this time getting the address of the first element
in this array, or actually the second element, because the first one is the program name. And then it loads this, so this is loading through a pointer to a pointer to i8. So the result of this will be basically a C string,
a pointer to i8, and then it calls printf with this as the second argument, and "Hello %s" as the first argument. So this is a really simple LLVM program, but it demonstrates basically all of the LLVM that you need to care about.
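For reference, here is roughly how you would build that same program with the C++ API rather than writing the textual IR by hand. This is a sketch against the 3.x-era API, and it compares argc with 1, the intended behavior, rather than reproducing the slide's off-by-one:

```cpp
#include "llvm/BasicBlock.h"
#include "llvm/Constants.h"
#include "llvm/Function.h"
#include "llvm/Module.h"
#include "llvm/Support/IRBuilder.h"

using namespace llvm;

// Sketch of the "Hello World / Hello Fred" program described above.
void buildMain(Module &M) {
  LLVMContext &C = M.getContext();
  IRBuilder<> B(C);
  Type *i8p = B.getInt8PtrTy();

  // int main(int argc, char **argv)
  Type *params[] = { B.getInt32Ty(), i8p->getPointerTo() };
  Function *mainFn = Function::Create(
      FunctionType::get(B.getInt32Ty(), params, false),
      GlobalValue::ExternalLinkage, "main", &M);
  Function::arg_iterator args = mainFn->arg_begin();
  Value *argc = &*args;
  ++args;
  Value *argv = &*args;

  BasicBlock *entry = BasicBlock::Create(C, "entry", mainFn);
  BasicBlock *world = BasicBlock::Create(C, "world", mainFn);
  BasicBlock *name  = BasicBlock::Create(C, "name", mainFn);

  B.SetInsertPoint(entry);
  Value *noArg = B.CreateICmpEQ(argc, B.getInt32(1), "noArg"); // an i1
  B.CreateCondBr(noArg, world, name);

  B.SetInsertPoint(world); // puts("Hello World")
  Function *putsFn = cast<Function>(M.getOrInsertFunction(
      "puts", FunctionType::get(B.getInt32Ty(), i8p, false)));
  B.CreateCall(putsFn, B.CreateGlobalStringPtr("Hello World"));
  B.CreateRet(B.getInt32(0));

  B.SetInsertPoint(name); // printf("Hello %s", argv[1])
  Function *printfFn = cast<Function>(M.getOrInsertFunction(
      "printf", FunctionType::get(B.getInt32Ty(), i8p, true)));
  Value *arg1 = B.CreateLoad(B.CreateGEP(argv, B.getInt32(1)), "arg1");
  B.CreateCall2(printfFn, B.CreateGlobalStringPtr("Hello %s"), arg1);
  B.CreateRet(B.getInt32(0));
}
```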
LLVM is strongly typed, and it's kind of annoying in a way, but it's useful for optimizations. It's annoying for front-end writers. So you end up having to have a lot of explicit casts in LLVM and these get element pointer things. So if you have an array, arrays and pointers in LLVM are distinct types.
Arrays of different lengths are distinct types. Arrays of the same size but different element types are distinct types. And this is useful for validation, but it does mean you just end up writing lots and lots of cast instructions, which the optimizer will strip away. Structures actually changed a bit with LLVM 3.
Before LLVM 3, structures that had the same layout were the same. Now structures that have different names but the same layout are taken to be different, but you also still have anonymous structures, which will just be merged together.
And the reason for this change was to allow you to do strict aliasing analysis. I guess most of you have written C and C++ code; how many people have had to add -fno-strict-aliasing to their compile flags? Okay, so not many of you.
That's either very impressive or your build system's doing it automatically and you aren't aware of it. So strict aliasing is this great idea that someone on the standards committee had, which said if we make everybody really, really read the standard in detail, then we can make life easier for compiler writers.
And compiler writers really like it because there are suddenly a load of optimization opportunities and you can make C code really fast, and everyone else hates it because all of these little tricks that you used to do, that the standard didn't really allow but all compilers actually accepted anyway because, well, that's how the machine works, now don't work.
And your code is now suddenly in territory where you're in undefined behavior. And compiler writers love undefined behavior because it means you can do anything you want. And programmers really hate undefined behavior because they compile it on one compiler and they get some behavior and they think that's what's expected to happen.
And then they upgrade their compiler and suddenly their code doesn't work and they say, your compiler, it has bugs in it. And you say, no, no, undefined behavior. Ha ha. So I wanted to give you a little example. You can grab the full source code
for this tiny compiler and interpreter at the first address. The second address is just the syntax-highlighted bit that I'm actually going to talk about. On the next few slides, I have some code but to fit things on the slide, I had to omit a few details. So the full version, and the full version differs in one very important way
in that it actually has comments in it, is the second address. And if you have a phone that understands these QR code things, it will send you to the compiler.cc.html thing. So maybe people following at home can do that and it'll be easier for people who can't see the slides
or are bored with listening to me and just want to read the code. Sorry, everyone who wanted to do that had got it, hadn't they? Okay. So rather than take an existing language which maybe you'd be familiar with or maybe you wouldn't, I thought I would make up a language for this talk.
So this is a really simple domain-specific language that is designed for implementing cellular automata. Unfortunately, I picked cellular automata, which I can't say very quickly. So I'm now going to spend the next five or six slides saying...
So a cellular automaton is a... basically a grid containing values and a program runs on each square in the grid and it computes some value based on the current value and based on the values of the neighbors. So because I was really lazy writing the parser,
this language doesn't have named variables. It just has ten local registers, A0 to A9, and ten global registers, G0 to G9. And when the program starts running, all of these are zero. The local registers are reset to zero
for every square; the global ones are not. And it also has a special register, V, which is initially the value of the cell in the old grid, and then your program runs and it sets the value of V. There are a few simple things.
So we have some simple arithmetic things. And, again, because I'm lazy when it comes to writing parsers, the operator comes first and then there's a register and then there's an expression. So, for example, + A1 1 would increment register A1. We have a couple of more interesting things. There is a neighbors statement,
which is the only flow control statement in this language. And this evaluates the statements in the body once for every neighbor that the cell has. So if you run it in a corner, it'll evaluate this three times.
If you run it on an edge, it'll evaluate it five times. And if you run it in the middle, it'll evaluate it nine times. And we have a select expression, which takes the register value and it will return a different value depending on what the register value is,
whether it's a specified value, if it's in a specified range, or if it's not in any of the specified values, it'll be zero. So just a few quick examples of this. This is a simple program which will flash every cell in the grid.
So whatever the initial values are, after the next iteration, it'll be the opposite of this. So this says take the value v, which is the current value. If it's zero, set it to one. Implicitly, if it's set to anything else, set it to zero. And then assign that to v. So a very simple program.
A slightly more complicated one. This neighbors expression stores the value of the current neighbor in A0, though we're not using that here. But this just executes this statement once for every neighbor that the cell has. So when you run this, you'll get a grid which has three in the corners
and five in the edges and eight in all the cells in the middle. So this counts the number of neighbors you have by adding one to A0 for every neighbor. And then just assigns the value in A1 to v. And a more interesting example, how many people have not heard of Conway's Game of Life?
Okay, so this is probably the most famous cellular automaton program. And I wrote an article a couple of years ago where I implemented this in OpenCL, and this was about a page and a half of code. So this language is quite dense for this kind of thing. But again, it counts the number of neighbors that are set to one,
stores that in A0. Then for the current value, if the current value in this cell is zero and it has three neighbors, then it's happy and healthy and it stays alive. Sorry, then they breed and you get a new value here, so it sets it to one.
If this current value is one and you have two or three neighbors, then again it's happy and healthy and it stays alive. If it has only one or zero neighbors, it gets lonely and dies. And if it has more than three neighbors, it starves to death. And so this is actually interesting
because Conway's Game of Life is Turing-complete. You can implement a Turing machine in Conway's Game of Life and you can represent tapes with the cells and have them feeding in. But this language itself is not Turing-complete. So only the fact that you're repeatedly executing this program makes it Turing-complete.
So this language is right on that boundary, which, I don't know, makes it interesting theoretically, maybe not so much practically. And this compiler turns it into an abstract syntax tree representation, which is a simple structure which just starts with an operation ID
and has two children and what those are depends on what the operation is. But typically most of the things just have two values associated with them. And we do a little bit of cheating, so registers and literals are encoded into pointers and this makes the representation a bit denser
and it also means the interpreter can actually be quite fast, because loading a register value is just: do a shift, and then load it from that index.
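The talk doesn't show the encoding, but the usual tagged-pointer trick looks something like this (a hypothetical layout, relying on AST nodes being aligned so the low bits are free):

```cpp
#include <stdint.h>

// Hypothetical tagged-pointer encoding for AST children: real AST nodes are
// word-aligned, so their low bits are always zero; registers and literals
// borrow those bits as a tag instead of allocating a node at all.
enum { TagNode = 0, TagRegister = 1, TagLiteral = 2, TagMask = 3 };

static inline void *encodeRegister(unsigned index) {
  return reinterpret_cast<void *>((static_cast<uintptr_t>(index) << 2) |
                                  TagRegister);
}
static inline bool isRegister(void *child) {
  return (reinterpret_cast<uintptr_t>(child) & TagMask) == TagRegister;
}
static inline unsigned decodeRegister(void *child) {
  return reinterpret_cast<uintptr_t>(child) >> 2;
}
// The interpreter can then fetch a register value with a shift and an
// indexed load: registers[decodeRegister(child)].
```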
The compiler itself is just one C++ file, and maybe some of you have looked at this C++ file already. It uses a few LLVM classes, which I'll go through quickly now. And some parts of this are written in C and are compiled using Clang and are then linked into the program. So I've kind of gone through what some of these things are already. These are the LLVM classes: an LLVM module is a module,
a function is, well, yeah, you get the idea. Global variables are a class that represents, well, yeah, obviously, I guess. The most useful class for people writing front-ends is this IRBuilder class, which is just a helper class
which will generate LLVM instructions for you with just a single function call for each one. The type class I won't deal with very much because this language only has one type, which is a 16-bit integer, so we only touch on that very briefly.
ConstantExpr is the class that's used for building, well, obviously, constant expressions. These last two we'll come to a little bit later. The PassManagerBuilder is the one that we use for constructing a sequence of LLVM optimization passes which you then want to run.
An execution engine is the bit that actually does the execution. So the first thing we did: in the interpreter, there is a function that looks a lot like this, that calls the interpreter again for each cell. Oh, this says 50 when it should say width and height. This is not a bug in the version that you can see
online; that is a bug in the slides. This should say width and this should say height, sorry. So this function is just written in C and compiled with Clang to bitcode, and it has a forward declaration of this cell function.
So what will happen at the end of all of this is that we will emit an actual definition of this cell function and then the LLVM optimizer will inline it here. So we don't need to write all of this boilerplate code. We don't need to generate LLVM IR for it. We can just copy and paste from the interpreter and it works.
And I wanted to do it this way around because all the other examples do it the other way around and have you calling helper functions from your IR and then inlining them. But we can do it both ways. We can emit something in LLVM IR that we then call from something else. And you could modify the implementation of this really easily
to, for example, use libdispatch and run all of these in parallel, or maybe just run every row in parallel or something like that. And to generate bitcode from this, we just add this -emit-llvm flag to Clang. By default it emits native code, but if you add this flag, it will emit LLVM bitcode.
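That driver function looks roughly like this. This is a sketch, not the talk's actual code: the real cell function also needs enough context to visit its neighbors, which is elided here, and the names and signature are illustrative only.

```c
/* runtime.c (sketch): walk the grid, calling the cell function for every
 * square. 'cell' is only forward-declared; the compiler emits the real
 * definition into this module later, and the optimizer inlines it here. */
#include <stdint.h>

int16_t cell(int16_t oldValue, int16_t *globalRegisters);

void automaton(int16_t *oldGrid, int16_t *newGrid, int16_t *globalRegisters,
               int width, int height) {
  for (int x = 0; x < width; x++)
    for (int y = 0; y < height; y++)
      newGrid[x * height + y] = cell(oldGrid[x * height + y],
                                     globalRegisters);
}

/* Compiled to bitcode with: clang -c -emit-llvm runtime.c -o runtime.bc */
```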
So this is stuff that you'll see in pretty much any compiler for a small language using LLVM. We're actually cheating a bit when we construct the LLVM module. Rather than constructing one the normal way with constructors and create functions and stuff,
we grab a memory buffer containing the contents of the bitcode that we compiled with Clang, the little runtime library, and then just say: parse that. So we're actually taking a copy of the code that we compiled with Clang, and rather than linking with it, we're just using this as the skeleton for our new program. Does that make sense to everyone?
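In code, that step looks roughly like this (a 3.x-era sketch; the symbols holding the embedded bitcode are hypothetical, and buffer-ownership rules have changed between LLVM releases):

```cpp
#include <string>
#include "llvm/ADT/StringRef.h"
#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/LLVMContext.h"
#include "llvm/Module.h"
#include "llvm/Support/MemoryBuffer.h"

using namespace llvm;

// Hypothetical: the contents of runtime.bc, embedded in our binary.
extern const char runtime_bc_start[];
extern const unsigned runtime_bc_size;

// Parse the embedded bitcode; the resulting module, containing the runtime
// support code and the 'cell' stub, is the skeleton of our new program.
Module *loadRuntime(LLVMContext &C, std::string &error) {
  MemoryBuffer *buffer = MemoryBuffer::getMemBuffer(
      StringRef(runtime_bc_start, runtime_bc_size), "runtime.bc");
  return ParseBitcodeFile(buffer, C, &error); // NULL on failure
}
```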
Okay, I see slightly confused faces, but not everywhere. So then we, rather than creating a new function, we just get the cell function that we forward declared. And because this is just a stub function, it has no implementation. It doesn't have a basic block. So we do need to create the entry basic block,
and this is just a standard factory function. And it goes in the function that we just loaded. The name doesn't really matter; it's just that the first one that's added will automatically be the entry one. A lot of these things take this C parameter,
which isn't on the slides. This is an LLVM context, and this is what LLVM uses for re-entrancy. So you can have multiple threads with different LLVM contexts, but you can also just call getGlobalContext, and then you get a shared one if you don't care about having a multi-threaded compiler. And to be fair, most people don't care
about having a multi-threaded compiler. Here we're setting the linkage of the function. So this is private linkage, which means that it won't be exported outside of this module, and that just means that the inliner is more likely to inline it, because it's only called in one place. If we inline it, we get rid of
every single need to call it, so that's useful. This is just creating a cached copy of the type that we use for registers, and then we set the IR builder's insert points to the entry block, and then we're ready to start actually creating bitcode.
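Pulling those steps together (a sketch, 3.x-era API):

```cpp
#include "llvm/Module.h"
#include "llvm/Support/IRBuilder.h"

using namespace llvm;

// Fetch the forward-declared stub and get it ready for code generation.
void prepareCell(Module *M, LLVMContext &C, IRBuilder<> &B) {
  Function *cell = M->getFunction("cell"); // declared in runtime.c, no body

  // Private linkage: invisible outside the module, so after the inliner
  // has dealt with the single call site the body can be thrown away.
  cell->setLinkage(GlobalValue::PrivateLinkage);

  // The first block added to a function is automatically the entry block;
  // the name is only for human readers of the IR.
  BasicBlock *entry = BasicBlock::Create(C, "entry", cell);
  B.SetInsertPoint(entry);
}
```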
So the first thing we need to do for this little language is allocate some space for the registers, and if we go back to the runtime a second, this cell function, this is the one that we're going to be generating, will take the global registers by pointer as an argument,
but it will need to allocate the local registers itself. So we create each of the A registers just with a single allocer, so this takes the type here, so this will create enough space on the stack for a 16-bit integer, and this returns an LLVM value
which represents a pointer to a 16-bit integer on the stack, and we just do this in a loop at the start. We store, I've missed off the bit where we create an allocer for the V as well, but never mind. We store the first argument in V,
and then for the next argument, we do this same thing. We then loop over these and store the value zero, so this is a constant integer of this type value zero in each of the A registers and create one of these for each of the global registers.
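A sketch of those few lines (3.x-era API; the 16-bit register type and the register counts follow the language description above):

```cpp
#include "llvm/Constants.h"
#include "llvm/Function.h"
#include "llvm/Support/IRBuilder.h"

using namespace llvm;

// Stack slots for V and the ten local registers, all starting at zero.
void createRegisters(Function *cell, IRBuilder<> &B, Value *a[10], Value **v) {
  Type *regTy = B.getInt16Ty();            // the language's only type
  Value *zero = ConstantInt::get(regTy, 0);

  *v = B.CreateAlloca(regTy, 0, "v");
  Value *oldValue = &*cell->arg_begin();   // first argument: the old value
  B.CreateStore(oldValue, *v);             // V starts as the old cell value

  for (unsigned i = 0; i < 10; i++) {
    a[i] = B.CreateAlloca(regTy, 0, "a");  // pointer to an i16 stack slot
    B.CreateStore(zero, a[i]);             // locals are reset to zero
  }
  // The global registers arrive by pointer in another argument, so they
  // need a GEP per register rather than an alloca.
}
```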
So I've sort of brushed over the GEP thing a bit because it's slightly confusing. It stands for get element pointer, which, yeah, does exactly what it says. From the earlier example,
we wanted to get a pointer to the first element in an array of characters, so this is just like getting, this is array to pointer decay in C basically, so this is a get element pointer with the value zero. So in this example, we're getting the element pointer
in the array for each index. These do get really horribly complicated later on because they take an arbitrary number of arguments, and so they can peer into structures and into nested structures and structures containing arrays of structures containing arrays of structures. The really important thing to remember about these
is that they don't dereference the pointer, so whenever you actually see one of these in code, it'll be a get element pointer followed by either a load or a store, and the back end will then generate some complex addressing mode instruction, but the get element pointer itself does not actually touch the memory.
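So the idiom is always a GEP followed by a load or a store; for example, reading one of the global registers out of the array that arrives by pointer (sketch):

```cpp
#include "llvm/Support/IRBuilder.h"

using namespace llvm;

// A GEP only computes the address of g[3]; the load is what touches memory.
Value *loadGlobalRegister(IRBuilder<> &B, Value *g /* the i16* argument */) {
  Value *addr = B.CreateGEP(g, B.getInt32(3), "g3_addr"); // address only
  return B.CreateLoad(addr, "g3");                        // dereference here
}
```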
Arithmetic statements in this are really easy, so we load the old value, do an add or a multiply or a divide or whatever, and then store the new value, and all of the arithmetic instructions
are just done by this little switch statement which computes the result. That implements all of the arithmetic operations that this toy language supports, which are, I think, six of them; there's also a min and a max. They're all basically like this, and again, we're not caring here that LLVM is static single assignment, because we're just loading the value from the stack, doing the operation, and then storing it.
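The whole dispatch is just a few lines (sketch; the toy language's operator encoding is its own, so the case labels here are illustrative):

```cpp
#include "llvm/Support/IRBuilder.h"

using namespace llvm;

// Sketch: every arithmetic statement is load, compute, store. mem2reg
// tidies up the stack traffic afterwards, so no SSA bookkeeping is needed.
Value *compileArithmetic(IRBuilder<> &B, char op, Value *regSlot,
                         Value *operand) {
  Value *old = B.CreateLoad(regSlot, "old");
  Value *result;
  switch (op) {
  case '+': result = B.CreateAdd(old, operand); break;
  case '-': result = B.CreateSub(old, operand); break;
  case '*': result = B.CreateMul(old, operand); break;
  case '/': result = B.CreateSDiv(old, operand); break; // signed divide
  default:  return 0; // min, max and friends elided
  }
  B.CreateStore(result, regSlot);
  return result;
}
```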
Flow control is a little bit more complicated, so I'm going to ignore the neighbors statement.
If you want to look at that in the code, feel free, but I'll go through the range instruction. So this is creating a phi node which has one predecessor for each of the range statements that we have in this expression, and then for each one in the block, I had to omit the loop
because it didn't quite fit on the slide. We, first of all, get the minimum and maximum values specified for the range as constants because they are constants in the source code, and then we have a little bit of Boolean logic,
so we say if the register value is signed greater than or equal to the minimum value or signed less than or equal to, sorry, and signed less than or equal to the maximum value, this will then just be a Boolean value saying if we've matched this particular case.
Then we create a couple of new basic blocks. We have one for calculating the result expression and one for falling through to the next case, so this is basically just like a switch statement in C but with a little bit more syntactic sugar. It then creates a conditional branch, so if we have matched this, then we jump to this basic block.
If we haven't matched it, then we fall through and try the next one, and then we'll be back here again. Then it sets the insert point in the range result block that we've just created, and then we do something else to actually emit the expression for the result,
but this is just actually calling back to the compile statement function. Then for the phi node, to set its value, we say if we are coming from this block, then it's this value, and then the phi node will have one value for each incoming block,
so this will be in the final IR. I won't go through this in too much detail because you kind of need to understand the IR, but I just want to show you that this is a fairly complex flow control operation, and I can more or less fit the code for generating it on the slide.
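Condensed to its essentials, one case of it reads like this (a sketch that mirrors the description above; in the real compiler the result expression is produced by recursing into the statement compiler, which a placeholder constant stands in for here):

```cpp
#include "llvm/BasicBlock.h"
#include "llvm/Constants.h"
#include "llvm/Function.h"
#include "llvm/Support/IRBuilder.h"

using namespace llvm;

// Sketch: one case of the select expression. 'phi' lives in the block where
// all the cases join; 'next' is the fall-through block for the next case.
void compileRangeCase(Function *F, IRBuilder<> &B, PHINode *phi, Value *reg,
                      int minVal, int maxVal, BasicBlock *next) {
  LLVMContext &C = F->getContext();
  // The bounds are constants because they're constants in the source code.
  Value *min = ConstantInt::get(reg->getType(), minVal);
  Value *max = ConstantInt::get(reg->getType(), maxVal);

  // Matched if (reg >= min) and (reg <= max); both comparisons are signed.
  Value *match = B.CreateAnd(B.CreateICmpSGE(reg, min),
                             B.CreateICmpSLE(reg, max), "match");

  BasicBlock *result = BasicBlock::Create(C, "range_result", F);
  B.CreateCondBr(match, result, next);     // matched: compute; else: go on

  B.SetInsertPoint(result);
  // Placeholder for compiling the result expression of this case.
  Value *resultValue = ConstantInt::get(reg->getType(), 0);
  B.CreateBr(phi->getParent());            // jump to the join block
  phi->addIncoming(resultValue, result);   // phi's value when coming from here

  B.SetInsertPoint(next);                  // carry on with the next case
}
```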
If you want to look in more detail, the version that you can download has a comment basically on every line which explains how it really works. I'm running a bit short on time, so I'll skip over it a bit. Generating code is really, really simple. You create an execution engine from this module.
This false flag just says whether we should force it to use the interpreter. We don't want to force it to use the interpreter, because we probably already have a fast interpreter. We want it to use the JIT compiler, and this is just a pointer to an error value, and if this fails, this error value will be set to something, and we can print it.
Otherwise, we take the execution engine. We say, give me a pointer to the function with this name, and then we get a function pointer back, and then you can call that from C code.
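As code, with the 3.x-era JIT (later LLVM releases replaced this with EngineBuilder and then MCJIT/ORC, so treat the exact calls as historical). The function name and signature match the hypothetical runtime sketch earlier:

```cpp
#include <cstdio>
#include <stdint.h>
#include <string>
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/JIT.h"     // links the JIT into the binary
#include "llvm/Module.h"
#include "llvm/Support/TargetSelect.h"

using namespace llvm;

// Hypothetical signature of the outer grid-walking function from runtime.c.
typedef void (*AutomatonFn)(int16_t *, int16_t *, int16_t *, int, int);

AutomatonFn jitCompile(Module *M) {
  InitializeNativeTarget();
  std::string error;
  // false: don't force the interpreter; we want the JIT compiler.
  ExecutionEngine *EE = ExecutionEngine::create(M, false, &error);
  if (!EE) {
    fprintf(stderr, "Couldn't create execution engine: %s\n", error.c_str());
    return 0;
  }
  // Ask for a native pointer to the function, then just call it from C.
  return (AutomatonFn)EE->getPointerToFunction(M->getFunction("automaton"));
}
```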
And just a few statistics about this little toy example. The parser was about 400 lines of code, so that was most of the complexity in this. The interpreter was about 200 lines of code, and it's not the greatest interpreter in the world, but it's, yeah, it's okay. And the compiler was only very slightly more code than the interpreter. So it used to be that if you wanted to write a compiler for your language,
that was 10 or 100 times more effort than writing an interpreter. Now, actually, writing the compiler is really not much more effort than writing an interpreter, and in this case, well, for the examples I ran, the interpreter is only maybe
seven or eight times slower than the compiled code. The reason for that is actually that the interpreter is fairly fast for this language. There isn't very much optimization you can do on a language where programs are typically only two or three statements long. For languages where you're writing quite complex code, even, you know, I say complex, even something that's a dozen or a hundred lines long,
there's more that the optimization can work with. So we can do a few things to improve on that that I haven't done here. Maybe we could improve the quality of the IR we generate. For this example, not so much, because the IR is actually pretty simple
because the source language is simple because I need to explain the language and the compiler implementation in one 45-minute talk. Can LLVM optimize it a bit better? What else can we do? Well, optimizers like having lots of information to work with. So maybe rather than having these width and height variables,
maybe you could emit an inner version that would just work on 16 by 16 blocks, and then the compiler could optimize the neighbors statement a bit better because it would know how many neighbors you had. Maybe you could do a special case for the sides and the corners, and that would be maybe adding a dozen lines to the compiler,
but it would give you a bit of a speed-up. I used in the example code the pass manager builder, and I just said add the standard set of LLVM optimizations, and no one has really done much effort on working out which ones they should be. They're just ones that someone thought made sense at the time,
and they maybe got tweaked a bit, and they don't do very badly for C and for C++. They work about as well as the GCC ones, but for languages that have different characteristics, maybe altering the order of the passes, adding some extra ones,
maybe switching them around a bit can give you better results. And on the LLVM website, there is a list of passes page, and this will give you about 200 different passes you can do. Maybe look at some of these, see whether they do things that particularly match your source language or don't, and you can write new ones.
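For reference, the PassManagerBuilder invocation mentioned above is only a few lines (a 3.x-era sketch); swapping in a custom pipeline just means adding passes to a PassManager yourself:

```cpp
#include "llvm/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

using namespace llvm;

void optimize(Module *M) {
  PassManagerBuilder PMB;
  PMB.OptLevel = 2;                              // roughly clang -O2
  PMB.Inliner = createFunctionInliningPass(275); // needed to inline 'cell'

  PassManager PM;
  PMB.populateModulePassManager(PM);             // the "standard set"
  PM.run(*M);
  // A custom pipeline is just: PM.add(createSomethingPass()); ...; PM.run(*M);
}
```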
There's a tutorial online for writing a new pass. It's pretty simple. Typically, you'll subclass one of these three, module, function or loop passes, depending on the scope of your optimization. And lots of the passes that exist now are analysis passes, so they don't transform the IR.
They just collect some information about it: is this a loop, is this value aliased, that kind of thing. And there are some passes that already exist that are specific to source languages. I'm going to mainly talk about Objective-C, because that's mostly what I work on. These automatic reference counting optimizations
are passes in LLVM now. They use LLVM's flow control analysis, and they determine whether reference count manipulation operations are redundant or not, and if they're redundant, they remove them. So it's pretty language-specific, and it can also be library-specific. So if you have a common pattern in some library
that you use a lot that could be made faster, it's really easy to write an LLVM optimization that is specific to your particular framework and just add that to your compilation pipeline. So, for example, Qt's signals and slots mechanism: someone's looking at speeding that up by taking advantage of the fact
that we can just plug extra optimizations in that are only relevant to Qt. And for the GNUstep Objective-C runtime, there are a few that do things like inline caching. Yeah, and if you're writing your own compiler, LLVM does the compiler bit,
but some of the runtime stuff maybe you can reuse from other projects. libdispatch gives you cheap concurrency. The Boehm garbage collector isn't the best in the world, but it is off the shelf and easy to reuse, and the GNUstep Objective-C runtime gives you a nice dynamic object model that's really easy to work with,
and just one very final thing before they kick me off the stage. A lot of Clang, which maybe you've come across as a C compiler, also follows this same approach as the rest of LLVM, so it is also really modular, so you can use libclang to parse header files.
You can get the type encodings for those and the names, and you can do FFI without any need for custom header file parsers or for defining how things map from C to your language, and we do this in Pragmatic Smalltalk
where you just send a message to this C pseudo class, and it knows that that's supposed to be a C function call, and so it will check in your headers, find a function that has a name matching that message, and it will just work by magic, and now I'm going to stop speaking before they shout at me.
You have to go to the microphone, and then I can hear what you're saying.
We've got time for maybe one or two questions, but that's it. The question was: are these slides online?
They are not, but they will be soon. I want to annotate them before I put them online. Have you considered having an application-specific language like this
for writing compilers? Yeah. Actually, one of the things that is interesting is, if you look at the LLVM optimization passes, a lot of them are doing a sort of pattern matching that is really ad hoc, that really should be done by some pattern matching language. So one thing I'd like to do
is have an optimization pass language. It seems like your code was mostly using the APIs for LLVM to build up something. So if you had a language where there were operators that did precisely that,
it might be easier to write. Another thing that would be interesting is taking something like Clang and just extending it. So you write C code with placeholders and then let you fill in the gaps. Sort of like Bison. I think people are running to their next talk.
Was that the last question?