
RaptorJIT: a fast, dynamic systems programming language


Formal Metadata

Title
RaptorJIT: a fast, dynamic systems programming language
Subtitle
Forking LuaJIT to target heavy-duty server applications
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
RaptorJIT is a Lua implementation suitable for high-performance low-level system programming. With the project scope reduced to the systems programming domain we want RaptorJIT to fit one use case and excel at it, and we're not afraid of radical change. This talk will be about our efforts to reduce the project's complexity to improve maintainability and pave the way for new features. A story about porting the LuaJIT interpreter from assembly to C, ruthless trade-offs, and ambitious performance targets in an expressive language. Topics include: predictable performance in JIT compilers, always-on profilers, memory safety in low-level programming
Transcript: English (auto-generated)
All right. So, RaptorJIT: a fast, dynamic systems programming language. First off, hi, I'm Max.
I'm an open-source systems hacker. I'm currently dabbling in high-performance networking applications. And I suppose I should start with a quick rundown of Lua just in case you don't know it. Lua is a simple minimalistic high-level language.
It has Scheme-ish semantics and a Pascal-esque syntax. It has first-class functions, multiple return values, prototype-based object orientation, and it's surprisingly flexible. And its central data structure is the table,
which is a hybrid between a hash map and a sparse array. And the canonical implementation of Lua is called PUC Lua, which is a simple interpreter intended to be embedded in C and C++ applications, among others.
So, here's an extended hello world to get you kind of familiar with the language. I'm not going to explain anything about it. I'm just going to let you stare at it a little bit and hope that's somewhat obvious.
So, LuaJIT is an alternative implementation of Lua. It implements a dialect of Lua 5.1 and a half-ish, plus some extra goodies. That's why I'm saying it's a dialect, not just an older version.
As its name suggests, it comes with a just-in-time compiler. And that compiler is really, really impressive and achieves performance competitive with C. It's also a really good language for expressing programs that are close to the metal, thanks to its built-in language support
for accessing and operating on C data. And I think personally that LuaJIT is a good systems language. By systems language, I mean that it's a good language to replace C: an application that you would have previously written in C,
you could also write in LuaJIT, and you would get very far with that. So, here's an example of some LuaJIT code showing off its ability to poke at low-level data. So, you can see here that the language has built-in primitives for expressing the C data, such as C structs. And you can access that data and the fields
as if they were native Lua objects. To illustrate that: if P here were instead a Lua table containing a C array and a Lua number, the code here wouldn't change at all. You could still access it the same way,
and you could still copy to the C array contained in the Lua table. So, RaptorJIT is a fork of LuaJIT, and its goal is to be a really good systems programming language.
With RaptorJIT, we want to do a couple of things. First, we want to simplify the implementation and improve maintainability. We also want to improve the compiler for heavy-duty server applications specifically, so we have a very narrow use case. We want to write systems applications, and we want to create a systems language,
and we have a more narrow set of optimization targets, and we think that we can improve the compiler even more by targeting this narrow use case. Especially, we want to eliminate performance pitfalls,
meaning small changes that have a big impact on performance, as well as unexpected JIT behavior. So, we want to make the JIT compiler easier to understand and to use, and, bottom line, we want to provide more reliable performance.
So, right now, LuaJIT performance is great. We just want to make it more reliable and maybe even better in some cases. Right, RaptorJIT also adds new features. Among those is a low-overhead profiler and matching introspection tools for that data. And hopefully, there are many more features to come,
and that's where you come in, I guess. So, in order to simplify and maintain the code,
we are doing a big spring clean. So, here's this pull request titled, Big Bang, remove all the features that I can live without. And yeah, it says merged. That's the purple icon on the left there. It removes support for all architectures except x86-64, because at the moment, that's the only thing we care about. And it's enough work maintaining one architecture.
It removed Windows support and removed LuaJIT's 32-bit heap mode. And this allowed us to get rid of a lot of #ifdefs. We're trying to get rid of all #ifdefs, because we don't like #ifdefs. And that resulted in a total code reduction of roughly 50%, which I think is a big win.
The LuaJIT interpreter used to be handwritten assembly, duplicated for each specific architecture. And we have almost completed rewriting that virtual machine interpreter in C, and we hope that will make it easier to port and to change the language implementation. And the rationale behind this change is that we spend most of our runtime in compiled code, meaning in compiled traces. So, for our use cases, high-performance networking, spending any significant time in interpreted code
is out of the question anyway, because that would be way too slow. So, for us, an interpreter that's fast doesn't really buy us anything. We really don't need an interpreter that's heavily optimized. So, our idea is that, look, we're gonna make it easier to maintain and we're gonna skip the optimizations
that we don't benefit from. And instead, we want to make it easier for your code to stay compiled and not fall back into the interpreter. So, we're also looking to remove complex optimizations
that don't carry their own weight anymore. So, here we removed a special-case fast path for string interning. And on the next slide here, I'm paraphrasing from the pull request, which somewhat reads: this "fast path", in air quotes, was bad because it was a tricky custom memcmp routine
that needed to be maintained. It turned out to be slower than the stock memcmp on modern x86, which was the slow path. And it led to confusing performance behavior where unrelated memory allocation could bias an often-used buffer toward the "fast path" and impact overall performance.
So, in the description, Luke concludes that the fast pass code was written 10 years ago and a lot has happened since then. And he goes on into how the CPU architecture, the operating system that is Linux in our case,
and even compilers like GCC had really evolved in that time, and goes on to say, I think the optimization had simply bitrotted. What I want to stress here is that this is not to bash this individual optimization or anything. It's just to say that if you have optimizations
that are kind of smart and try to outperform certain components in your system, you have to account for the cost of maintaining them and the work of continuously verifying that they still actually work and still actually make your program faster, or are just dead weight.
Okay, so I'm running a little bit out of time here. Right. So, RaptorJIT wants to improve the JIT compiler. To understand the setting here, I guess I should explain that LuaJIT acts as a drop-in replacement for PUC Lua. So, it has a really fast JIT compiler,
and it also has a fairly fast interpreter. So, if the JIT compiler for some reason fails to compile a code path, it will drop into an interpreter that is still many times faster than the canonical Lua implementation. So, in any case, if you have a Lua program that you used with PUC Lua, you're gonna have
big performance improvements if you run it with LuaJIT, and so you're gonna be happy. And we think that we can do better for a narrow set of use cases with RaptorJIT. So, to summarize: LuaJIT is for existing PUC Lua code bases, and RaptorJIT is for new Lua applications written with a just-in-time compiler in mind.
So, in case you don't know how a tracing just-in-time compiler works, I will give you a super-reduced explanation. A tracing JIT interprets code. When it hits a branch, it checks if it's hot or not. If it's not hot, it increments a hot count for that branch and continues interpretation.
However, when the branch is hot, it starts recording a trace, and then continues interpretation after recording the trace. Eventually, instead of hitting hot branches, you start executing traces of hot code.
So, with RaptorJIT, instead of treating the compiler like a special sauce that you throw onto your Lua programs to make them magically faster, we want to foster a culture of understanding the JIT compiler, and with that understanding, we want to formulate
design goals and implement them, and this should, in turn, make the JIT compiler easier to understand and, most importantly, easier to leverage. And the issue referred to below talks about avoiding high-impact, medium-generality optimizations. I touched on this before. We use this term for optimizations with the following behavior:
if you make a small change to your program and you get a big change in performance, that's what we call a high-impact, medium-generality optimization, because it's not general enough for you to freely make changes to your program that are relevant in scale and scope,
but it's high-impact, because you take a big hit if you fall off that optimized path. And that is why we want to avoid unreliable compiler behavior, where small changes to your program cause big changes to performance.
So, LuaJIT aggressively blacklists code paths that fail to compile. That favors short-running programs. The programs that we target are all long-running, so, just as a policy change, we spend way more time trying to find good traces, and we are fine with that. We just want to make sure that we find the best traces for the program, because we know that it's gonna run for a long time.
Yeah, as a first step in that direction, we updated the just-in-time compiler's heuristics for trace selection to match our target workloads, and yeah, not gonna go much into that.
On another note, LuaJIT doesn't actually consider the time domain when selecting traces, so a branch could become hot because it was executed for the hundredth time after an hour of runtime. So, hotness is not actually frequency, but rather this abstract idea of some counter that at some point overflows.
And I think that's really unintuitive behavior. I think that maybe RaptorJIT should consider the time domain and only compile code which is actually executed frequently. We did some experiments with that. The results so far were positive, but that's just to show you the kind of hacking that we're doing on that fork.
Right, we also added new features. We added a low overhead profiler. The intention is to have that profiler always turned on. It will always collect profiling data, even in your production applications, which you can then grab while it's running and display in this front end that we wrote to find which traces take the most time and what's the problem and see how they got created.
This is to show that the tool that we wrote is a very visual tool, so this is a dependency graph of the intermediate representation instructions where you can see, ah, this is the loop body and this is the head that's executed before entering the loop. Yeah, there's lots of experimentation going on.
We've experimented with a trace barrier primitive that kind of stops traces from going over that line of code and forces a new trace to be created, if at all. We like that one. We experimented with a JIT-unlikely primitive, which would take too long to explain now. We didn't like it. And one thing I guess I want to touch on is this JIT seal primitive:
what this will basically allow you to do is declare, or constify, a table at run time, and that would kind of give the compiler superpowers, where it could treat tables created after configuration changes as constant and optimize based on their contents.
I've been told there are just a few minutes left, so if you want, we can take some questions. All right. Yeah, I guess this one's important. I just have to mention it, like, in a minute. This is something that LuaVela has apparently already implemented and something we're really interested in, because it removes another high-impact, medium-generality optimization. I guess check out the slides
if you want to know more about that. Right, on JIT compilers in general, there's new literature, new science happening that really fixes some basic things. We could implement those things, and we're open to experimenting with that. And yeah, one of my personal goals would be to have safe foreign-function memory access,
because for foreign types, meaning C data, low-level data, we have all the type information available at run time. So there's really nothing that stands between us and making that type-safe, and the compiler's really good at optimizing these checks, so that's something that I would like to do. Right, so thank you for your attention.
Please get involved. We are on GitHub. And yeah, I would be pleased if I got some of you interested
in this project, but also, if you're into JIT hacking, I think this is a cool place to start. And yeah, if you have any questions, please. Yes? No, I don't think so. I mean, maybe some bug fixes, but I'm not aware.
Yeah, I mean, we have a very, very specific goal, and we are willing to separate and split. Like, we want to cooperate with the other folks and exchange ideas, but we're really open to just going wild on it.
Anybody else? Yes? Yeah. I think so. So, when I started working with the RaptorJIT project, I actually wasn't aware of LuaVela, but I just recently re-read some of the presentations and slides, and to me it seems
that they are very similar in spirit, and there are some very specific features that they both want, and I think, I hope, that in the future there will be a strong collaboration between these two projects, because some things that we want are already implemented in LuaVela, and maybe the other way around even. I think that LuaVela, for example, also wants to try out the C interpreter, stuff like that, yeah.
Anyone else? All right, thank you very much. I think the microphone's on. Yes.