We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Exploring WebAssembly with Forth (and vice versa)

00:00

Formal Metadata

Title
Exploring WebAssembly with Forth (and vice versa)
Subtitle
Artisanal, minimal, just-in-time compilation for the web and beyond
Title of Series
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Forth is an extremely minimalistic yet powerful language. Its minimalism has historically made it the language of choice to explore and directly interact with the lowest levels of systems, traditionally the hardware. However, you can also use Forth to explore the low levels of the web platform: WebAssembly. In this talk, I’ll dive into the details of WAForth, a tiny but complete Forth interpreter and dynamic compiler for and written in WebAssembly. You’ll see some Forth in action, read hand-written WebAssembly code, get introduced to tools used for working with WebAssembly, hear about JIT compilation for WebAssembly, and learn how you can move all this outside the web platform into native code.
Assembly languageInteractive televisionFormal languageComputer programmingIntegrated development environmentGame controllerSystem programmingFirmwareOpen setGame theoryDrop (liquid)Logical constantSquare numberCondition numberProgrammschleifeLoop (music)Function (mathematics)Right angleInterpreter (computing)NumberReading (process)SpacetimeData dictionaryCompilerAsynchronous Transfer ModeElectric currentCompilerStack (abstract data type)Machine codeConstructor (object-oriented programming)TrailoutputStreaming mediaDisintegrationStandard deviationBinary filePortable communications deviceWeb browserLatent heatFormal languagePortable communications deviceWordInterpreter (computing)Cartesian coordinate systemMaxima and minimaContext awarenessGame controllerCompilerLogical constantReverse engineeringFunctional (mathematics)Web browserOpen setPositional notationToken ringStandard deviationProgrammschleifeCombinational logicCompilerInstallable File SystemMachine codeRegular expressionSpacetimePhysical systemInheritance (object-oriented programming)Symbol tablePower (physics)Stack (abstract data type)ImplementationResultantAsynchronous Transfer ModeCondition numberLoop (music)Computer hardwareProgramming languageInformationBinary codeException handlingHigh-level programming languageNumberAssembly languageLecture/ConferenceComputer animation
WavePhysical systemAssembly languageExtension (kinesiology)Complete metric spaceCore dumpBinary fileUsabilityPhysical systemMachine codeInformationReading (process)WordMaxima and minimaCompilerFunction (mathematics)outputStandard deviationFreewareInterpreter (computing)Set (mathematics)Computer fileComputer animation
Function (mathematics)NumberString (computer science)Cellular automatonStack (abstract data type)System callJava appletScripting languageMachine codeStructural loadSpacetimeMessage passingVideo game consolePhysical systemMachine codeStandard deviationCartesian coordinate systemKeyboard shortcutXMLComputer animation
Video game consoleTurtle graphicsLogical constantDrop (liquid)Loop (music)Magneto-optical driveRandom numberAngleRight angleTurtle graphicsGraphics softwareIntegrated development environmentComputer animation
DemonSquare numberLaptopExtension (kinesiology)Machine codeWebsiteComplex (psychology)Atomic nucleusMathematicsContent (media)Logical constantEntire functionComputer fileLaptopSound effectMachine codeParameter (computer programming)Scripting languageComputer programmingComputer animation
LaptopMachine codeExtension (kinesiology)Loop (music)Data dictionaryBlock (periodic table)GravitationElectric currentCellular automatonInterpreter (computing)File formatAssembly languageBinary fileSequenceAssembly languageMereologyFile formatComputer animation
Block (periodic table)Loop (music)outputStreaming mediaParsingControl flowLogical constantAddress spaceToken ringLocal ringInterpreter (computing)Data dictionaryElectric currentCompilerFile formatAssembly languageBinary fileMachine codeSequenceMachine codeData structureCompilerInterpreter (computing)Asynchronous Transfer ModeFunctional (mathematics)Computer animation
Block (periodic table)Loop (music)outputFederation of Bosnia and HerzegovinaLogical constantToken ringSlosh dynamicsData dictionaryLocal ringInterpreter (computing)File formatAssembly languageBinary fileMachine codeSequenceEmailRevision controlData typeSheaf (mathematics)Function (mathematics)Element (mathematics)Read-only memoryTable (information)FlagCompilerAsynchronous Transfer ModeCompilerModule (mathematics)OpcodeDiscrete element methodPrice indexInheritance (object-oriented programming)Address spacePort scannerControl flowStructural loadParameter (computer programming)BootingTorusPointer (computer programming)Row (database)Electric currentRun time (program lifecycle phase)Raw image formatSystem callMereologyHexagonEmailMachine codeCASE <Informatik>Binary codeCorrespondence (mathematics)Module (mathematics)Pointer (computer programming)TrailComputer filePhysical systemCompilerSemiconductor memoryWordProjective planeMultiplication signSubject indexingFunctional (mathematics)Interpreter (computing)Computer animation
Physical systemInteractive televisionDrop (liquid)GEDCOMPhysical systemGroup actionCompilerDebuggerFunctional (mathematics)Machine codeCompilerComputer animation
Assembly languageMachine codeNetwork topologyStructural loadRead-only memoryInjektivitätProcess (computing)Game controllerCompilerComputer animation
Table (information)Function (mathematics)Assembly languageMachine codeHydraulic jumpSystem callPrice indexData dictionaryMachine codeFunctional (mathematics)Physical systemSemiconductor memorySubject indexingTable (information)WordThread (computing)Hydraulic jumpData structureBranch (computer science)Structured programmingImplementationComputer animation
MeasurementSieve of EratosthenesFunction (mathematics)Prime idealDrop (liquid)Loop (music)BenchmarkOverhead (computing)Web browserAssembly languageNumberRevision controlPrime numberFunctional (mathematics)Run time (program lifecycle phase)BitSieve of EratosthenesMachine codeInterpreter (computing)Multiplication signMachine codeLogical constantPhysical systemHydraulic jumpResultantCASE <Informatik>AlgorithmComputer architectureWeb browserComputer animation
Web browserAssembly languageLine (geometry)ImplementationDifferent (Kate Ryan album)Flow separationPhysical systemMachine codeLine (geometry)Formal languageRevision controlStandard deviationBlack boxComputer animation
BenchmarkWeb browserOverhead (computing)Assembly languageRun time (program lifecycle phase)Revision controlWeb browserBitMultiplication signImplementationGoodness of fitComputer animation
Run time (program lifecycle phase)Assembly languageMachine codeTrailCompilerAsynchronous Transfer ModeComputer virusBinary fileModule (mathematics)Function (mathematics)Fluid staticsBenchmarkWeb browserOverhead (computing)Process (computing)System callDirected setGaussian eliminationModule (mathematics)Set (mathematics)Machine codeCompilerBinary codePhysical systemTable (information)Line (geometry)Computer architectureResultantCompilerComputer programmingMessage passingImplementationSystem callDirection (geometry)Machine codeStructural loadFormal languageWeb browserComputer animation
Assembly languageLecture/Conference
Web pageMachine codeElectric generatorLecture/Conference
Level (video gaming)Table (information)CompilerFunctional (mathematics)Formal languageLecture/Conference
Binary codeCompilerBenchmarkLecture/Conference
Program flowchart
Transcript: English(auto-generated)
Right, so welcome. My name is Remko and I'm here to talk about two very undeclarative but very minimal and hopefully useful languages. So the first one is FORTH. FORTH is a very
minimal programming language that's been around since the 70s. It's had mostly applications in low-level contexts such as embedded systems, spacecraft controllers and so on, but it's had some other applications as well. If you look at FORTH, the most obvious thing to notice is that it's stack-based. So it uses a reverse polish notation where
you first put something on the stack and then you call a function. But other than that, it looks like a regular high-level language with syntax for constant variables, for comments, syntax for function definitions, loops and conditions and so on. But actually, that's an illusion. FORTH has almost no syntax. So FORTH executes through a very simple interpreter
loop. So what it does is it reads something up until the next space and then decides, is it a number? I'm going to put it on the stack. Is it something else? Then I assume it's a function, which is called a word in FORTH, and it's going to execute
it. So symbols is just like any normal word, so it's just a function of FORTH. Now the same goes for the colon. Colon starts a new definition of a word. Colon, when it executes, it puts the interpreter into a special mode called compilation mode. Now in this compilation mode, the interpreter still advances token by token, but when
it encounters a number, instead of putting it on the stack, what it does is it generates some code that will put that number on the stack later when this word is executed. Same for another symbol. Instead of calling this function, what it's going to do is it's going to compile some code that will call this function when this word is executed.
Now, sorry. So the same goes actually another, yeah, sorry. So it's going to compile. Now the exception for this is that there is a thing called immediate words.
Immediate words are always executed, even if your interpreter is in compiler mode. So an example of such an immediate word is the opening parenthesis, which starts a comment. And so when it executes, what it will do is it will actually consume all the input.
No, it shouldn't be muted. No. Okay. Another immediate word is the semicolon. So the semicolon is what you see when you end the definition. What this will do is it will put your interpreter back out of compilation mode into interpretation mode.
Other of these immediate words are the loops and ifs and then else. And you can actually create your own immediate words. And as such, extend the compiler because these are executed at compile time. So you extend the compiler and you create your own language. So in summary, FORTH is actually nothing but a very simple interpreter loop with an
integrated compiler. There is no syntax almost to FORTH. Just paste the limited tokens. All the behavior of the language is in the execution of these definitions. And you can actually extend the compiler yourself. This combination of super simplicity and power has actually made FORTH a very attractive language to implement on a new piece
of hardware and a restricted piece of hardware. Typically, these FORTH implementations are targeted hardware assembly. But you can actually do this in any low-level language. Which brings me to the second language of my talk, WebAssembly. So I think everybody here knows WebAssembly.
It's an open standard for portable binary code. Most browsers can execute WebAssembly. Many languages can compile to WebAssembly. So the result is that you can run all these languages in a browser. Although WebAssembly was designed for the web, there's actually nothing
web-specific about WebAssembly. It's just an open standard of portable code. So most of the information you find online about WebAssembly is about how you compile your favorite language to WebAssembly or how you run WebAssembly in your browser. So a few years ago,
I wanted to figure out what was actually under the hood of WebAssembly. And at the same time, I came across FORTH. So what I did was I combined both, hoping that I would learn something about both. So that's why I created WA-FORTH. WA-FORTH is a small FORTH system. It's
completely handwritten in WebAssembly and compiles to WebAssembly. So goals are, WA-FORTH tries to do as much as possible in WebAssembly. Now, the problem is WebAssembly is a portable standard, so you cannot do everything in WebAssembly. For example, it needs to do very few things outside
of WebAssembly. For example, reading or writing a character to the output or reading from the input. WA-FORTH tries to be simple. So it's just one big WebAssembly file handwritten. There are no dependencies, no complex tools. The compiler is very simply written. It still tries to be complete enough to be useful. There's an ANS standard that defines
what a FORTH interpreter needs to implement, the minimal set of words. WA-FORTH implements these and implements a bunch of other words as well. What isn't a goal is speed. So of course, because WA-FORTH is implemented in WebAssembly, you're going to get some speed for free.
But still, the compiler is very naive, so I don't expect it to be very fast. Same goes for binary size of the system. It's written in WebAssembly, so it's going to be naturally very small. In fact, it's about 14 kilobytes compiled in a binary WebAssembly.
However, I'm not doing any code golfing or something like that to keep the system small, because I want to keep it simple. And as most FORTHs are not really known to be very user-friendly, and WA-FORTH is not different, although it does emit some debugging information to make debugging easier, as you will see. So what can you do with WA-FORTH? Well,
you can embed it in any JavaScript application, which means you can run FORTH code inside your JavaScript, and you get bi-directional bindings to the system and back to JavaScript. To illustrate this, I have a few example applications. So the first one is the standard
FORTH console that always exists, where you can interactively execute FORTH code, and you can even interactively compile code and then run this compiled code. So it's a REPL, actually. I also have a small graphical programming environment where you can create some graphics using a
logo-like turtle graphics language, but it uses FORTH. It looks a lot like logo, but it's actually FORTH. And I took this a bit further, and then I created a notebook extension, VS Code extension, to create VS Code notebooks. So these are
actually formatted Markdown files interleaved with runnable code. So you can run this code. This is ideal for tutorials, because you can have the code directly there. You can execute it. You can change some parameters and then see what the effect is by rerunning the program. Now, because this is just WebAssembly and it's just a very small system, there's also a script
that converts these notebooks into a standalone, small standalone HTML file with all the functionality, but you don't actually need VS Code anymore to run it. Now, let's have a look under the hood. Like most assembly formats, WebAssembly has a text-based format, which is
much easier to read than the binary format for humans. So this text-based format is based on S expressions, so it looks a lot like LISP. So this right part here is the entire FORTH interpreter that I described earlier, but it comes straight out of WA-FORTH. And it's
actually quite easy to understand. So first it starts by parsing something, parsing the token, and then it's going to either execute it if it's a function or it's going to compile it if you're in compiler mode. Or if it's a number, then it's going to put it on the stack or it's going to compile it. So this tree-like code structure is then transformed to binary WebAssembly
using a tool from WebIt. WebIt is a WebAssembly binary toolkit. This is actually a toolkit with a lot of tools to work with WebAssembly files. It's a very interesting project to look at.
So this is the entire interpreter. The interpreter is actually quite simple. The interesting part is the part where you have to compile something. So you have to compile a call when you're in WebAssembly. Well, somewhere in memory there is a hard-coded binary header
of a WebAssembly module with one function in it. So when a new word definition starts, what happens is some values in this header are reset, and the pointer is initialized to start at the end of the header. So each time the interpreter, this is the piece of the interpreter, needs to compile a call to a function, what it does is it generates some raw binary
WebAssembly hex codes and puts it at the end of the header. So for example, if it needs to do a call, what it does is it generates a hex code for a constant instruction with the index of the function to call and then an indirect call instruction. And so the compiler keeps on adding
binary code to the end of this module. Now once you reach the end of the definition, this code, this binary piece of code, needs to be loaded into the system. So WebAssembly doesn't support anything for this yet, so there's no support for just-in-time compilation, although there are some discussions about it. So what WAForth does is it takes a pointer to this
piece in memory of binary code, and it passes it to the host system, so in this case it's JavaScript. And JavaScript has a small piece of code here running, what it does is it takes this binary, it uses the WebAssembly API to create a new WebAssembly module, and then it
instantiates it. That's all JavaScript has to do. The rest is tracked by WAForth, it keeps track of which module corresponds to which function that it needs to call or compile later on. So here you can see the system in action. So what's happening here is now it's, you started a definition, you start by compiling some things so you're still in
compilation mode, and so it's only when you reach the end of the definition that suddenly you're going to see a new entry in your WebAssembly debugger with a function that has been loaded. So this is the generated WebAssembly code that's been generated by the compiler.
You can get even more control over this compilation process by writing your own WebAssembly inside WAForth. So this is actually, this is again no new syntax, this is just standard WAForth
with some user-defined words, and there's a direct one-to-one mapping from this to this, if you can read it but probably can't from there. Our last thing I want to note about implementation detail is that most forts have very efficient execution by using a system they call Threaded Code. So Threaded Code is actually doing jump instructions all over the place
using values that come from memory or from registers. Now this is something you can do in WebAssembly. WebAssembly only allows structured jumps, so WebAssembly is actually a structured programming language. What WebAssembly does have is function tables, so these are dynamic
tables where you can put functions in, function references in, and then it comes with a special instruction where you can say jump to the function at this index. This is a system that WAForth uses for calling the words. Now the downside is that this is a very inefficient system compared
to direct calls or jumps. So I said that speed wasn't really a goal for WAForth, but it's still interesting to get some idea of ballpark numbers of speed and size involved. So I did some very unscientific thing and I took an algorithm, in this case the sieve algorithm, to compute prime
numbers. I took a forth implementation, ported it to JavaScript C WebAssembly, and then ran it a few times and see what the result was. Again, this is not a very representative benchmark, but it's just here to get a feel for some numbers. So if you look at the execution times,
WAForth is about 10 times faster than a JavaScript forth version. This is to be expected. JavaScript forth versions do pure interpretation. WAForth uses compilation, so there's no surprise there. What is a bit surprising is that G forth, which is a native forth, is not much faster than WAForth. I have no idea why this is. I'm suspicious about this
result. Maybe it's because I'm using an architecture that G forth isn't optimized for. JavaScript is 10 times faster than WAForth, which is also normal because WAForth needs to do these constant indirect jumps and JavaScript doesn't have this problem. It doesn't need to do any function calling at all. And then finally, if you have the C version
and you compile it to WebAssembly using MScript, it's about as fast as running the raw WebAssembly and the native version of the algorithm is slightly faster, although you have to say the WebAssembly engine is pretty good at running this code compared to native code.
If we look at the size of the runtime and the code that is executed, the main takeaway here is that WAForth is actually a very small system. It's like about 15K, but you need a complete browser to run it. That's, of course, huge to run. The question is, can we improve this situation?
So, actually, there are several standalone implementations of WebAssembly in different languages. For example, WebIt has a reference implementation in C++. There's WasmTime, which is security-focused and speed-focused in Rust, but there are several others.
But these only do the WebAssembly part, so there's still this small piece of code, these small pieces that are outside of the system that you need to call out to. If you wanted to use all these engines and try this out and create a standalone version,
you would need to write this little piece of code in all these languages against all these APIs. Luckily, there's something called the WebAssembly C API, and this is a standardized black box API that most of these systems implement. So, actually, the only thing you have to do is write these 200 lines of
implementation dependency, and then I could drop in any engine I wanted and then have a standalone version of my system. Now, if we look at the same benchmark again, we can see that speed-wise, WebIt is about 100 times slower than the browser version,
which is normal. I mean, this version in WebIt, that's a reference implementation. It's very naive. It just does what it needs to do to be functional. What is a bit weird is that once in a time, which is supposed to be fast, it's still about 10 times faster than the browser version, and there is no good reason for this, so I don't know why this is. I haven't tried
other engines yet. Now, if you look at size, you see that if you use a relatively optimizing system, you still have 90 megabytes, which is a lot smaller than a browser, but still, if you have a system of about 15K, this is still big. Can we do something about this? Well, you need the WebAssembly runtime to be able to run
your fourth code and to compile your code and load it, but typically, most programs, once you did the first pass and you did all the compilation necessary, you no longer need a compiler if you want to run the program again, so you can do some ahead-of-time compiling, and
this is where WA4C comes in. So what it does is it takes your fourth program, it uses WA4C to run your program once, and at the end of the cycle, it's going to look at all the modules that you created. It's going to combine them all, combine the final state, and then create one big WebAssembly module out of this. Now, it's going to take this big module and then
use another tool from Webit. Webit is really a cool tool set. It's going to use another tool from Webit called Wasm2C to transform this big module into C, and then it's going to use your host compiler to create a native executable.
So the end result is that you have a fourth code to native compiler, and your native binary is your fourth code with the rest of the fourth system still in there, but the compiler left out. And the cool thing is that, because this is all platform-independent stuff up until the native compiler, you can actually do cross-compiling
easily. So you can just do cross-compiling from fourth to any architecture you want. And all this code is about 500 lines and uses a lot of stuff from Webit, actually, and Webit is the only dependency here. So if you look at our final table of benchmarks, we see that the speed
is slightly better than it was before in the browser version, and the binary is becoming a lot smaller. So the entire system is only about 116k in the end of native code. Now, there's still room for improvement here. So what Webit4C does is it just throws together all these modules and then generates the big module. Now, this big module, there are no
cross-module calls anymore. So what you could do is actually do some post-processing. You could change all these indirect calls into direct calls, which could speed up a lot, because the calls are really the bottleneck here. Another thing you could do is throw away
code that you don't need. So in conclusion, this was a very fast talk. I could only touch upon things very briefly. And what I did was I used fourth to explore low-level language implementation in WebAssembly. Now, because fourth is so minimal, I was able to keep things
very simple, try out a lot of things, and go a lot of places. But I think a lot of the things that I've shown here are actually applicable to other languages, so more declarative languages, if you want to compare to WebAssembly. Although I have to say, if you don't know fourth yet, I can really recommend having a look at it, because I find that there's some
interesting philosophies and concepts behind it. Thank you. Questions? It was fast, wasn't it?
Sorry about that. Yeah, I always have been.
Yes. So yes, I, okay, one question.
So the question is, can I reach the same performance? Can you reach the same performance
if you do it in JavaScript? So yes, so the question is, can you do also this code generation in JavaScript? Yes, of course you can. Potentially you can. So typically what you will see is,
the handy part, because I'm working in WebAssembly, is that I have all the WebAssembly low levels at my disposal. So the hard part, if you go to the other languages, is that you're going to need to have something to manipulate these, for example, this function table is very critical. So you need to be able to talk to that and hook into
that. That's going to be the tricky part, but it's definitely possible. But it's easier if you do it directly in WebAssembly. Of course, you would never write a complex language directly in WebAssembly. That's madness. So you can do it with fourth, but I would not recommend it.
It was super cool, though. Thank you. Question? One more question? Yes, I'm interested because you also used the WebAssembly to C compiler. Yes, I used it. Have you checked the regions?
I didn't, no, I used the WebAssembly to C compiler. The performance was quite on par with this. So if I took the C algorithm, it was about, it's a bad, bad benchmark. But the performance was about 10% slower, so it was not much slower than
native binary. So it's C compiled to native and C compiled to WebAssembly. It was only a little bit slower. Of course, you are running in a WebAssembly, you are still running in a virtual machine, right? So the fact that the performance is going to be maybe a little bit slower,
but I thought it was still okay, given that you're still in a VM. Thank you, Remko. We need to solve. Okay. Absolutely amazing.