Exploring WebAssembly with Forth (and vice versa)
This is a modal window.
Das Video konnte nicht geladen werden, da entweder ein Server- oder Netzwerkfehler auftrat oder das Format nicht unterstützt wird.
Formale Metadaten
Titel |
| |
Untertitel |
| |
Serientitel | ||
Anzahl der Teile | 542 | |
Autor | ||
Lizenz | CC-Namensnennung 2.0 Belgien: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. | |
Identifikatoren | 10.5446/61786 (DOI) | |
Herausgeber | ||
Erscheinungsjahr | ||
Sprache |
Inhaltliche Metadaten
Fachgebiet | ||
Genre | ||
Abstract |
|
FOSDEM 2023492 / 542
2
5
10
14
15
16
22
24
27
29
31
36
43
48
56
63
74
78
83
87
89
95
96
99
104
106
107
117
119
121
122
125
126
128
130
132
134
135
136
141
143
146
148
152
155
157
159
161
165
166
168
170
173
176
180
181
185
191
194
196
197
198
199
206
207
209
210
211
212
216
219
220
227
228
229
231
232
233
236
250
252
256
258
260
263
264
267
271
273
275
276
278
282
286
292
293
298
299
300
302
312
316
321
322
324
339
341
342
343
344
351
352
354
355
356
357
359
369
370
372
373
376
378
379
380
382
383
387
390
394
395
401
405
406
410
411
413
415
416
421
426
430
437
438
440
441
443
444
445
446
448
449
450
451
458
464
468
472
475
476
479
481
493
494
498
499
502
509
513
516
517
520
522
524
525
531
534
535
537
538
541
00:00
AssemblerInteraktives FernsehenFormale SpracheProgrammierungProgrammierumgebungGamecontrollerSystemprogrammierungFirmwareOffene MengeSpieltheorieTropfenKonstanteQuadratzahlKonditionszahlProgrammschleifeLoopFunktion <Mathematik>Rechter WinkelInterpretiererZahlenbereichLesen <Datenverarbeitung>Minkowski-MetrikData DictionaryÜbersetzer <Informatik>ATMStrom <Mathematik>CompilerKeller <Informatik>MaschinencodeKonstruktor <Informatik>Weg <Topologie>Ein-AusgabeStreaming <Kommunikationstechnik>Desintegration <Mathematik>StandardabweichungBinärdatenMobiles EndgerätBrowserUmwandlungsenthalpieFormale SpracheMobiles EndgerätWort <Informatik>InterpretiererKartesische KoordinatenLokales MinimumKontextbezogenes SystemGamecontrollerCompilerKonstanteReverse EngineeringFunktionalBrowserOffene MengeZahlensystemToken-RingStandardabweichungProgrammschleifeSchaltnetzÜbersetzer <Informatik>DateisystemMaschinencodeRegulärer AusdruckMinkowski-MetrikPhysikalisches SystemVererbungshierarchieSymboltabelleLeistung <Physik>Keller <Informatik>ImplementierungResultanteATMKonditionszahlLoopHardwareProgrammierspracheInformationBinärcodeAusnahmebehandlungHöhere ProgrammierspracheZahlenbereichAssemblerVorlesung/KonferenzComputeranimation
04:53
WellenlehrePhysikalisches SystemAssemblerMaßerweiterungVollständigkeitSpeicherabzugBinärdatenBenutzerfreundlichkeitPhysikalisches SystemMaschinencodeInformationLesen <Datenverarbeitung>Wort <Informatik>Lokales MinimumCompilerFunktion <Mathematik>Ein-AusgabeStandardabweichungFreewareInterpretiererSchnittmengeElektronische PublikationComputeranimation
06:44
Funktion <Mathematik>ZahlenbereichZeichenketteZellularer AutomatKeller <Informatik>SystemaufrufAppletSkriptspracheMaschinencodeLastMinkowski-MetrikMessage-PassingSpielkonsolePhysikalisches SystemMaschinencodeStandardabweichungKartesische KoordinatenSchnelltasteXMLComputeranimation
07:13
SpielkonsoleTurtle <Informatik>KonstanteTropfenLoopMagnetooptischer SpeicherZufallszahlenWinkelRechter WinkelTurtle <Informatik>BildbearbeitungsprogrammProgrammierumgebungComputeranimation
07:34
Dämon <Informatik>QuadratzahlNotebook-ComputerMaßerweiterungMaschinencodeWeb SiteKomplex <Algebra>Kernmodell <Mengenlehre>MathematikInhalt <Mathematik>KonstanteGanze FunktionElektronische PublikationNotebook-ComputerSoundverarbeitungMaschinencodeParametersystemSkriptspracheProgrammierungComputeranimation
08:12
Notebook-ComputerMaschinencodeMaßerweiterungLoopData Dictionaryp-BlockGravitationStrom <Mathematik>Zellularer AutomatInterpretiererDateiformatAssemblerBinärdatenFolge <Mathematik>AssemblerMereologieDateiformatComputeranimation
08:31
p-BlockLoopEin-AusgabeStreaming <Kommunikationstechnik>Syntaktische AnalyseKontrollstrukturKonstanteAdressraumToken-RingStellenringInterpretiererData DictionaryStrom <Mathematik>Übersetzer <Informatik>DateiformatAssemblerBinärdatenMaschinencodeFolge <Mathematik>MaschinencodeDatenstrukturCompilerInterpretiererATMFunktionalComputeranimation
09:04
p-BlockLoopEin-AusgabeFöderiertes SystemKonstanteToken-RingSchwappende FlüssigkeitData DictionaryStellenringInterpretiererDateiformatAssemblerBinärdatenMaschinencodeFolge <Mathematik>E-MailVersionsverwaltungDatentypGarbentheorieFunktion <Mathematik>Element <Gruppentheorie>ROM <Informatik>TabelleFahne <Mathematik>CompilerATMÜbersetzer <Informatik>BimodulBefehlscodeDiskrete-Elemente-MethodeIndexberechnungVererbungshierarchieAdressraumPortscannerKontrollstrukturLastParametersystemBootenTorusZeiger <Informatik>DatensatzStrom <Mathematik>LaufzeitfehlerRohdatenSystemaufrufMereologieSechseckE-MailMaschinencodeCASE <Informatik>BinärcodeMultifunktionBimodulZeiger <Informatik>Weg <Topologie>Elektronische PublikationPhysikalisches SystemCompilerHalbleiterspeicherWort <Informatik>Projektive EbeneMultiplikationsoperatorAutomatische IndexierungFunktionalInterpretiererComputeranimation
11:14
Physikalisches SystemInteraktives FernsehenTropfenGEDCOMPhysikalisches SystemGruppenoperationÜbersetzer <Informatik>DebuggingFunktionalMaschinencodeCompilerComputeranimation
11:46
AssemblerMaschinencodeTopologieLastROM <Informatik>InjektivitätProzess <Informatik>GamecontrollerÜbersetzer <Informatik>Computeranimation
12:10
TabelleFunktion <Mathematik>AssemblerMaschinencodeWechselsprungSystemaufrufIndexberechnungData DictionaryMaschinencodeFunktionalPhysikalisches SystemHalbleiterspeicherAutomatische IndexierungTabelleWort <Informatik>ThreadWechselsprungDatenstrukturVerzweigendes ProgrammStrukturierte ProgrammierungImplementierungComputeranimation
13:06
EinflussgrößeSiebmethodeFunktion <Mathematik>PrimidealTropfenLoopBenchmarkOverhead <Kommunikationstechnik>BrowserAssemblerZahlenbereichVersionsverwaltungPrimzahlFunktionalLaufzeitfehlerBitSiebmethodeMaschinencodeInterpretiererMultiplikationsoperatorMaschinenspracheKonstantePhysikalisches SystemWechselsprungResultanteCASE <Informatik>AlgorithmusComputerarchitekturBrowserComputeranimation
15:05
BrowserAssemblerGeradeImplementierungDifferenteGrenzschichtablösungPhysikalisches SystemMaschinencodeGeradeFormale SpracheVersionsverwaltungStandardabweichungBlackboxComputeranimation
16:10
BenchmarkBrowserOverhead <Kommunikationstechnik>AssemblerLaufzeitfehlerVersionsverwaltungBrowserBitMultiplikationsoperatorImplementierungGüte der AnpassungComputeranimation
17:02
LaufzeitfehlerAssemblerMaschinencodeWeg <Topologie>CompilerATMComputervirusBinärdatenBimodulFunktion <Mathematik>HydrostatikBenchmarkBrowserOverhead <Kommunikationstechnik>Prozess <Informatik>SystemaufrufGerichtete MengeEliminationsverfahrenBimodulSchnittmengeMaschinencodeCompilerBinärcodePhysikalisches SystemTabelleGeradeComputerarchitekturResultanteÜbersetzer <Informatik>ProgrammierungMessage-PassingImplementierungSystemaufrufRichtungMaschinenspracheLastFormale SpracheBrowserComputeranimation
20:17
AssemblerVorlesung/Konferenz
21:05
Web-SeiteMaschinencodeGenerator <Informatik>Vorlesung/Konferenz
22:05
MAPTabelleCompilerFunktionalFormale SpracheVorlesung/Konferenz
22:50
BinärcodeCompilerBenchmarkVorlesung/Konferenz
23:49
Flussdiagramm
Transkript: Englisch(automatisch erzeugt)
00:05
Right, so welcome. My name is Remko and I'm here to talk about two very undeclarative but very minimal and hopefully useful languages. So the first one is FORTH. FORTH is a very
00:20
minimal programming language that's been around since the 70s. It's had mostly applications in low-level contexts such as embedded systems, spacecraft controllers and so on, but it's had some other applications as well. If you look at FORTH, the most obvious thing to notice is that it's stack-based. So it uses a reverse polish notation where
00:43
you first put something on the stack and then you call a function. But other than that, it looks like a regular high-level language with syntax for constant variables, for comments, syntax for function definitions, loops and conditions and so on. But actually, that's an illusion. FORTH has almost no syntax. So FORTH executes through a very simple interpreter
01:07
loop. So what it does is it reads something up until the next space and then decides, is it a number? I'm going to put it on the stack. Is it something else? Then I assume it's a function, which is called a word in FORTH, and it's going to execute
01:21
it. So symbols is just like any normal word, so it's just a function of FORTH. Now the same goes for the colon. Colon starts a new definition of a word. Colon, when it executes, it puts the interpreter into a special mode called compilation mode. Now in this compilation mode, the interpreter still advances token by token, but when
01:45
it encounters a number, instead of putting it on the stack, what it does is it generates some code that will put that number on the stack later when this word is executed. Same for another symbol. Instead of calling this function, what it's going to do is it's going to compile some code that will call this function when this word is executed.
02:06
Now, sorry. So the same goes actually another, yeah, sorry. So it's going to compile. Now the exception for this is that there is a thing called immediate words.
02:25
Immediate words are always executed, even if your interpreter is in compiler mode. So an example of such an immediate word is the opening parenthesis, which starts a comment. And so when it executes, what it will do is it will actually consume all the input.
02:43
No, it shouldn't be muted. No. Okay. Another immediate word is the semicolon. So the semicolon is what you see when you end the definition. What this will do is it will put your interpreter back out of compilation mode into interpretation mode.
03:00
Other of these immediate words are the loops and ifs and then else. And you can actually create your own immediate words. And as such, extend the compiler because these are executed at compile time. So you extend the compiler and you create your own language. So in summary, FORTH is actually nothing but a very simple interpreter loop with an
03:21
integrated compiler. There is no syntax almost to FORTH. Just paste the limited tokens. All the behavior of the language is in the execution of these definitions. And you can actually extend the compiler yourself. This combination of super simplicity and power has actually made FORTH a very attractive language to implement on a new piece
03:46
of hardware and a restricted piece of hardware. Typically, these FORTH implementations are targeted hardware assembly. But you can actually do this in any low-level language. Which brings me to the second language of my talk, WebAssembly. So I think everybody here knows WebAssembly.
04:05
It's an open standard for portable binary code. Most browsers can execute WebAssembly. Many languages can compile to WebAssembly. So the result is that you can run all these languages in a browser. Although WebAssembly was designed for the web, there's actually nothing
04:23
web-specific about WebAssembly. It's just an open standard of portable code. So most of the information you find online about WebAssembly is about how you compile your favorite language to WebAssembly or how you run WebAssembly in your browser. So a few years ago,
04:41
I wanted to figure out what was actually under the hood of WebAssembly. And at the same time, I came across FORTH. So what I did was I combined both, hoping that I would learn something about both. So that's why I created WA-FORTH. WA-FORTH is a small FORTH system. It's
05:01
completely handwritten in WebAssembly and compiles to WebAssembly. So goals are, WA-FORTH tries to do as much as possible in WebAssembly. Now, the problem is WebAssembly is a portable standard, so you cannot do everything in WebAssembly. For example, it needs to do very few things outside
05:21
of WebAssembly. For example, reading or writing a character to the output or reading from the input. WA-FORTH tries to be simple. So it's just one big WebAssembly file handwritten. There are no dependencies, no complex tools. The compiler is very simply written. It still tries to be complete enough to be useful. There's an ANS standard that defines
05:45
what a FORTH interpreter needs to implement, the minimal set of words. WA-FORTH implements these and implements a bunch of other words as well. What isn't a goal is speed. So of course, because WA-FORTH is implemented in WebAssembly, you're going to get some speed for free.
06:04
But still, the compiler is very naive, so I don't expect it to be very fast. Same goes for binary size of the system. It's written in WebAssembly, so it's going to be naturally very small. In fact, it's about 14 kilobytes compiled in a binary WebAssembly.
06:21
However, I'm not doing any code golfing or something like that to keep the system small, because I want to keep it simple. And as most FORTHs are not really known to be very user-friendly, and WA-FORTH is not different, although it does emit some debugging information to make debugging easier, as you will see. So what can you do with WA-FORTH? Well,
06:46
you can embed it in any JavaScript application, which means you can run FORTH code inside your JavaScript, and you get bi-directional bindings to the system and back to JavaScript. To illustrate this, I have a few example applications. So the first one is the standard
07:03
FORTH console that always exists, where you can interactively execute FORTH code, and you can even interactively compile code and then run this compiled code. So it's a REPL, actually. I also have a small graphical programming environment where you can create some graphics using a
07:26
logo-like turtle graphics language, but it uses FORTH. It looks a lot like logo, but it's actually FORTH. And I took this a bit further, and then I created a notebook extension, VS Code extension, to create VS Code notebooks. So these are
07:42
actually formatted Markdown files interleaved with runnable code. So you can run this code. This is ideal for tutorials, because you can have the code directly there. You can execute it. You can change some parameters and then see what the effect is by rerunning the program. Now, because this is just WebAssembly and it's just a very small system, there's also a script
08:04
that converts these notebooks into a standalone, small standalone HTML file with all the functionality, but you don't actually need VS Code anymore to run it. Now, let's have a look under the hood. Like most assembly formats, WebAssembly has a text-based format, which is
08:25
much easier to read than the binary format for humans. So this text-based format is based on S expressions, so it looks a lot like LISP. So this right part here is the entire FORTH interpreter that I described earlier, but it comes straight out of WA-FORTH. And it's
08:45
actually quite easy to understand. So first it starts by parsing something, parsing the token, and then it's going to either execute it if it's a function or it's going to compile it if you're in compiler mode. Or if it's a number, then it's going to put it on the stack or it's going to compile it. So this tree-like code structure is then transformed to binary WebAssembly
09:07
using a tool from WebIt. WebIt is a WebAssembly binary toolkit. This is actually a toolkit with a lot of tools to work with WebAssembly files. It's a very interesting project to look at.
09:24
So this is the entire interpreter. The interpreter is actually quite simple. The interesting part is the part where you have to compile something. So you have to compile a call when you're in WebAssembly. Well, somewhere in memory there is a hard-coded binary header
09:41
of a WebAssembly module with one function in it. So when a new word definition starts, what happens is some values in this header are reset, and the pointer is initialized to start at the end of the header. So each time the interpreter, this is the piece of the interpreter, needs to compile a call to a function, what it does is it generates some raw binary
10:07
WebAssembly hex codes and puts it at the end of the header. So for example, if it needs to do a call, what it does is it generates a hex code for a constant instruction with the index of the function to call and then an indirect call instruction. And so the compiler keeps on adding
10:23
binary code to the end of this module. Now once you reach the end of the definition, this code, this binary piece of code, needs to be loaded into the system. So WebAssembly doesn't support anything for this yet, so there's no support for just-in-time compilation, although there are some discussions about it. So what WAForth does is it takes a pointer to this
10:45
piece in memory of binary code, and it passes it to the host system, so in this case it's JavaScript. And JavaScript has a small piece of code here running, what it does is it takes this binary, it uses the WebAssembly API to create a new WebAssembly module, and then it
11:01
instantiates it. That's all JavaScript has to do. The rest is tracked by WAForth, it keeps track of which module corresponds to which function that it needs to call or compile later on. So here you can see the system in action. So what's happening here is now it's, you started a definition, you start by compiling some things so you're still in
11:24
compilation mode, and so it's only when you reach the end of the definition that suddenly you're going to see a new entry in your WebAssembly debugger with a function that has been loaded. So this is the generated WebAssembly code that's been generated by the compiler.
11:48
You can get even more control over this compilation process by writing your own WebAssembly inside WAForth. So this is actually, this is again no new syntax, this is just standard WAForth
12:01
with some user-defined words, and there's a direct one-to-one mapping from this to this, if you can read it but probably can't from there. Our last thing I want to note about implementation detail is that most forts have very efficient execution by using a system they call Threaded Code. So Threaded Code is actually doing jump instructions all over the place
12:24
using values that come from memory or from registers. Now this is something you can do in WebAssembly. WebAssembly only allows structured jumps, so WebAssembly is actually a structured programming language. What WebAssembly does have is function tables, so these are dynamic
12:42
tables where you can put functions in, function references in, and then it comes with a special instruction where you can say jump to the function at this index. This is a system that WAForth uses for calling the words. Now the downside is that this is a very inefficient system compared
13:02
to direct calls or jumps. So I said that speed wasn't really a goal for WAForth, but it's still interesting to get some idea of ballpark numbers of speed and size involved. So I did some very unscientific thing and I took an algorithm, in this case the sieve algorithm, to compute prime
13:24
numbers. I took a forth implementation, ported it to JavaScript C WebAssembly, and then ran it a few times and see what the result was. Again, this is not a very representative benchmark, but it's just here to get a feel for some numbers. So if you look at the execution times,
13:43
WAForth is about 10 times faster than a JavaScript forth version. This is to be expected. JavaScript forth versions do pure interpretation. WAForth uses compilation, so there's no surprise there. What is a bit surprising is that G forth, which is a native forth, is not much faster than WAForth. I have no idea why this is. I'm suspicious about this
14:04
result. Maybe it's because I'm using an architecture that G forth isn't optimized for. JavaScript is 10 times faster than WAForth, which is also normal because WAForth needs to do these constant indirect jumps and JavaScript doesn't have this problem. It doesn't need to do any function calling at all. And then finally, if you have the C version
14:26
and you compile it to WebAssembly using MScript, it's about as fast as running the raw WebAssembly and the native version of the algorithm is slightly faster, although you have to say the WebAssembly engine is pretty good at running this code compared to native code.
14:41
If we look at the size of the runtime and the code that is executed, the main takeaway here is that WAForth is actually a very small system. It's like about 15K, but you need a complete browser to run it. That's, of course, huge to run. The question is, can we improve this situation?
15:08
So, actually, there are several standalone implementations of WebAssembly in different languages. For example, WebIt has a reference implementation in C++. There's WasmTime, which is security-focused and speed-focused in Rust, but there are several others.
15:26
But these only do the WebAssembly part, so there's still this small piece of code, these small pieces that are outside of the system that you need to call out to. If you wanted to use all these engines and try this out and create a standalone version,
15:40
you would need to write this little piece of code in all these languages against all these APIs. Luckily, there's something called the WebAssembly C API, and this is a standardized black box API that most of these systems implement. So, actually, the only thing you have to do is write these 200 lines of
16:02
implementation dependency, and then I could drop in any engine I wanted and then have a standalone version of my system. Now, if we look at the same benchmark again, we can see that speed-wise, WebIt is about 100 times slower than the browser version,
16:20
which is normal. I mean, this version in WebIt, that's a reference implementation. It's very naive. It just does what it needs to do to be functional. What is a bit weird is that once in a time, which is supposed to be fast, it's still about 10 times faster than the browser version, and there is no good reason for this, so I don't know why this is. I haven't tried
16:41
other engines yet. Now, if you look at size, you see that if you use a relatively optimizing system, you still have 90 megabytes, which is a lot smaller than a browser, but still, if you have a system of about 15K, this is still big. Can we do something about this? Well, you need the WebAssembly runtime to be able to run
17:07
your fourth code and to compile your code and load it, but typically, most programs, once you did the first pass and you did all the compilation necessary, you no longer need a compiler if you want to run the program again, so you can do some ahead-of-time compiling, and
17:22
this is where WA4C comes in. So what it does is it takes your fourth program, it uses WA4C to run your program once, and at the end of the cycle, it's going to look at all the modules that you created. It's going to combine them all, combine the final state, and then create one big WebAssembly module out of this. Now, it's going to take this big module and then
17:46
use another tool from Webit. Webit is really a cool tool set. It's going to use another tool from Webit called Wasm2C to transform this big module into C, and then it's going to use your host compiler to create a native executable.
18:00
So the end result is that you have a fourth code to native compiler, and your native binary is your fourth code with the rest of the fourth system still in there, but the compiler left out. And the cool thing is that, because this is all platform-independent stuff up until the native compiler, you can actually do cross-compiling
18:22
easily. So you can just do cross-compiling from fourth to any architecture you want. And all this code is about 500 lines and uses a lot of stuff from Webit, actually, and Webit is the only dependency here. So if you look at our final table of benchmarks, we see that the speed
18:41
is slightly better than it was before in the browser version, and the binary is becoming a lot smaller. So the entire system is only about 116k in the end of native code. Now, there's still room for improvement here. So what Webit4C does is it just throws together all these modules and then generates the big module. Now, this big module, there are no
19:06
cross-module calls anymore. So what you could do is actually do some post-processing. You could change all these indirect calls into direct calls, which could speed up a lot, because the calls are really the bottleneck here. Another thing you could do is throw away
19:20
code that you don't need. So in conclusion, this was a very fast talk. I could only touch upon things very briefly. And what I did was I used fourth to explore low-level language implementation in WebAssembly. Now, because fourth is so minimal, I was able to keep things
19:40
very simple, try out a lot of things, and go a lot of places. But I think a lot of the things that I've shown here are actually applicable to other languages, so more declarative languages, if you want to compare to WebAssembly. Although I have to say, if you don't know fourth yet, I can really recommend having a look at it, because I find that there's some
20:02
interesting philosophies and concepts behind it. Thank you. Questions? It was fast, wasn't it?
20:28
Sorry about that. Yeah, I always have been.
20:46
Yes. So yes, I, okay, one question.
21:30
So the question is, can I reach the same performance? Can you reach the same performance
21:42
if you do it in JavaScript? So yes, so the question is, can you do also this code generation in JavaScript? Yes, of course you can. Potentially you can. So typically what you will see is,
22:04
the handy part, because I'm working in WebAssembly, is that I have all the WebAssembly low levels at my disposal. So the hard part, if you go to the other languages, is that you're going to need to have something to manipulate these, for example, this function table is very critical. So you need to be able to talk to that and hook into
22:23
that. That's going to be the tricky part, but it's definitely possible. But it's easier if you do it directly in WebAssembly. Of course, you would never write a complex language directly in WebAssembly. That's madness. So you can do it with fourth, but I would not recommend it.
22:40
It was super cool, though. Thank you. Question? One more question? Yes, I'm interested because you also used the WebAssembly to C compiler. Yes, I used it. Have you checked the regions?
23:03
I didn't, no, I used the WebAssembly to C compiler. The performance was quite on par with this. So if I took the C algorithm, it was about, it's a bad, bad benchmark. But the performance was about 10% slower, so it was not much slower than
23:22
native binary. So it's C compiled to native and C compiled to WebAssembly. It was only a little bit slower. Of course, you are running in a WebAssembly, you are still running in a virtual machine, right? So the fact that the performance is going to be maybe a little bit slower,
23:40
but I thought it was still okay, given that you're still in a VM. Thank you, Remko. We need to solve. Okay. Absolutely amazing.