A year of RISC-V adventures: embracing chaos in your software journey

Formal Metadata

Title
A year of RISC-V adventures: embracing chaos in your software journey
Subtitle
How I started from zero and ended up porting a JIT compilation library and assembling files by hand
Title of Series
Number of Parts
287
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
In this presentation I share my 1-year journey with RISC-V: how I started from nearly zero and ended up porting Guile's JIT library to RISC-V and starting the RISC-V port of Stage0. This journey is full of uncertainties and chaos, but that's what finally made this happen. During this talk we'll discuss how embracing chaos can lead to great change and how we can become the source of positive chaos in people around us. This talk is not fully technical, but many technical aspects are discussed as they are fundamental in the journey: instruction sets, assembly, debuggers, hexadecimal... and all sorts of low level wizardry will be mentioned and explained. I'll do my best to make all of it accessible to anyone with no previous knowledge about them, though.
Transcript: English(auto-generated)
Hello, this is Ekaitz Zarraga, and this talk is about my last year working on RISC-V related projects, and it's a little bit of an explanation about how to embrace the chaos in your software
in your free time, and also how one year can develop and how you can reach unexpected places. So, first, as a little introduction, I'm a telecommunication engineer, so I'm not a very...
I am not a specialist in computer science, let's say. I also have some background in electronics and other kinds of things like signal processing and all that, but I have spent my whole career more or less working in computer science and programming, as I do nowadays, where my job is being an engineer and programmer in technology.
I'm also a Guix user and contributor and recently, in the last two years more or less, I've started to be interested in small computing: making small devices, small computers, small programming languages, in the sense that anyone can maintain their own hardware and software.
So I'm especially interested in Scheme, and Lisp in general, but more specifically in small Scheme implementations. So this is more or less my background. So how did all this start? It started at Christmas, last year's Christmas, in 2020, right?
So that Christmas, I started to create a small programming language, a small Scheme, because I wanted to learn about programming language internals, so I wanted to make a programming language and I wanted to make a compiler. I wanted to make it a compiled Scheme, not just an interpreted Scheme.
So I read some papers, I read a book, for example Let's Build a Compiler by Jack Crenshaw, which is a really interesting book. And I started working on it slowly, just during my vacation.
And I also decided that I wanted to generate some binary or some assembly code from it, so I needed to study some architecture, right? And I didn't have any background in x86 or MIPS or ARM or anything.
So I decided to study RISC-V because it was the new thing, it's open, and there is a lot of interest in RISC-V, so I decided to go for it. And I read this book, which is called The RISC-V Reader: An Open Architecture Atlas,
which you can find for free in Spanish and in Chinese, I think, and in English it's not very expensive. And the authors of the book are the ones who authored the RISC-V specification, so the book is really, really good. And it also compares RISC-V with other architectures, if you have some background in them, like MIPS or ARM or whatever, right?
So this book is fantastic, I learned a lot from it, and I also studied many papers about Scheme implementation and read some Scheme implementations too. But in the end, this project didn't happen because I'm a busy man or whatever. My mind is just unable to focus on a project for a long time, so I left this project.
This is my beginning, right? So this is just adding entropy to my chaos, right? So one day, a friend of mine, who is the same guy who introduced me to Guix in previous years, by the way, told me that on the Guix mailing list they were looking for someone for a RISC-V port. So I decided to raise my hand. He told me, no, you have to raise your hand, say something,
because we're interested in this and this might give you some opportunities, and I did. So from that moment, I was involved in the RISC-V porting effort of Guix, and one day, talking with them, because we had an IRC channel with not a lot of messages, but one day
Andy Wingo just said it would be interesting to also port the Lightening library, right, which is Guile's just-in-time code generation library, to RISC-V. So I said, why not? Let's take a look into that. So it's basically a simple just-in-time code generation library, which exposes some kind
of instruction set, which is based on a RISC architecture, and then you have to generate all the code by yourself, right? That's what the library does. So it's a fork of GNU Lightning, but from one of the older versions, because I think from GNU Lightning 2.0 on, it started to add some complexity that Andy didn't like,
so he decided to make a fork and maintain the library by himself, and this is my experience with it, right? I was worried about my C skills, and as I predicted, they were very rusty, so I had
to relearn a little bit of C. I struggled a little bit with that, but in the end, the programming part wasn't that difficult. The complicated part was the documentation: there was zero documentation, exactly zero documentation, for that library, but I could rely a little bit on the documentation from
GNU Lightning, which was not completely compatible, but I could use it a little bit, so I based my knowledge on that and it started growing from that point. Also, there is a lot of dead code from other architectures that are not supported, and those architectures were not cleaned out of the library, so I spent a lot of time reading
some code that wasn't useful, and in the end, I was feeling a little bit lost, I was unable to do it, and I decided to talk with Andy, and he helped me a lot, he pointed me in the right direction, and everything just worked from that.
So first lesson, if you are lost, if you feel helpless, and if you feel that you don't have the skills to finish something, maybe you just need a couple of pointers and a little bit of help from someone that really knows the project, and most of the maintainers are really happy to tell you how to continue, and they are always very happy, so this is
the first lesson. So the things I learned are basically these: I learned how to assemble instructions by hand, which we are going to do today; I re-learned or re-understood that code is data at a different level, because I was a Lisp guy, so I already knew that,
but now I know it at another level; I also learned some GDB development tricks that were very interesting for me; and I also understood that machine code generation is not that complicated, as we are going to see now. There is also some stuff about relocations and immediates that is a little bit complicated for this talk, so we are going to leave it
for your investigation. If you want to follow the link we have below, you can read about it in my blog. So let's assemble an instruction by hand. This is a RISC-V instruction. RISC-V instructions have the following syntax: they start with the opcode, then they
have the destination register, a source register, and in this case an immediate. This instruction is an immediate-based instruction, so it is going to add an immediate. That's why it is called like that, right? It is going to add the contents of the source register to
the immediate we give, which is 56 in this case, and it's going to store the result in the register A0. Well, this is basically loading 56 into A0, because the source register here, the zero register, always holds 0; its value is always 0, right? This is also a design decision in RISC-V, which is very interesting. So let's assemble it very fast.
So the opcode is this one. You just have to read the specification, which is not that long, it's really easy to read. So this is the opcode of addi, and this is the first part. Then the destination register is this one, so you have to add this one. Registers are numbers from 0 to 32, or 31 in this case, and this one is just
one of them in the list. Then there is a small field, which is some kind of opcode, but in this case it's just three zeros, so you just read that from the specification and you add it. Then the source register, which is the zero register, and its number
is just zeros, and then the immediate, which has I think 11 or 12 bits, so there you have it, 56. So you combine everything together, you put it in order, like we just followed: we start with
the opcode, then the destination register, the funct3, the source register, and the immediate, and we have the full instruction. This is a 32-bit instruction, as any instruction in RISC-V, so this is the way we generate an addi instruction. There are other instructions with other formats, but they are as simple as this: just follow the fields and fill them. And in hex this is the
value of this instruction, this is the same thing, right? So remember this value for the next slide. So once we are able to assemble instructions by hand, we have an example of code is data, right? You can see here in the middle that this instruction zero
is filled with the same instruction we just assembled by hand, and the next one is basically a return, right? This is the instruction that makes a return; it means jump and link register. So it's going to execute a return. So if we have an array of two unsigned integers of
32 bits, we can fill them with instructions, because they are 32 bits too, and then we can reinterpret the address of this array, and we can basically call that as if it was a function.
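The hand-assembly walked through above can be sketched in Python. This is a minimal illustration, not the code shown on the slides: the helper name `encode_i_type` and the constant names are made up for this sketch, while the field positions and opcode values follow the I-type format of the RISC-V base instruction set.

```python
import struct

# Field values from the RISC-V base ISA encoding tables.
OPCODE_OP_IMM = 0b0010011  # opcode used by addi
OPCODE_JALR = 0b1100111    # opcode of jalr (jump and link register)
ZERO, RA, A0 = 0, 1, 10    # register numbers of x0, x1 and x10


def encode_i_type(opcode, rd, funct3, rs1, imm):
    """Pack the fields of an I-type instruction into one 32-bit word.

    Hypothetical helper for this sketch. Fields are placed from the low
    bits upwards: opcode, rd, funct3, rs1, then the 12-bit immediate.
    """
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# addi a0, zero, 56: load 56 into a0.
addi_a0_zero_56 = encode_i_type(OPCODE_OP_IMM, A0, 0b000, ZERO, 56)
# jalr zero, 0(ra): the usual encoding of a return.
ret = encode_i_type(OPCODE_JALR, ZERO, 0b000, RA, 0)

# Two little-endian 32-bit words: the same bytes the two-element
# array of unsigned integers holds in memory.
code = struct.pack("<2I", addi_a0_zero_56, ret)

print(f"{addi_a0_zero_56:08x} {ret:08x}")  # prints "03800513 00008067"
```

The `code` bytes are what a just-in-time library would copy into memory before calling it; actually running them only makes sense on a RISC-V machine.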
So this is what a just-in-time compilation library does. They create pieces of memory, like they allocate pieces of memory, they fill them with instructions that are generated while the program is running, and then they are able to call those instructions,
just returning a function pointer or something like that. So it is not that complicated, right? It isn't that complicated. So if you just understand this example, we are loading 56 in the A0 register, and then we are returning, so we are basically setting the A0 register to 56. So if we call this, we are going to set this int A here
to 56, and if we print it, we are going to print 56 on the screen. So this function just returns 56, but it's two instructions. Okay, so that's the first point. In the meantime, there was a chance to work on Guix RISC-V support via NLnet, and NLnet was going to fund us if we
sent a correct proposal, and the guys involved in this part told me that I should send a proposal and try to work with them. So I had to learn a little bit more about the context of the Guix bootstrapping process, which is very interesting, and then I sent a proposal.
But as sometimes happens, the proposal was rejected, and it was a little bit sad, but I also thought, and what now? So what now? What now is just basically that I continue with my
life, investigating projects and doing my things. So let's talk a little bit about that bootstrapping process. So Guix is a distribution that is trying to be bootstrappable from source, and for that we have several projects combined together that let it have
this full source bootstrap, right? So the problem is very easy to explain. A GCC compiler, for instance, the 10th version or the 12th version, I don't know which one is the more recent one, in order to compile that, it depends on the previous GCC version, and that GCC version
depends on the previous GCC version. So you always need a compiler in order to compile the current compiler. So in Guix, as we are trying to start everything from source, we need one position in all that chain where we decide or we say,
this is the source, this is source code that we can really audit, and then we can compile all the rest of the chain, which is really complex. So Stage0 is one of those projects that start at the very beginning of the chain. They generate a very small C compiler, or a C subset,
and then from that we're able to compile more complex compilers until we find, we reach, or until we are able to compile GCC, and from that we can compile the world, right? So this whole process is separated in these small pieces, which are very simple at the beginning and
a little bit more complex when we are going forward, right? So the first step is just an ELF file, which is a binary file, but this is written in hex, so it has to be converted to
the binary, right? And it's basically that, it's just an ELF file written with hexadecimal values and it has comments. So this is the first step. From this first step, then we write a little bit more complicated assembly, which is based on the same ideas that has some labels
and some extras. We write it in hex0, but now we have a more complicated assembler. From that assembler we generate another assembler, which is a little bit more complex and has more tools, and from that we can write a macro system, and from that macro system we can go on adding more tools, right? You understand the process. So from all this process we finally reach
a small C compiler, MesCC from the Mes project, and from that we can reach TinyCC, then GCC, and then compile the world. So I decided to get into this process, which is a little bit complicated to enter at the beginning, and I felt basically like this. This was my
experience. I reached the project and I found this issue that basically said needs better documentation. And it was a little bit painful because there is documentation in the project, but it is hard to find, it's hard to understand the whole context, and the owner of
the project, the main developer, explains the issue very well. He says that he is too familiar with the code to spot the sort of things that might trip up a new person and need to know the sort of questions they might have and want clarified. So this is a very well explained issue
of this project. So the good thing is that you can always join the IRC and ask these questions, and they are normally answered very well. So the first impression of the project is a little bit complex, it's a little bit hard to control, but in the end if you ask the right questions,
you have the answers. So I got into hex0, I understood what I had to do, and I started doing it. So I started making the hex0 for RISC-V, and this is a piece of the current code of the hex0. Mine didn't look exactly like this, but it's just a small detail. So I started making it,
doing all these instructions. First I wrote it in assembly, and then I had to make these instructions, right? Like they look here on the left. All the rest of the parts are just comments that are going to be stripped out, and even the labels like this, this is not a
label, this is just a comment to help understand the code. So this is how a hex0 file looks. It's just an ELF file, but it's written in hex to be easier to type, instead of having to type all the binary stuff and write it in a binary file, which is not easy to do. So this is just readable in a text file, it's just hexadecimal values, and then the
spaces, the comments, and everything else are going to be stripped in this process, and it's going to be cleaned up to generate the final executable. So I wanted to make the first version for the POSIX process of hex0 on RISC-V 64, and I had two options: write it in assembly first,
and then assemble it by hand, or recover an abandoned project I had of an assembler, and make it in that project, and that's what I decided to do. So when I was making the first scheme compiler that I told you, I was struggling to make use of any assembler, so I decided to
write my own in Python, and I started doing that, and as many other things in life, I just abandoned the project, and in this case I decided to reuse the backend of that project to generate the instructions. So I was able to take the same backend and add a couple of lines
of code to generate the hex encoded expressions, or the hex encoded instructions, in the format I wanted, so I used that. So as you see, many projects you do add entropy to your world, and then they start to be useful sometimes. Sometimes they are not useful, but sometimes they
are. So what else? I also learned some disassembly tricks from Lightening that I could use in this project, and I was able to calculate the addresses more easily. Instead of having to count all the instructions by hand, as I was doing at the beginning, I realized I
could use GDB to calculate all the addresses, so I made the whole file with the addresses set to zero, and then I ran GDB and started looking for the addresses of the instructions, and started filling the gaps. So the experience: it was rewarding to see I could write assembly,
I reused many tricks from gdb and from the other project, but there's still a little bit of a bad feeling, because the documentation is weird, it's hard to understand all the reasons behind the design decisions on this project, and everything looks a little bit right. The good part is that the input is trappable, I'll leave it in the chat, you can talk with the
people who are handling this project, and you can learn a lot from them. So it's a really weird feeling, but at the same time, it's somewhat fixed by the IRC. So this is my experience with hex0, but I have some extra stuff. So one Sunday morning, I was bored and I decided
that I wanted to make the hex0 for RISC-V 32, which is very similar, but we realized that what we needed to do was to replace the ELF headers, and nobody knew how to do that, so I just looked on Wikipedia, searched for the ELF headers of the 32-bit version,
changed a couple of fields, and at the end of the morning, we had a working version of the hex0 for RISC-V 32-bit. So this is another bit of entropy you add: just when you are
bored on a Sunday morning, you can make things and sometimes they work. So the next thing is just a summary of the status of the project, and I'm just going to finish with this. So the Lightening project, the just-in-time compilation library, is passing all the tests, but it still needs
some testing in real code, and then it needs to be merged into the original project, right? And in Stage0, I only made a small contribution on hex0, but other people started working on top of that, and they made all the steps,
all the steps in the process, until the last one, I think. So this is really cool, this is how chaos expands, right? I only made the very first step, and now it is improved, it's not only my code, it is improved on top, and we have the other steps, we have the hex1,
the hex2, the M0, we have many, many things on top of what I did, using my work as a reference, so I'm very happy with that. So my status nowadays is that I tried again with NLnet, doing a proposal on the porting efforts for the next level of the bootstrapping process,
which is GNU Mes, and they looked really interested, so I'm probably going to work on that during this year, and I'm very happy with this, because it's just how all this chaos, all this random work I did, is starting to become something. Maybe it's going to be my
job for the next year, so I'm very happy with that. So the conclusions: all of this was completely random work. I did that without a plan, without a purpose, it was just living my life, trying to go forward. I was suggested to do things, and I said yes sometimes, said no other times, and I just embraced the chaos of life. I'm a control maniac,
but sometimes I embrace this kind of chaos, and it's really cool when things happen, and they happen as soon as you like to, right? This was completely unexpected, but it was
really interesting to see something happen, even if I didn't hold control of everything. So I just want to encourage you to just work, do things, and stay curious, and you may reach interesting places doing that. Also, if you can, you should learn from people, because it's faster, it's better, and you have some social interaction, and you don't feel as helpless and
alone as you normally feel when you're developing free software. And also, as the last point, you don't need to be a genius. If you think about it, I didn't know anything about just-in-time compilation, I have issues with C programming, but at the end of the day, I did all of this, because it is not as complex as it seems; it is easier if you know the context.
So ask for help, people are going to give you the help you need, and you are going to reach points you would never expect. So just that, try and let it happen. Thank you very much.
Okay, so we have some questions here, right? Let me read some of them. One of the questions is, let me show, yeah, what's the status of Guix on RISC-V? So that's a very good question,
which I don't have a good answer to; maybe Efraim has a better answer for that than me, but we're working on it. We still have some points missing, I think, but we're working on it.
So if Efraim wants to add anything about it? Yeah, I can jump in quickly. Let's see, we have the bootstrap binaries are added. I've built all the way out to NASA. Tolo, of course, builds. It's still definitely in the early stages. There's a number of packages
where we say, hey, in addition to everything else, this is missing, so I need to actually patch GCC itself, and then there'll be a full world rebuild for RISC-V64. And of course, there's no actual build hardware right now, so everything is done locally.
So it's there, but I hope you enjoy building. Almost there, right? Yeah, so also from my side, I'm working now, as I said in the talk, I was asking for some funds. They say yes, so I'm working on some of these parts. I'm working on taking GCC, an old version of GCC,
and backporting it to RISC-V to add this kind of bootstrapping process. So we're working on it. It's still not ready, but we're working on it. So more questions. Yeah, this is a very specific question, so I'm going to answer it fast.
Do you have to set the memory permission bits between writing the instructions into memory and executing them? Are there any caching issues? So I didn't. In the example, I didn't. In an example, it works. You can just execute it, and it's going to work. But in real life, you'd probably like to make an mmap to generate
the array and set the correct permissions, including the read-only permission, to prevent the program itself from overwriting the instructions and going crazy. So you'd probably like to do that. And are there caching issues? I don't really think so. If the array you are
working with is big enough, you're not going to have caching issues, because it's going to start running on the array as if it was the normal program. So I don't really think there are going to be any caching issues. In Lightening, the process is basically that. It's a big malloc. Take a piece of memory, fill it with instructions, and just run. But that malloc
is done with, well, it's not a malloc. It's an mmap, and it has some permissions set. So what else? How can people help out with the RISC-V port? I'm not sure. I'm not sure about that.
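The mmap-based allocation mentioned in the answer about memory permissions could look roughly like the following Python sketch. It is an illustration under stated assumptions, not how Lightening does it: `make_code_buffer` and `seal_code_buffer` are hypothetical names, the code assumes a Unix-like system whose C library exposes `mprotect`, and it neither executes the buffer nor deals with instruction-cache synchronisation.

```python
import ctypes
import mmap
import struct

# addi a0, zero, 56 followed by a return, as little-endian 32-bit words.
CODE = struct.pack("<2I", 0x03800513, 0x00008067)


def make_code_buffer(code):
    """Copy machine code into a fresh anonymous read+write mapping."""
    pages = -(-len(code) // mmap.PAGESIZE)  # round up to whole pages
    buf = mmap.mmap(-1, pages * mmap.PAGESIZE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    buf[:len(code)] = code
    return buf


def seal_code_buffer(buf):
    """Swap write permission for execute permission; True on success.

    After this the program can no longer overwrite its own instructions.
    """
    view = ctypes.c_char.from_buffer(buf)
    addr = ctypes.addressof(view)
    libc = ctypes.CDLL(None, use_errno=True)  # assumption: libc is reachable this way
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    libc.mprotect.restype = ctypes.c_int
    ok = libc.mprotect(addr, len(buf), mmap.PROT_READ | mmap.PROT_EXEC) == 0
    del view  # release the buffer export so the mapping can be closed later
    return ok


buf = make_code_buffer(CODE)
sealed = seal_code_buffer(buf)
print(sealed, buf[:len(CODE)].hex())
```

Writing first and only then switching the mapping to read+execute keeps the memory from ever being writable and executable at the same time, which is the concern raised in the question.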
From my side, if someone teaches me how GCC works on the inside, that could be a very good help. But I don't know. I don't know. It depends. It depends. The RISC-V port is mostly done by me in the case of GCC. Efraim also, and Jan is working on parts. So just getting
involved in the process is more than enough. I don't know what else to answer. If you have quickly added RISC-V 32 support, I think it'd be as easy to add 128 support. Yes, I think it's
going to be easy because the RISC-V standard is very similar in every size. So it's just changing a couple of bits in the header, and you're done.
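The header fields that differ between a 32-bit and a 64-bit RISC-V ELF file can be sketched as follows. This is an illustration built from the generic ELF file header layout, not the header used in Stage0: `elf_header` is a hypothetical helper, and the entry address in the usage lines is an arbitrary example value.

```python
import struct

ET_EXEC = 2     # e_type: executable file
EM_RISCV = 243  # e_machine value assigned to RISC-V


def elf_header(bits, entry):
    """Build a minimal little-endian ELF file header for RISC-V."""
    if bits == 32:
        # ELFCLASS32: addresses and offsets are 4 bytes wide.
        ei_class, fmt, ehsize, phentsize = 1, "<16sHHIIIIIHHHHHH", 52, 32
    elif bits == 64:
        # ELFCLASS64: addresses and offsets are 8 bytes wide.
        ei_class, fmt, ehsize, phentsize = 2, "<16sHHIQQQIHHHHHH", 64, 56
    else:
        # The generic ELF specification defines only these two classes.
        raise ValueError("unsupported ELF class")
    # Magic, class, little-endian data, version 1, OS ABI 0, then padding.
    ident = b"\x7fELF" + bytes([ei_class, 1, 1, 0]) + bytes(8)
    return struct.pack(
        fmt, ident, ET_EXEC, EM_RISCV, 1,
        entry,   # e_entry
        ehsize,  # e_phoff: program header placed right after this header
        0, 0,    # e_shoff, e_flags
        ehsize, phentsize, 1,  # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,               # e_shentsize, e_shnum, e_shstrndx
    )


h32 = elf_header(32, 0x10000)  # example entry address, chosen arbitrarily
h64 = elf_header(64, 0x10000)
print(len(h32), len(h64))  # prints "52 64"
```

The class byte and the width of the address fields change between the two headers, while the magic number and the machine value stay the same.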