How to add an GCC builtin to the RISC-V compiler

Formal Metadata

Title
How to add an GCC builtin to the RISC-V compiler
Title of Series
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Many low level features of architectures are implemented in GCC as builtin functions. Builtin functions look superficially like any C function, but are in fact intrinsic to the compiler and represented as patterns to be matched in the machine description. Builtin functions are often used to access unique functionality of individual machine instructions. Being integrated within the compiler, they are more efficient than using simple inline assembly code. For RISC-V, they offer an excellent way to expose the functionality of instruction set extensions to the C/C++ programmer. Adding a builtin function to GCC is not that difficult, but neither is it completely trivial. In this talk we will show you how to add builtin functions using examples from the OpenHW Group's CV32E40Pv2 processor core.
Transcript: English(auto-generated)
Hello FOSDEM, I am Nandni Jamnadas. I am a software toolchain engineer at Embecosm. I lead the CORE-V GNU toolchain project.
I am also a UK electronics scholar from UKESF. UKESF encourages young electronics scholars and students to study electronics and pursue a career in the sector. UKESF also connects top UK universities with leading employers.
In this talk, I'll be giving you a tutorial on how to add a GCC built-in to the RISC-V compiler.
Okay, so what is a built-in? Well, in C and C++, there are two types of functions. You've got your user-defined functions and your built-in functions. User-defined functions are functions that the programmer has defined within their code so they can use them.
Built-in functions, by contrast, are already implemented in the compiler, so the programmer doesn't need to write specific code for them and can use these built-ins directly. Many low-level architecture features in GCC are exposed as built-ins.
Built-ins look superficially like any C function, but they are intrinsic to the compiler and implemented directly within it. These built-ins have specific patterns to be matched in the machine description file and have access to the unique functionality of individual machine instructions.
Because they are integrated within GCC, they are more efficient than using just simple inline assembly. For RISC-V, this presents an excellent opportunity to expose the ISA extensions to C and C++ programmers.
This is an example of a simple built-in in GCC, which takes the square root of a float. There are tons and tons of GCC built-ins, but I don't know if you know, there are probably only about two in RISC-V. And this is why I'm giving you a tutorial about it, so we can add more.
It is important to say that yes, we call it a built-in function, but it's not really a function. There aren't any corresponding entry or exit points, and the address cannot be obtained.
Here is the square root float built-in as it is implemented in GCC. You can find it in GCC's builtins.def. All of the source code will be linked at the end, so don't worry. I will give that to you.
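As a small illustration of how a built-in looks at the call site, here is a minimal sketch. The talk's example is the square root built-in; the sketch uses two other generic GCC built-ins, `__builtin_popcount` and `__builtin_bswap32`, so that it builds without a maths library. The wrapper names `count_set_bits` and `swap_bytes` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a word. __builtin_popcount is called like a
   function, but the compiler expands it itself, to a single instruction
   where the target has one. */
static inline int count_set_bits(uint32_t word)
{
    return __builtin_popcount(word);
}

/* Reverse the byte order of a 32-bit value with another generic built-in. */
static inline uint32_t swap_bytes(uint32_t word)
{
    return __builtin_bswap32(word);
}
```

The wrappers are ordinary functions with entry points and addresses; the built-ins they call are not, which is the distinction made above.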
If you want to make a RISC-V specific built-in, then you would go to the path below, which will be riscv-builtins.cc. Yes, I'm talking a lot about built-ins. We could simply just use inline assembly.
But this is why we shouldn't be using inline assembly. If you want to use inline assembly, you have to annoyingly specify the pattern every single time you use it. Sometimes you can get it wrong. GCC does not know about the instruction, so there's a huge risk of data flow information being lost.
Again, because GCC does not know about the instruction that you're using with inline assembly, it cannot optimize around it.
The reason we use built-in functions is that all of your data flow information will be retained. Patterns can be recognized and used elsewhere by GCC. You only need to specify the pattern once, and that will be in the machine description file. And then, voila, you just need to use your built-in, put in the arguments, and the program will be fine.
Again, built-in functions are implemented directly in the compiler, so GCC will know about them and can apply its optimizations. What do I mean when I say optimization? Well, GCC has a bunch of optimization flags. Here are two that I'm using as an example.
The first one is the flag -O0. That's the basic level of optimization. In fact, I don't think that's any optimization at all. This is just the plain assembly, where you will see cv.elw, which I'll explain later.
And when you use the optimization flag -O2, that will increase performance, reduce compilation time. GCC optimizes those assembly instructions away because it knows that they don't need to be used. You might have noticed that I'm using cv.elw. You're probably wondering what the hell that is.
Well, cv.elw is part of the CV32E40P ISA extensions, also called the CORE-V ISA extensions. cv.elw is part of the Event Load extension. We are currently implementing version 2 of these in the OpenHW Group's CORE-V GCC and Binutils.
The first version has the first five extensions, and then version 2 has Event Load, SIMD and bit manipulation. I would like to emphasize that all of these extensions and instructions are in Binutils, the assembler and the linker.
But it's time to add built-ins in GCC. I am going to be using Event Load for this tutorial. This is because, well, Event Load only has one instruction.
So it's a very beginner-friendly task. That instruction is cv.elw, which will load a word and cause the CV32E40P processor core to go into a sleep state.
This is an instruction that GCC will not know about because it's very machine-specific. Thus, we need a built-in. Before we get into all of this, it is very important to call out the naming conventions of these built-ins.
A general name for a built-in in GCC will just be __builtin_ and then the instruction name. But if you want to make it a RISC-V specific built-in, it'll be __builtin_riscv_, then the vendor and the name. For a CORE-V specific one, it'll be __builtin_riscv_, cv for CORE-V, the extension name, and then the instruction name.
Yes, I understand it's a bit long-winded, but it is very important to emphasize which vendor, which architecture, what extension and what instruction you want to use. It just makes it a lot easier for programmers to know which instructions they are using.
So my built-in, if you want to use it, will be called __builtin_riscv_cv_elw_elw. Because there's only one instruction in the extension, I'll just call it the same thing.
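The naming convention can be sketched with string concatenation. The macro `RISCV_VENDOR_BUILTIN_NAME` and the function `elw_builtin_name` are hypothetical helpers for this illustration; GCC does not provide them, and they only build the name as text.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: glue the pieces of the convention together.
   Adjacent string literals are concatenated by the C compiler. */
#define RISCV_VENDOR_BUILTIN_NAME(vendor, ext, insn) \
    "__builtin_" "riscv_" vendor "_" ext "_" insn

/* Vendor cv, extension elw, instruction elw. */
static inline const char *elw_builtin_name(void)
{
    return RISCV_VENDOR_BUILTIN_NAME("cv", "elw", "elw");
}
```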
So this is an example of how to use this built-in. This built-in will take a void pointer. It will load from that specific memory address into a general purpose register, which holds an unsigned 32-bit integer.
From this example, the only thing you'll have to do is just put in the pointer and it will return your unsigned integer value. Can you speak a little louder, please?
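A call site might look like the following sketch. The guard macro `HAVE_CV_ELW_BUILTIN` and the wrapper name `event_load` are assumptions made for this example, not part of any toolchain; the fallback branch exists only so the sketch also builds on a machine without the CORE-V compiler, and it does not put anything to sleep.

```c
#include <assert.h>
#include <stdint.h>

/* HAVE_CV_ELW_BUILTIN is a hypothetical macro: define it only when
   building with a compiler that provides the built-in. */
static inline uint32_t event_load(void *addr)
{
#ifdef HAVE_CV_ELW_BUILTIN
    /* Expected to emit cv.elw: void pointer in, unsigned 32-bit out. */
    return __builtin_riscv_cv_elw_elw(addr);
#else
    /* Host fallback: an ordinary volatile 32-bit load. */
    return *(volatile uint32_t *)addr;
#endif
}
```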
Oh, okay, sorry. Now that I've spoken about what Event Load is, it's time to add an extension to GCC. Most of the implementation for adding an extension will be in riscv-common.cc.
So we've called our extension xcv, which will be the main extension, and then you'll have eight sub-extensions, one of which will be xcvelw. There isn't any ISA spec class, so I'll just use the none macro, and this will be the first version of it.
Because I am implementing a sub-extension, we'll have to imply the parent here by putting the sub-extension first and then the main or parent extension. Next we add the corresponding masks and targets.
Before we do all of this, we need to go into riscv.opt to add the target variable and the corresponding CORE-V flags. This file is very sensitive, so even though it's two lines, if you mess it up, then you've got GCC crashing everywhere.
So you have to be very careful in this file. You then use that flag for your corresponding target, but you also use it when you have to specify your GCC options. So I've done that in riscv-common.cc, which is here.
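The extension bookkeeping described above can be modelled in plain C. This is a simplified sketch, not the contents of riscv-common.cc: the struct and function names are invented for the example, and only the idea is kept, namely one table of versions and one table where the sub-extension comes first and the parent it implies comes second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified tables for illustration only. */
struct ext_version { const char *name; int major; int minor; };
struct ext_implies { const char *ext; const char *implied; };

static const struct ext_version ext_versions[] = {
    { "xcv",    1, 0 },
    { "xcvelw", 1, 0 },   /* first version of the sub-extension */
};

static const struct ext_implies ext_implications[] = {
    { "xcvelw", "xcv" },  /* sub-extension first, then the parent */
};

/* Return the major version of NAME, or -1 if it is unknown. */
static inline int ext_major_version(const char *name)
{
    size_t n = sizeof ext_versions / sizeof ext_versions[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(ext_versions[i].name, name) == 0)
            return ext_versions[i].major;
    return -1;
}

/* Return the extension implied by EXT, or NULL if there is none. */
static inline const char *implied_by(const char *ext)
{
    size_t n = sizeof ext_implications / sizeof ext_implications[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(ext_implications[i].ext, ext) == 0)
            return ext_implications[i].implied;
    return NULL;
}
```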
Now we get to the interesting stuff, which is actually defining the built-in. The RISC-V backend has a function already made for us so we can make these built-ins. That is in riscv-builtins.cc.
It takes in five arguments, and I'll be going through all of these in the following slides. They are the instruction name, the built-in name, the built-in type, the function type, and the availability predicate.
Using this function, I have created my own file, which is called corev.def, and this is where all the CORE-V related built-ins will go.
My first built-in will be in corev.def, and the instruction name will be cv_elw_si, SI for single integer. The name of the built-in that programmers will be using will be cv_elw_elw, but that will be expanded with the __builtin_riscv_ prefix.
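How a one-line entry in a .def file can turn into a table row is sketched below using the X-macro technique. This is a self-contained model, not GCC's code: `COREV_BUILTINS`, `AS_ROW`, the enums and the struct are all hypothetical names, and the five fields simply mirror the five arguments listed above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a .def file: one entry per built-in, giving
   insn name, built-in name, built-in type, function type, availability. */
#define COREV_BUILTINS(X) \
    X(cv_elw_si, "cv_elw_elw", DIRECT, USI_FTYPE_VOID_PTR, cvelw)

enum builtin_type { DIRECT, DIRECT_NO_TARGET };
enum function_type { USI_FTYPE_VOID_PTR };

struct builtin_description {
    const char *insn_pattern;  /* pattern name in the machine description */
    const char *name;          /* name the programmer writes */
    enum builtin_type type;
    enum function_type ftype;
    const char *avail;         /* name of the availability predicate */
};

/* Expand one entry into a table row, adding the common prefixes. */
#define AS_ROW(insn, name, type, ftype, avail) \
    { "riscv_" #insn, "__builtin_riscv_" name, type, ftype, #avail },

static const struct builtin_description corev_builtins[] = {
    COREV_BUILTINS(AS_ROW)
};

static inline const struct builtin_description *first_builtin(void)
{
    return &corev_builtins[0];
}
```

Keeping the entries in one list means a second built-in is a one-line addition to the list rather than a change to the table code.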
Then you've got the corresponding built-in type, function type and availability predicate, and I'll go into those more. So, the instruction patterns. This is probably the most difficult part of the whole built-in implementation.
The insn name is the name of the associated instruction pattern in the machine description file. A pattern takes five operands. The last operand is optional, but I recommend putting it in if you can. You've got the name, the RTL template, the condition, the output template, and the insn attributes.
That would all be in riscv.md, but I will be creating my own .md file for the CORE-V specific patterns, so we don't merge it into riscv.md. So this is an example of an RTL template, RTL being register transfer language.
It's a template that's very, very similar to the intermediate representation that GCC uses. It's a template that GCC will take and then fill in with the corresponding registers or operands that it needs.
So this is the instruction pattern that I will be using for this built-in. The name will be riscv_ followed by the cv instruction name, as we've previously defined it. I am using the set pattern, and this will take a destination register and a source register.
The destination register will be the first operand, and I've used the match_operand pattern, which takes the machine mode, the index of this operand, the predicate, and the constraint.
The machine mode for this will be SI, which is single integer. It's 32 bits. The index of this operand is zero, as we usually start indexing from zero.
The predicate for this will be register_operand, as we'll be loading into a general purpose register, and then the constraint will be =r, with r meaning a register and = meaning it's going to be written to.
The next part of this is the source register, which will be the specific memory address. So we're using mem to specify the size of the object being referenced, SI being single integer, 32 bits. Again, we're using match_operand to match the register or the pointer holding the specific address.
The index number will be one because that's the next number. I am using address_operand as the predicate and then p as the constraint, specifying a pointer. I am using an unspec_volatile for this instruction because it's a volatile operation.
It's very machine-specific. It can get difficult and there are times where it could trap. We are, I guess, referencing a state that is fragile and vulnerable, so that is why I've been using an unspec_volatile.
Now that I've talked about the RTL pattern, we talk about the condition. The condition is important to add so that the instruction can only be generated under these conditions. You can only generate this pattern if the target has xcorevelw and it's not a 64-bit target.
Next we talk about the orange bit, which is the output template. The output template will be what you will see in the assembly. So you define it with the instruction name, so cv.elw, and then \t for a tab.
Then this is where you use those index numbers to reference which operands you want to use. So I'll be referencing %0 and then %a1. %0 will be the destination register and %1 will be the source register. I am using %a to substitute the operand as a memory reference.
Lastly, we talk about the optional operand, but again, this is something you should try to put in if you're going to add a built-in. We want to tell GCC that this is a load type of instruction and that the mode is SI throughout the whole built-in.
The reason I've added this optional operand is that, although the instruction can still be generated without it, GCC can now optimize it knowing that it's a load and knowing that the machine mode is SI.
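To make the output template concrete, here is a toy expander for the two operand forms mentioned above. It is not how GCC prints instructions: the function `expand_template` is invented for this sketch, it handles only single-digit `%N` and `%aN`, and printing a register address as `0(reg)` is an assumption about the assembly spelling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand a much-simplified output template into OUT.
   %N  prints operand N as written.
   %aN prints operand N as a memory reference, spelled "0(reg)" here. */
static inline void expand_template(const char *tmpl,
                                   const char *const *operands,
                                   char *out, size_t out_size)
{
    size_t used = 0;
    out[0] = '\0';
    while (*tmpl != '\0' && used + 1 < out_size) {
        int written;
        if (tmpl[0] == '%' && tmpl[1] == 'a'
            && tmpl[2] >= '0' && tmpl[2] <= '9') {
            written = snprintf(out + used, out_size - used, "0(%s)",
                               operands[tmpl[2] - '0']);
            tmpl += 3;
        } else if (tmpl[0] == '%' && tmpl[1] >= '0' && tmpl[1] <= '9') {
            written = snprintf(out + used, out_size - used, "%s",
                               operands[tmpl[1] - '0']);
            tmpl += 2;
        } else {
            written = snprintf(out + used, out_size - used, "%c", *tmpl);
            tmpl += 1;
        }
        if (written < 0)
            break;
        used += (size_t)written;  /* loop stops if the buffer is full */
    }
}
```

With operands `a0` and `a1`, the template `cv.elw\t%0,%a1` expands to the instruction name, a tab, the destination register, and the address operand in memory-reference form.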
That was the big part of the built-in. We've discussed the insn name and the template. Now it comes to the built-in types. In RISC-V there are currently only two built-in types.
Those built-in types can be found in riscv-builtins.cc. They are RISCV_BUILTIN_DIRECT and RISCV_BUILTIN_DIRECT_NO_TARGET. RISCV_BUILTIN_DIRECT corresponds directly to a machine pattern like the one we've just created.
RISCV_BUILTIN_DIRECT_NO_TARGET does the same thing, but the return type will be void. We are returning a general register operand, a 32-bit unsigned integer, so we'll be using RISCV_BUILTIN_DIRECT.
Next come the function types. Again, everything is in riscv-builtins.cc. Currently there are only two kinds of prototypes for RISC-V.
You can have only a return type, or you can have a return type and one argument. In coming presentations I'll be talking about this a bit more, because I only have 45 minutes for this presentation.
When it comes to defining which return types and argument types we're using, that will be in riscv-ftypes.def. The comment says that it will expand to RISCV_, then unsigned integer, and then a void pointer, because that's what I'll be using for my built-in's function type.
Lastly, we have the availability predicate. This is very similar to the condition we had in the RTL template. We use the AVAIL function that has been declared in riscv-builtins.cc. It takes the name of your availability predicate and then the corresponding condition.
As you can see, it's very similar to the condition we had in the RTL template, which is that the target has the extension and that it's not a 64-bit target.
Now that we've added the extension, the instruction and the built-in, it's time to test it. This is a very simple test just to make sure that it works.
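The availability predicate described above can be modelled on a host machine as follows. This is only a sketch: `struct target_flags` is a hypothetical stand-in for the target state GCC derives from the command line, and this `AVAIL` macro is loosely modelled on the helper mentioned in the talk rather than copied from riscv-builtins.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target state GCC derives from -march. */
struct target_flags {
    bool xcvelw;    /* Event Load sub-extension enabled */
    bool is_64bit;  /* building for a 64-bit target */
};

/* Define a predicate named avail_NAME that evaluates COND on target t. */
#define AVAIL(NAME, COND) \
    static inline bool avail_##NAME(const struct target_flags *t) \
    { return (COND); }

/* Same shape as the pattern condition: extension on, and not 64-bit. */
AVAIL(cvelw, t->xcvelw && !t->is_64bit)
```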
It's a compilation test. It takes in a void pointer with an offset. It returns an unsigned 32-bit value. You can see there are comments on the side. These are DejaGnu comments.
We are using DejaGnu because we want to be able to use a simulator, and it can also be used with microcontrollers. It's the testing framework that we use for our test scripts. The first comment tells it whether this is an execution or a compilation test.
This will be a compilation test because we haven't got an executable target yet. The second line gives the options for this test. If you don't specify the options, then this test won't run, because this instruction only works with xcorevelw.
Then the last comment is for checking that our instruction has been generated in the assembly. It should be generated once.
There are backslashes to escape characters. It's very sensitive because it's a regular-expression type of framework. We've got a run script for this.
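The "generated once" check can be pictured as counting matches in the assembly text. The sketch below is a stand-in for that idea, not the test framework itself: `count_occurrences` is an invented helper, and it searches for a literal string, so unlike the regular expression in the real check there is no dot to escape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of the literal NEEDLE in TEXT. */
static inline int count_occurrences(const char *text, const char *needle)
{
    int count = 0;
    size_t len = strlen(needle);
    if (len == 0)
        return 0;
    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + len, needle))
        count++;
    return count;
}
```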
It's very important to build GCC first, because I've been running tests without building GCC and wondering why it doesn't work, and it wasn't until our GCC experts told us, no, you've got to run the build. You have to build GCC and then run the tests.
So this shows the results from our run-test script. Although it's just one test, there are 18 passes. That is because it goes through nine optimization levels. Each optimization level goes through a scan-assembler test and then a compilation test.
Like I promised, I've put up the slide showing where all of this can be found. It is on GitHub, in the OpenHW Group's CORE-V Binutils and CORE-V GCC repositories. This is all part of the OpenHW Group. We are still looking for volunteers and people to contribute to this project.
It's very important to also mention the GCC internals manual. It's probably the guru of GCC. That's what I rely on the most now. Thank you for listening to my presentation. Do you have any questions?
Is there any way that you can still reuse part of this work without having to use C code, or would you always need to go to C code? For now, I've just been using C code, so I'm not really sure.
So in this case, you could also use these methods in Fortran code? You could, yeah.
I haven't been using that myself. There's no reason for this not to work. It's expressed in terms of C code. So I was a bit confused about the built-in concept in general, because usually people use C code to not be machine-specific, and if you use a built-in, then you become machine-specific.
Yeah. Oh, yeah. It depends on the built-in. GCC has built-ins which are sort of general. I mean, like all the maths functions, for example.
I can put in maths. It's not machine-specific. In a sense, obviously, it's not standard-specific in this case, but you can have other mathematics. OK, at least I've touched on specific ones.
Well, actually, it is not architecture-specific. It's a general built-in function. Yeah, but even for mathematics built-in functions, you always have, not always, but mostly, something architecture-specific. Oh, yeah. There can be stuff like the encoding of the numbers or suchlike.
It's sort of, you know, architecture-specific. So it should work, yeah. Actually, that's one way to avoid these architecture specifics. Rather than encoding a NaN pattern into your code, just by using a constant or bit pattern and then sort of casting it to a proper floating-point type, you can use the NaN built-in.
It's a built-in function that produces the correct encoding of a NaN for your target.
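The NaN point from the discussion can be shown in a few lines. This sketch uses GCC's generic `__builtin_nan`; the wrapper names `make_nan` and `is_nan` are made up for the example, and the self-comparison check assumes default floating-point settings (it is not reliable under options such as -ffast-math).

```c
#include <assert.h>
#include <stdbool.h>

/* Ask the compiler for a quiet NaN in the target's own encoding,
   instead of casting a hand-written bit pattern. */
static inline double make_nan(void)
{
    return __builtin_nan("");
}

/* A NaN is the only value that compares unequal to itself. */
static inline bool is_nan(double value)
{
    return value != value;
}
```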
OK, thank you for listening to my presentation.