The Ides of RISC-V
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Subtitle |
| |
Alternative Title |
| |
Title of Series | ||
Number of Parts | 637 | |
Author | ||
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/53650 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
00:00
Intrusion detection systemReduced instruction set computingEncryptionAssembly languageEmulatorSoftware testingParallel portEmailUniqueness quantificationVector spaceExtension (kinesiology)SequenceComputer fontMachine codeComputerArchitectureProxy serverKernel (computing)ImplementationSoftware developerSoftwareArmFreewareOpen setIntelComplex instruction set computingSqueeze theoremSet (mathematics)CASE <Informatik>Shift operatorMessage passingCiphertextWritingArithmetic meanPresentation of a groupSoftware testingShift operatorEncryptionComputer architectureMessage passingComputer engineeringFinite differenceReduced instruction set computingComputer scienceSoftware developerArtificial lifeNeuroinformatikComplex (psychology)Set (mathematics)DistanceArmBefehlsprozessorProgrammer (hardware)Multiplication signField (computer science)WordKey (cryptography)Complex instruction set computingEndliche ModelltheorieLink (knot theory)Slide ruleGroup actionSqueeze theoremBitCodeCryptographyProcess (computing)ReliefDifferent (Kate Ryan album)Operator (mathematics)Self-organizationRight angleInformationLevel (video gaming)1 (number)Uniform resource locatorOrder (biology)ImplementationEmulatorParallel portSoftwareFreewareMachine codeMaxima and minimaCartesian coordinate systemVector spaceUniqueness quantificationComputer fontWebsiteCoprocessorMedical imagingElectronic mailing listGenderLabour Party (Malta)Decision theory19 (number)TrailComputer programmingFamilyCASE <Informatik>Rule of inferenceData structureDirection (geometry)Goodness of fitShared memoryVideo gameIntercept theoremExtension (kinesiology)Computer animation
08:51
ImplementationSequenceEncryptionVector spaceOperations researchPhysical systemPerspective (visual)CiphertextConfiguration spaceStructural loadResultantOperator (mathematics)EncryptionFinite differenceProgramming paradigmASCIIKey (cryptography)BefehlsprozessorTerm (mathematics)CodeEntire functionSinc functionNumberRepresentation (politics)Graphics tabletElement (mathematics)Data storage deviceConfiguration spacePhysical systemOrder (biology)Vector spaceSystem callSet (mathematics)Multiplication signBitProgrammer (hardware)SpacetimeFlow separationRegulärer Ausdruck <Textverarbeitung>MultiplicationMessage passingString (computer science)ImplementationShift operatorSeries (mathematics)Boss CorporationInverter (logic gate)CASE <Informatik>DivisorSoftware testingMachine codeWeightMobile WebNumbering schemeDecision theoryDreizehnDistanceMoving averageComputer architectureProcess (computing)GenderParallel portComputer animation
17:35
EncryptionDirectory serviceReading (process)Assembly languageRevision controlBootingString (computer science)String (computer science)Machine codeComputer clusterInternetworking
18:17
Reduced instruction set computingComputer hardwareVector spaceOperations researchEmulatorEncryptionSource codeMachine codePhysical systemComputerArmProxy serverKernel (computing)Computer hardwareCompilerKernel (computing)Proxy serverImplementationSource codeNeuroinformatikPoint (geometry)TouchscreenGame controllerEncryptionEmulatorVector spaceAssembly languageParallel portPhysical systemComputer configurationCartesian coordinate systemProcess (computing)NumberMachine codeOperator (mathematics)Data storage deviceComputer virus1 (number)Server (computing)Dependent and independent variablesComputer animation
20:55
Proxy serverKernel (computing)Asynchronous Transfer ModeEncryptionBootingPhysical systemAsynchronous Transfer ModeBootingRouter (computing)Computer hardwareProcess (computing)NeuroinformatikVirtual machineProxy serverPerspective (visual)BitCondition numberCommutatorPoint (geometry)Functional (mathematics)Interface (computing)Cartesian coordinate systemSystem callPhysical systemFundamental theorem of algebraMereologyGame controllerComputer programmingThread (computing)Kernel (computing)BefehlsprozessorUML
22:15
Interface (computing)Read-only memoryConditional probabilityCore dumpAddress spacePhysical systemSemiconductor memoryInterface (computing)Point (geometry)Control flowComputer animation
22:43
Proxy serverKernel (computing)Asynchronous Transfer ModeEncryptionPhysical systemMultiplication signComputer programmingEncryptionControl flowPoint (geometry)BitCartesian coordinate systemFunctional (mathematics)
23:07
HauptspeicherAddress spaceRead-only memorySymbol tablePhysical systemPoint (geometry)Control flowOrder (biology)BitSemiconductor memorySpeicheradresseAddress spaceSymbol tableArithmetic meanComputer animation
23:39
Directory serviceAssembly languageRevision controlBootingString (computer science)File formatSheaf (mathematics)Interior (topology)Core dumpElectronic visual displayVector spaceElectric currentRead-only memoryContent (media)TouchscreenAliasingSimulationFreewareMaizeMiniDiscMultiplication signOperator (mathematics)Computer programmingScripting languageAddress spaceArithmetic meanOrder (biology)BitCore dumpComputer fileRight angleInterface (computing)Function (mathematics)Symbol tablePoint (geometry)Electronic mailing listDoubling the cubeDirection (geometry)Uniqueness quantificationCartesian coordinate systemObject (grammar)Asynchronous Transfer ModeMachine codeSource code
26:57
EncryptionUniqueness quantificationVector spaceExtension (kinesiology)EmulatorReduced instruction set computingSequenceParallel portMachine codeCartesian coordinate systemEmulatorExtension (kinesiology)EncryptionVector spaceUniqueness quantificationFeedbackMultiplication signPresentation of a groupAxiom of choiceSelf-organizationTrailParallel portElectronic mailing listComputer animation
28:01
Element (mathematics)Physical lawComputer animation
Transcript: English(auto-generated)
00:07
Hello, FOSDEM. How are you today? I hope everyone is enjoying the presentation so far. My name is Will Hawkins, and I'm here to present today,
00:20
the writing and testing of a parallel Caesar cipher in RISC-V. Before I get started, I just wanted to give you some contact information here on the first slide, along with my pronouns. And I also wanted to hope, just wanted to say that I hope everyone out there watching today is safe and healthy,
00:42
and the same is true of their family and friends. That's the most important thing right now. After that, I just wanna say thank you to all the great organizers of this conference, and especially of this particular track, who have done such a great job organizing a wonderful list of presentations.
01:01
I know that I'm enjoying the presentation so far, and I look forward to the ones that are coming after me, as well as some of the discussion that we're going to have today in the next few minutes. So I just wanted to give you a quick outline of what I'm gonna talk about today. First, I'm gonna give you a brief introduction. I'll talk a little bit about RISC-V.
01:21
I'll talk about what exactly is a Caesar cipher, some different ways that we can implement the Caesar cipher, and then we'll discuss the uniqueness of RISC-V vector extensions, which are crucial for that Caesar cipher parallel implementation,
01:40
which is our whole goal here, after all. Then finally, and probably of most interest to the group here, I'll talk about Spike, which is an emulator for RISC-V. We'll talk about emulating the application that we wrote, the Caesar cipher implementation, and then we'll talk about debugging, how we do that in Spike,
02:01
and at the end, I'll take some questions and we'll have a discussion, maybe even some more live coding. So as an introduction, all I'm assuming today is that you have an interest in this topic. That's really the basics. If you have a minimal familiarity with computer architecture,
02:20
that'll really help, and I also assume that you'll ask questions in the chat or afterwards when you're confused. That'll give me a chance to interact with you and help make the presentation as meaningful as possible. I also expect that this awesome audience will correct me when I am wrong.
02:40
Notice that I said when, not if. I am sure that I will say something wrong as we go through this. I also wanna say that all code is online and that these slides are shared so that you can follow all the links in here to all the code and the other references that I make. Finally, just a quick introduction.
03:01
I'm a software developer and computer scientist in the United States. I'm a long time free software user and contributor, and I love to learn new things. So that's why I'm here today to present to you something new that I've learned. So let's talk first about what is RISC-V.
03:22
RISC-V is an instruction set architecture. Instruction set architecture is an abstract model of a computer and a concrete implementation of an instruction set architecture is a CPU. The instruction set architecture tells the computer
03:41
all of the things that it can do. It specifies how everything operates and crucially, it specifies the operations that the programmer can instruct the computer to do. Those instructions, those operations and the way that the programmer tells the CPU
04:01
how to operate or the computer how to operate is through something called an instruction set. And the instruction set contains a set of instructions. You're all familiar with instruction set architectures. A few examples, the Intel x86 and ARM.
04:20
So RISC-V instruction set architecture is similar to Intel x86 and ARM, but unlike Intel x86, it has the RISC-V has a reduced instruction set. The opposite of a reduced instruction set is what x86 is.
04:41
It's a complex instruction set. And in a complex instruction set, each of the instructions that the computer can take and do is a high level instruction. In other words, you might be able to tell the computer to do something like, make me a grilled cheese sandwich. That's a pretty high level instruction that includes a bunch of low level operations
05:02
that need to be done in order to produce that grilled cheese sandwich. In a RISC processor on the other hand, you would break that complex operation down into a bunch of smaller operations, such as set location to the pantry, walk to the location,
05:20
set the target to the door, open the door, et cetera. The idea and the benefit of doing a RISC computer versus a CISC computer is that you can optimize each of those low level instructions and you make no assumptions about what the programmer wants to do. So the programmer is free to add
05:42
as many different operations to compose as many low level operations as they want to come up with arbitrary high level behavior rather than being confined to the high level behavior that the computer designer thought that the programmer would need. That's the technical difference.
06:01
The policy difference I think is even more important. Unlike ARM and x86, RISC-V is free and open. That means that if you want to, if you're a computer manufacturer and you want to implement a CPU that follows the RISC-V instruction set, you don't need to license that from ARM or x86.
06:26
That's a huge, huge relief and a huge benefit. The work on RISC-V began in 2010 officially, but the concept of making a RISC computer and the lineage of the RISC-V process
06:42
of instruction set architecture has its lineages in the 1980s, beginning with David Patterson's research at Berkeley. Now that we know what an instruction set architecture is and what RISC-V is, let's talk a little bit about what the Caesar cipher is.
07:01
A cipher is just a way to encrypt text and decrypt text. This cipher is named after Julius Caesar famously, who is said to have used it to communicate with his army. Now, the sender and the receiver of encoded messages share a key, and that key is a shift distance.
07:22
The sender encodes a plain text message by shifting each letter forward by this shift difference, by this key. The receiver commensurately receives the decoded message and decodes it by shifting each letter backwards
07:41
by the same shift distance. As an example, if we wanted to encode the message piece to all of our generals in the military, we would shift each letter of that word forward by our shift distance. In this case, we're gonna say it's a shift distance of one.
08:03
So the message that our enemy might intercept is QFBDF. And unless they were smart enough to decode our secret key, which is so amazingly secure here, they would not be able to understand what that is.
08:20
However, our general in the field would be able to shift each of those letters back by one and decode that message to receive the intended message that we mean to send piece to the enemy. So how do we implement the Caesar cipher?
08:41
Well, we can do it in one of two ways. The easiest way is to do it sequentially where we encrypt each letter of the message one at a time. And what I'm showing here is how we would encrypt and encode the message piece. The first thing that we do is we assume
09:01
that each of the letters in the message is actually it's ASCII representation. So the P is represented by the number 112, the E by 101, the A by 97, and so on and so forth. Because it's ASCII and we assume ASCII, we know that each of these will fit in an eight bit spot.
09:25
So the first thing that we wanna do is encode the letter P. How do we do that? Well, we just add our shift distance, which in our example is one, and we get the letter Q represented here by 113. The same is true for the E, the A, the C, and the E.
09:49
This sequential implementation can be accomplished in a form, in a paradigm called single instruction, single data.
10:01
Each instruction that we did, each operation where we added one, operated on a single piece of data. In this case, it was the P and the letter one. So our instruction was the add, and our single piece of data was the P and the letter one.
10:24
Now, that seems very slow, and perhaps we can do it better. I'd really, since none of the operations depend on the other, I'd really like to be able to perform the entire encoding in parallel.
10:41
And we can accomplish this using the paradigm single instruction, multiple data. So here's how we accomplish that. We simply say, all of my data is this string piece and the shift key one for each of those.
11:01
So now we have two bits of data, but we have a single instruction, and that instruction is add. As a result, what we get is the message encoded all in one fail swoop. So how do we look at that
11:21
in terms of the operations that the CPU can perform? Well, let's look at this as two vector operations where we have a single instruction and multiple data. Our data comes from vector zero and vector one.
11:42
Vector zero has one, two, three, four, five instructions, and the same with vector one. What we'll do is we assume that vector one contains the key for the message, and that is a one in the F, a one in the G,
12:01
a one in the H, a one in the I, and one in the J. And we assume that the up here has our message, P-E-A-C-E. Here I'm just representing A through E so that I can show you conceptually how this works. The single instruction is to add vector zero
12:24
to vector one and store the result in vector two. That's a single instruction, a single operation. And the data comes from vector zero and vector one. That's our multiple data. The result is that vector two is filled with A plus F,
12:45
B plus G, C plus H, D plus I, E plus J, all in one operation. That makes things significantly faster. Now, how is this implemented in a traditional instruction set architecture?
13:04
Well, in a traditional instruction set architecture, there may be a vector system. And let's say that that vector system is called V. That vector system has a series of vector registers that can store data and can store vectors.
13:21
Let's say that we only have space enough in the vectors for each reg... In the vector system for each register to be 64 bits wide. But we wanna operate as if that's more than one element. So we can break this down
13:41
into two 32-bit elements inside a single vector. That's a vector with two elements. And let's say that I wanna perform some operation on two of those. Well, I might do a V add 32 and a V sub 32, a V mole 32, et cetera.
14:02
Those are different operations. But I can break this down further. Maybe I want four 16-bit elements in each register. And I want to perform operations on those. I might do a V add 16, a V sub 16, et cetera.
14:23
Now, what happens when I want to add some additional information? And I want to make this so that I can make the registers bigger. Well, that's easy, but is it? I wanna maintain all the backwards compatibility.
14:41
So I leave in my instruction set, I leave vector system V. And I simply add a new one. We'll call that vector system X. A vector register in vector system X is now 128 bits. So I want to be able to divide that
15:01
and perform operations. Well, now I can divide that into two 64-bit elements. I can divide it into four 32-bit elements or eight 16-bit elements. And for each one of those, I need a separate set of operations. But as you can tell, that escalated very quickly.
15:23
Now the programmer has to remember all of these different operations in order to be able to perform their work. On the other hand, RISC-V is much more flexible. It specifies that a vector register in a vector system is simply N bits wide.
15:43
And then I can divide that up into however many different elements fit into that by calling the V set operation one time. In this case, I've called V set and said that I want as many 64-bit elements in my vector register as possible.
16:01
Then once I've configured it, all I have to do is call V add, V sub, V mul, et cetera. It remembers the configuration. Now I can do the same without having to remember any more operations just by setting the vector register, the vector system, to say that each vector register
16:21
has as many 32-bit elements as possible. So on for 16, and I've not added any other operations. That's more like it. So how do we do this for our RISC-V implementation? Well, we have our plain text
16:42
and we configure the vector. We configure the vector system into eight-bit elements for each vector register. And we have some leftover padding here in most cases, which is fine, we just won't use it. So we load the plain text into the plain text vector.
17:05
We load the key into the key vector. Then we encrypt, which is an add operation. We just add the T and the one. We get U, E in the one, we get an F,
17:20
S in the one, we get a T, the T in the one, we get a U. Then we store that cipher text back to the, back to memory. Now, let's see how that looks in practice. I've got here my code
17:40
that I've downloaded from the internet, and you can see all those are all available on GitHub. And I've started by building this code using build-scissor.sh. And now I can simply execute run-scissor.
18:02
And it asks me to enter a string to encode. I enter piece and away we go. Piece encrypts to QFBDF. Pretty simple, right? Awesome. Now, what you may wonder is how I got an entire computer
18:24
running on RISC-V and how I was able to do that demonstration. Well, actually I wasn't. What I was doing was using an emulator. And if you're in this dev room, I'm assuming you're all familiar with the emulator. But why would I use the emulator in this case? Well, number one, it's very easy to test using an emulator.
18:44
Number two, there's a lack of available RISC-V hardware that runs commodity operating systems. There's plenty of RISC-V hardware out there that are dedicated to very specialized processes and it is proliferating across the industry.
19:01
However, they don't all run commodity operating systems like Linux, which is what I wanna run. More importantly, I wanna say that it's a lack of available RISC-V hardware to me. I don't have any. There are plenty of options out there. In fact, the Beagle 5 was just announced a few days ago,
19:20
which is going to be $160 system on a chip that will let you run Linux, which is awesome. Finally, and probably most importantly, is the vector operations that we're using to run to do this Caesar cipher implementation in parallel are not yet standardized. So there can't be any actual implementations out there
19:41
that perform these. So we need to emulate them. So let's talk about how we built this code from our source code all the way to our executable application. The first thing on our host computer, we run a cross compiler
20:01
and we let the RISC-V assembler and cross compiler take our application source code and turn it into an executable. Again, that's on the host computer. Then we fire up Spike, which is our emulator. But Spike can't directly implement,
20:21
can't directly execute a user application. Why not? Because all it's doing is executing, is emulating hardware. It's not emulating anything else. So what we need is a proxy kernel. And that proxy kernel will get started when the emulator starts,
20:41
do the initialization of the emulated hardware appropriately, and then pass off control to our Caesar cipher application, which is the point at which we're able to see that prompt on the screen. So from the perspective, let's dig a little bit deeper here
21:01
and let's talk about the proxy kernel. The RISC-V computer has three privilege modes of execution. There's machine mode, supervisor mode, and user mode. And here's how the process boots. The machine starts in machine mode
21:21
and it initializes control at the reset point. The first function that the kernel invokes is initFirstHeart. And the heart is a hardware thread, and that's the most fundamental thread of execution in a RISC-V CPU. Then it calls bootloader to do the bootloading.
21:43
Then it relinquishes the machine mode privileges that it has and enters supervisor mode, completes the rest of the bootloader. And finally, it will relinquish supervisor mode privileges and go to user mode, at which point it will start the,
22:01
it'll call the start application, they're called the start function in our application, which in turn calls the main function. And I'm assuming that you're all familiar with a main function in a program. Now, not only can Spike emulate programs and run them,
22:21
it can also debug a system, which is pretty cool. Spike gives a GDB-like debugging interface with break points that can be either unconditional or conditional, and it will allow for memory inspection. So it's very GDB-like, which is really, really neat.
22:44
Let's give a quick example of how we would debug our Cesar Cipher application if something were wrong. Let's say that I want to stop at the very first instruction that I wrote in my program and march through it one step at a time
23:00
so that I can see what's going on. I want to stop at the beginning of the main function that I wrote. Well, while there are break points in Spike, they operate a little bit differently than in GDB. In Spike, we cannot define a break point using a symbol. So we can't say break main.
23:20
We have to say equivalently that we want to break on a memory address. Therefore, in order to set our break point, we have to define the address, we have to find the address of main in memory and set it as the break point. So let's see how that looks.
23:40
The first thing I'm going to do is I'm going to use objdump in order to find out where main is, what its address is. So I'm going to say run objdump, which is a script in the repository, and I'm going to redirect that output to caesar.obj.
24:07
All right, now I'm going to open the caesar.obj file and I'm going to look for the main symbol.
24:22
Look, it's right there, amazing. All right, so if you're familiar with objdump, and even if you're not, this is relatively straightforward to read once you understand what's going on. All this is saying that at address 0x10194,
24:41
there's an add instruction and that's where main is. So let's open this file up over here so we can follow along. It'll be a little bit small, but that's okay. Just for reference, so we'll quit this. And now what we're going to do
25:00
is we're going to execute the command debug caesar, which is going to build and run spike in debug mode so that we get our command line, our debuggable interface, just like we would if we ran the program in GDB. So what you can see here is that you get a list of operations if you run help,
25:21
and you can figure out what each of those are just by reading the documentation and playing around, which is what I did. So let's set our break point. We'll say until PC, which is until the program counter on core zero,
25:40
which is our only core, hits the address 0x10194. This instruction means execute the application until it hits address 0x10194. All right, so now we did that.
26:01
And now let's continue to step through our program and see where we are. If you look over here, we expect to see that the next operation that we run is going to be this add I, where we move the stack pointer down to make room on the stack. And voila, there it is.
26:23
Core zero is just executed this add I instruction. I could do that again, and I could do that again. And in fact, if I just hit enter, it'll do the same operation one more time. And what you'll see is the instructions
26:41
that are executing over here correspond to the instructions that we have in our compiled code. Pretty darn cool. If there's time in the question and answer and people want to learn more about this, we can explore this a little bit more.
27:01
So in conclusion, what did we talk about? Well, we talked about RISC-V and learned a little bit about it. We talked about what the Caesar cipher is. We talked about two different ways to implement the Caesar cipher. We talked about the uniqueness of RISC-V vector extensions. And we played around with the spike,
27:21
which is a RISC-V emulator. We showed how you can emulate the application and debug the application. Again, all code is online, and I encourage you to submit feedback and ask questions. Thank you so much for taking the time to listen to the presentation today. I know that there are so many great presentations here
27:41
going on in parallel, that all of this is so overwhelming and you had a choice to choose my presentation, and I really sincerely appreciate it. Thanks again to all the organizers of the conference and especially of this particular track. I'm looking forward to the question and answers.