64 bit Bare Metal Programming on RPI-3
Formal Metadata
Number of Parts | 611
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/41853 (DOI)
Production Year | 2017
Transcript: English (auto-generated)
00:00
The next speaker is a member of AdaCore. He contributes to the GNU Ada compiler. It's Tristan Gingold, and he will speak about 64-bit bare-metal programming. Let's applaud him. Let's take care of the light.
00:21
Here you are. Hello, thank you for coming. So this is a talk about bare-metal platforms, which are usually things that come without a box, like this one, and in particular without any operating system. So when you program on a bare-metal platform,
00:42
you don't use any operating system. Why would you want to do that? The main reason is that there are not enough resources to run an operating system. For example, this is an Arduino: there is not enough memory for an operating system. But there are other reasons.
01:02
It's fun. It's different from usual. You can learn a lot of things, low-level things; there is a lot to learn when you do bare-metal programming. And I have chosen the Raspberry Pi 3. Why?
01:21
Mainly because it's very, very popular, which means there are a lot of forums and tutorials on the web about how to program the Raspberry Pi directly, and also because it's a very safe platform. You cannot break it. It will always work.
01:43
However, there are some drawbacks with the Raspberry Pi 3. Because it's based on a Broadcom system on chip, there is very little documentation about it. Here is the page about the Raspberry Pi 3 platform
02:02
documentation, which basically says: okay, it's an ARMv8 CPU. Thank you, that's also written in the marketing documentation. And for more documentation, see the Raspberry Pi 2 or Raspberry Pi 1. End of the documentation.
02:23
Not enough, but we can deal with that. So maybe you know the Raspberry Pi family. The first one was the Pi 1, which was based on a very old ARM core. The Pi 2 was much more interesting because it's based on a newer core,
02:41
and there are four cores. So I wanted it. And the latest one is even better because it has four 64-bit cores. So I want it, and I want to use it. The architecture of the Raspberry Pi is a little bit weird.
03:00
There are four 64-bit ARM CPUs that share the level 2 cache. There is also the VideoCore GPU, which runs the firmware. And they all share the memory.
03:22
The boot process of the platform is, I would say, unusual, because it's the GPU that starts first: it runs its firmware and then loads the application from the SD card into memory.
03:44
And then, once the application is loaded, it starts all the CPUs. So the nice thing about the Raspberry Pi platform is that the CPUs start directly in your code, not in firmware code.
04:03
Only the GPU uses firmware. A few files need to be present on the SD card: some files used by the GPU to boot, a configuration file, which is interesting,
04:21
and your image, which will be loaded into RAM and executed by the ARM CPUs. If you want to execute 64-bit code, you have to add some settings to the config.txt file.
04:45
It's explained. So let's start our first bare-metal program. Usually, we do things like either blinking LEDs or writing a message on the console. So I will do something quite common,
05:04
which is a hello world on the console. For that, you need a terminal emulator connected through a serial-to-USB converter. This is the URL of the code I will show
05:21
in a moment, so you have it. So this is a USB-to-serial converter, and you connect it directly to some pins of the Raspberry Pi 3 header. Very bare metal.
05:41
Quickly, this is the Makefile. There are two main files: crt0, which is the assembly code that is executed first, and the main C code. We don't use any C library. We use a linker script to tell how sections are grouped,
06:04
and we produce not an ELF file but a raw binary file. At the end, you have to copy this file onto the SD card. crt0 is the usual name for C runtime zero,
06:23
which means the first file to be executed by the, in fact, non-existent C runtime. It is written in assembly because it does such low-level things that they cannot be expressed in C. It has to initialize the board,
06:40
but on the Raspberry Pi, that's very easy because the GPU does most of the initialization. For example, it sets up the RAM. It sets up the video. So everything is much easier on this platform.
07:01
However, you still have to create the small environment necessary to execute the C code. So this is the whole assembly code for the Hello World. This is the first instruction executed. There are four CPUs, and all four CPUs
07:23
are started together, so you need to put three CPUs into a busy loop and keep only one, which is done by this code. Then you have to initialize the first CPU, the main CPU: here you load the stack pointer,
07:43
here you clear the memory that has to be cleared for the C environment, because variables without an initializer must start at zero (the .bss section). And finally, you call main. So our C code starts with main,
08:02
like a normal application, and this is the code we have seen previously that calls main. You can do whatever you want in C, but there is no C runtime and no C library, so you have to write everything you want to execute.
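To make the .bss step of crt0 concrete, here is a minimal sketch of the same operation written in C instead of assembly: it zeroes every byte between two addresses. On the target those addresses would come from symbols defined in the linker script; the names `__bss_start` and `__bss_end` in the comment are hypothetical, not taken from the talk's code. One caveat when there is no C library: the GCC manual states that a freestanding environment must still provide `memcpy`, `memmove`, `memset` and `memcmp`, and GCC may turn a loop like this one into a call to `memset`, so a bare-metal build usually supplies those four functions as well.

```c
/* Zero every byte in [start, end), as crt0 does for .bss before main().
   On the target the bounds would come from the linker script, e.g.
   (hypothetical symbol names):
       extern unsigned char __bss_start[], __bss_end[];
       clear_bss(__bss_start, __bss_end);                               */
void clear_bss(unsigned char *start, unsigned char *end)
{
    for (unsigned char *p = start; p < end; p++)
        *p = 0;
}
```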
08:25
This is the main code. The next slide shows how the UART, the serial console, is initialized, and then we just do a puts of Hello World. puts is here: it prints every character,
08:43
expanding backslash n into backslash n backslash r. And this is how to print one character: we wait until the UART is ready, and when it is ready, we write one byte at a specific address. That write has a side effect:
09:03
the byte is sent over the serial line. This is how to initialize the UART, and this is most of the code. This is very bare-metal stuff. We change some bits at specific addresses
09:23
that are specified here, and this has the side effect of initializing, well, here, enabling the UART, specifying the number of bits to transmit, specifying the speed of the UART, and here we have to specify that the pins are, in fact, used for the UART.
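The talk's UART code is not reproduced here, so the following is only a sketch of the same steps for the Pi 3 mini UART (the AUX UART, selected on GPIO14/15 through alternate function 5). The register offsets follow the BCM2835 peripherals datasheet, rebased to the Pi 3's 0x3F000000 peripheral window; the 250 MHz core clock and 115200 baud are assumptions, the function names are illustrative, and the GPIO pull-up/down sequence that many tutorials also perform is omitted.

```c
#include <stdint.h>

/* Memory-mapped 32-bit register at a physical address. */
#define REG(addr) ((volatile uint32_t *)(uintptr_t)(addr))

/* Pi 3 peripheral addresses (ARM physical). */
#define GPFSEL1         REG(0x3F200004UL)  /* function select, GPIO10..19 */
#define AUX_BASE        0x3F215000UL
#define AUX_ENABLES     REG(AUX_BASE + 0x04)
#define AUX_MU_IO_REG   REG(AUX_BASE + 0x40)
#define AUX_MU_IER_REG  REG(AUX_BASE + 0x44)
#define AUX_MU_LCR_REG  REG(AUX_BASE + 0x4C)
#define AUX_MU_LSR_REG  REG(AUX_BASE + 0x54)
#define AUX_MU_CNTL_REG REG(AUX_BASE + 0x60)
#define AUX_MU_BAUD_REG REG(AUX_BASE + 0x68)
#define LSR_TX_EMPTY    (1u << 5)          /* can accept at least one byte */

/* baud = core_clock / (8 * (reg + 1)), so reg = core_clock / (8 * baud) - 1. */
uint32_t mini_uart_baud_reg(uint32_t core_clock_hz, uint32_t baud)
{
    return core_clock_hz / (8u * baud) - 1u;
}

void mini_uart_init(void)
{
    /* Route GPIO14/15 to ALT5 (mini UART TXD1/RXD1): 3 bits per pin. */
    uint32_t sel = *GPFSEL1;
    sel &= ~((7u << 12) | (7u << 15));
    sel |= (2u << 12) | (2u << 15);
    *GPFSEL1 = sel;

    *AUX_ENABLES |= 1u;        /* enable the mini UART block */
    *AUX_MU_CNTL_REG = 0;      /* transmitter/receiver off while configuring */
    *AUX_MU_IER_REG = 0;       /* no interrupts */
    *AUX_MU_LCR_REG = 3;       /* 8-bit mode */
    *AUX_MU_BAUD_REG = mini_uart_baud_reg(250000000u, 115200u);
    *AUX_MU_CNTL_REG = 3;      /* transmitter and receiver on */
}

/* Busy-wait until the transmitter is ready, then write one byte; the store
   to the I/O register is what sends it over the serial line. */
void mini_uart_putc(char c)
{
    while ((*AUX_MU_LSR_REG & LSR_TX_EMPTY) == 0)
        ;
    *AUX_MU_IO_REG = (unsigned char)c;
}

/* Print a string, expanding "\n" into "\n\r" as described in the talk. */
void mini_uart_puts(const char *s)
{
    for (; *s != '\0'; s++) {
        mini_uart_putc(*s);
        if (*s == '\n')
            mini_uart_putc('\r');
    }
}
```

On the target, main would call `mini_uart_init()` and then `mini_uart_puts("Hello World\n")`. Off target only the divisor arithmetic can be exercised: 250 MHz / (8 × 115200) − 1 truncates to 270.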
09:44
Okay, this is documented in the Raspberry Pi documentation, and this is very, very bare-board stuff. To gather all the sections correctly and place them at the right addresses, we use a linker script,
10:04
okay, nothing very interesting, and then you have your first Hello World program. So what can you do next? Besides things like Hello World in different languages, you can write your own drivers.
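GPIO, the easiest driver to start with, comes down to two kinds of register accesses: selecting a pin's function, then driving the pin high or low. A minimal sketch, assuming the BCM2835 GPIO register layout at the Pi 3's 0x3F200000 base; the function names are illustrative, and which pin drives an LED depends on your own wiring.

```c
#include <stdint.h>

/* BCM2837 GPIO registers (ARM physical addresses on the Pi 3). */
#define GPIO_BASE 0x3F200000UL
#define GPFSEL(n) ((volatile uint32_t *)(uintptr_t)(GPIO_BASE + 4u * (n)))  /* n = 0..5 */
#define GPSET0    ((volatile uint32_t *)(uintptr_t)(GPIO_BASE + 0x1C))
#define GPCLR0    ((volatile uint32_t *)(uintptr_t)(GPIO_BASE + 0x28))

#define GPIO_FN_INPUT  0u
#define GPIO_FN_OUTPUT 1u

/* Each GPFSELn register holds a 3-bit function field for 10 pins.
   Return `reg` with the field for `pin` replaced by `fn`. */
uint32_t gpfsel_update(uint32_t reg, unsigned pin, unsigned fn)
{
    unsigned shift = (pin % 10u) * 3u;
    reg &= ~(7u << shift);
    reg |= (fn & 7u) << shift;
    return reg;
}

void gpio_set_function(unsigned pin, unsigned fn)
{
    volatile uint32_t *sel = GPFSEL(pin / 10u);
    *sel = gpfsel_update(*sel, pin, fn);
}

/* Writing 1 to a bit of GPSET0/GPCLR0 drives that pin high/low and zero
   bits are ignored, so no read-modify-write is needed (pins 0..31 only). */
void gpio_write(unsigned pin, int high)
{
    if (high)
        *GPSET0 = 1u << pin;
    else
        *GPCLR0 = 1u << pin;
}
```

Typical use on the target is `gpio_set_function(pin, GPIO_FN_OUTPUT)` once, then `gpio_write(pin, 1)` and `gpio_write(pin, 0)` in a delay loop to blink an LED.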
10:21
Well, if you want to start, you can start with GPIO because that's very easy: it's just a way to send signals on the header pins. I2C and SPI are ways to communicate using serial protocols, and they are very easy too. The SD card isn't very difficult to program,
10:43
but there is, well, a small stack of things to do to communicate with the SD card. Video is very easy because most of the work, if not all, is done by the GPU firmware: you just have to say, okay, I want a frame buffer,
11:03
and you get a reply saying this frame buffer is at this address, with that width and that height. You can write drivers, if you want, for USB, Bluetooth, Wi-Fi and Ethernet, except that's much more difficult, and the documentation is not very, very extensive
11:23
on these topics. If you want more performance, you have to enable the caches because, well, the CPU starts with the caches disabled,
11:40
which gives abysmal performance, so you really want to enable the caches. Except that if you enable the caches, you have to specify that the I/O regions are not cacheable, because I/O regions have side effects: writes must
12:01
go to the device, and reads must come from the device. And if you want to specify that some regions are not cacheable, you have to set up the MMU, which is a little bit complex; once it is set up, you specify that some regions are not cacheable,
12:22
and the easiest way to set up the MMU is to use a 1:1 mapping, so no translation, just attributes on the regions. You can also try to use the four cores,
12:41
since, as we have seen, all the processors start, and we have put three of them in the busy loop. There is a specific register to get the core number, so you get a number from zero to three. You have to set up a stack for each processor and execute a specific start routine for each processor,
13:05
but don't forget to initialize the hardware only once. If you want to go even further, you have to know that the cores start at the highest privilege level, EL3,
13:20
and you can switch to lower levels and execute code there: from the secure monitor level, EL3, to the hypervisor level, EL2, then from the hypervisor level to the kernel level, EL1, and, if you want, to the user level, EL0. There is code in the SMP directory that does exactly that,
13:44
so it sets things up: it enables the caches, sets up the MMU, and starts all four cores.
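The SMP code itself is not shown in the talk. As a hedged sketch, the C helpers below compute the values such code typically needs for the steps just described: reading the exception level, building the SPSR for an `eret` to a lower level, finding the core number and its stack, and filling a 1:1 translation table that keeps the I/O window uncached. The actual `mrs`/`msr` accesses, barriers and `eret` would live in assembly and are not shown; the helper names and the 2 MiB block layout are assumptions, bit positions follow the ARMv8-A architecture, and the level-1 entry pointing at this table plus the TCR_EL1/TTBR0_EL1 setup are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* CurrentEL reports the exception level in bits [3:2]. */
unsigned current_el_from_reg(uint64_t current_el)
{
    return (unsigned)((current_el >> 2) & 3u);
}

#define SCR_EL3_NS (1ULL << 0)   /* levels below EL3 are non-secure */
#define SCR_EL3_RW (1ULL << 10)  /* EL2 executes in AArch64 */
#define HCR_EL2_RW (1ULL << 31)  /* EL1 executes in AArch64 */

/* SPSR value for an eret into ELxh (EL x using its own SP), with the
   D, A, I and F exception masks (bits 9..6) set. */
uint64_t spsr_elxh_masked(unsigned el)
{
    return (0xFULL << 6) | ((uint64_t)el << 2) | 1u;
}

/* The core number is affinity level 0 of MPIDR_EL1: 0..3 on the Pi 3. */
unsigned core_id_from_mpidr(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0xFFu);
}

/* One stack per core; stacks grow down, so core n starts at the top
   of slot n. */
uintptr_t core_stack_top(uintptr_t stacks_base, size_t stack_size, unsigned core)
{
    return stacks_base + (uintptr_t)(core + 1u) * stack_size;
}

/* MAIR_EL1: attribute 0 = Device-nGnRnE (0x00), attribute 1 = Normal,
   inner/outer write-back cacheable (0xFF). */
#define MAIR_VALUE    0xFF00ULL
#define PERIPH_BASE   0x3F000000ULL   /* Pi 3 I/O window: must not be cached */
#define DESC_BLOCK    1ULL            /* bits [1:0] = 01: block descriptor */
#define DESC_ATTR(i)  ((uint64_t)(i) << 2)
#define DESC_INNER_SH (3ULL << 8)
#define DESC_AF       (1ULL << 10)    /* access flag, avoids access faults */

/* Level-2 block descriptor for the 2 MiB block at pa (AP = 00: EL1 RW). */
uint64_t l2_block_desc(uint64_t pa)
{
    if (pa >= PERIPH_BASE)   /* device memory: uncached, side effects kept */
        return pa | DESC_AF | DESC_ATTR(0) | DESC_BLOCK;
    return pa | DESC_AF | DESC_INNER_SH | DESC_ATTR(1) | DESC_BLOCK;
}

/* 1:1 map of the first GiB: 512 blocks of 2 MiB, VA == PA. */
void build_identity_l2(uint64_t table[512])
{
    for (uint64_t i = 0; i < 512; i++)
        table[i] = l2_block_desc(i << 21);
}

/* SCTLR_EL1 bits to set once MAIR/TCR/TTBR0 are programmed. */
#define SCTLR_M (1ULL << 0)    /* MMU on */
#define SCTLR_C (1ULL << 2)    /* data cache on */
#define SCTLR_I (1ULL << 12)   /* instruction cache on */
```

With these values, dropping from EL3 to EL2h uses an SPSR of 0x3C9 and from EL2 to EL1h 0x3C5; the block at 0x3F000000 gets descriptor 0x3F000401 (device), while ordinary RAM blocks get the cacheable attributes 0x705.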
14:01
What have we done with that? Well, one colleague (it's not me, sorry) has done a broadcasting demo. We use the four cores, and we use the 2D DMA of the GPU to speed things up, but otherwise it doesn't use the GPU,
14:22
and we have reached 60 frames per second. So this is a screenshot, well, not a screenshot, a photo, and this, if I can play it, is a video of the demo.
14:41
So, that's it. Thanks.