
Semihosting U-Boot


Formal Metadata

Title
Semihosting U-Boot
Subtitle
Look, ma, no serial!
Title of Series
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Semihosting provides console, filesystem access, and other functions over a debug interface, such as JTAG. This is especially useful when traditional bootstrap interfaces such as serial, USB, or Ethernet are not available in hardware. This talk will discuss implementing improved semihosting support in U-Boot; semihosting's strengths, weaknesses, and how to work around them; and how to semihost U-Boot with OpenOCD on your next board bringup. Semihosting has long been used to provide host services for embedded ARM systems, especially microcontrollers. However, its use on Linux-capable systems has been much patchier. Vendor-supported recovery modes often use JTAG, but seldom make use of the features provided by semihosting. U-Boot has supported loading files using semihosting on ARM Versatile Express platforms since 2014, but lacked serial support and integration with standard commands. In release 2022.07, such support was added, motivated by use on QorIQ platforms. NXP QorIQ platforms require a valid configuration programmed into the boot source in order to boot. Although there is a fallback configuration, it does not support traditional firmware loading interfaces such as USB or Ethernet. By using U-Boot semihosted over JTAG with OpenOCD, a recovery image can be loaded which is sufficient to complete device programming. The same binaries can be used to boot from eMMC as well as JTAG, simplifying configuration. Because semihosting is standard across ARM platforms which support JTAG debugging, similar strategies can be reused whenever JTAG is the most convenient communication method. The target audience of this talk includes users of ARM or RISC-V platforms with JTAG boot; users of U-Boot, OpenOCD, and QEMU; and developers of other bootloaders who are interested in adding semihosting support.
Transcript: English (auto-generated)
Hi, I'm Sean. Today, I'm going to talk about semi-hosting in the context of U-Boot and what it is and how it works and maybe why you might want to use it. So first, I want to ask how do you bootstrap a system?
So you might do this for two reasons. One, you have a new board right from the factory and it has nothing on it at all and you have to get something on it. And the other one is maybe you bricked it and this happens to me sometimes, actually happens quite a lot, especially when I'm working on U-Boot and the board will no longer boot.
So there's two basic steps usually. The first one is you want to get something running on your board and the second one is you want to then write something to storage so you don't have to do the process again. So there's a variety of protocols you can use. USB, of course. I like UMS. It's very nice. It makes your device look like a USB flash drive, which is very, very convenient.
There's also a bunch of Ethernet stuff, the classic TFTP. Fastboot makes an appearance twice because it works over both USB and Ethernet. If you have an SD card, bootstrapping is super easy. You just pop out the SD card and put whatever you want on it and put the SD card back in.
But a lot of boards don't have SD cards, so this is not always an option. There's serial. This is usually kind of slow. So you might only want to use it for the code execution part, but it's definitely there. Some boards have it built into the bootloader. You can just flash something over serial. And there's also JTAG.
JTAG is kind of a classic one. Also slow, you probably wouldn't want to flash your whole root file system over it, but it's pretty reliable and a lot of boards have it. What if you only have JTAG and you don't have any of these other nice protocols? So I'd like to take a little bit of a different approach to the problem.
And let's talk about something totally different, which is the NXP QorIQ line of communications processors. These are the newest iterations of a very old line, which stretches back to the M68K, and there's a very long lineage of PowerPC stuff in there.
And they tend to have lots of Ethernet, some PCIe, some USB, but not any display interfaces. So they're not really media SoCs, and they often have hardware-accelerated networking. So you can do some stuff in hardware which you would normally do in software.
And this is kind of the main selling point for why you might want to use these. So all of these have something they call a reset configuration word, or RCW. And this started back in the PowerPC days as just basic initialization: what endianness your SoC is going to be, maybe what dividers you're going to have on some clocks,
how wide your boot bus is, what you're going to do with your debug pins. And this is kind of a small amount of data, so they stuck it on some pull-ups and pull-downs on some of the pins. And this is a very standard thing you will see on a lot of different SoCs. And then they wanted some pin muxing, because when they originally started with this,
all the pins were fixed function, and you can sell more chips if you can change the function of some of the pins, so that you can use like USB on one chip and maybe Ethernet on another. So they added some pin muxing, and they added it to the RCW. And then they added a lot more pin muxing, because the more pin muxing you have, the more applications your chip can fit into.
So they started running out of pins, because they started getting maybe like 128, 256, 512 bits of stuff that they needed to configure. And so they decided they were going to put the RCW on the boot device. So the first thing the SoC does when it boots up is it reads off this RCW,
and it configures all the pins, and then it continues with the boot. And this is kind of convenient, but it creates a chicken-and-egg problem, where in order for your SoC to boot up, there has to be something on your initial device. And if you're in a situation where you have to bootstrap it, there's nothing there, so the SoC won't boot up.
So what they did is they created a hard-coded reset configuration word. This is for maximum compatibility. They would disable all the peripherals, and you would just have your boot device. And so you could always boot into this and be safe and not break your board.
But this is not so great, because they never added runtime pin muxing. So on this chip, you select a function for your pins, and you can't change it. There are a few pins where you can change it, but for most of them, you're stuck. So when you have this maximum compatibility RCW with everything disabled, you have no Ethernet, you have no USB,
you have no serial even, and all you get is JTAG and your boot device. So NXP knew they had a problem, and they decided to solve it by introducing this override. So you would boot via the hard-coded reset configuration word, and then you would program via JTAG the values that you
actually wanted that would enable all your peripherals for your board, and then you would do a partial reset, and it would come up, and it would load everything like it was supposed to. But there's a couple of problems with this. The main one is that they never documented this stuff, so in order to use it, you have to use their JTAG probe, which is,
like most JTAG probes, kind of a gouge because they know you're buying the chip, so you've got to have the JTAG probe. And you have to use their IDE, which is a yearly subscription, and it's not cheap. So this is not a great situation, and if you didn't think this was great, here's a glowing review I found on the forums: "Our manufacturer uses a single PC to perform the initial programming. On this PC,
they have an evaluation copy of CodeWarrior [which is their IDE]. Every time that evaluation copy expires, they erase the hard drive of the PC, install the OS again, and load another evaluation copy." So this is not ideal, and I
thought about how I might address this and make it better, and I remembered something that I learned about a couple months ago. It's called semi-hosting, and the basic idea of semi-hosting is that you attach a debugger. In my case, it's over JTAG, and your code is going to execute a special breakpoint instruction, and when your debugger sees this, it will read out
an opcode in R0 and an argument in R1, and it will do something for you, and then it will give you a return code back in R0. And this is very, very similar to how syscalls work, because your program will execute a special instruction, the operating system will read out your registers, it will do something for you, and give you a return code.
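To make the register convention concrete, here is a minimal host-runnable sketch in C. It only models the exchange: `struct regs`, `debugger_on_breakpoint` and `smh_trap` are hypothetical names, and the "debugger" is an ordinary function call rather than a real breakpoint. `SYS_TIME` (0x11) is an operation number from the ARM semihosting specification; returning -1 for unknown operations is a choice made for this model, not something the talk states.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Operation number from the ARM semihosting specification. */
enum { SYS_TIME = 0x11 };

/* The two registers the debugger inspects (named as in the talk). */
struct regs {
	uintptr_t r0;	/* in: operation number, out: return value */
	uintptr_t r1;	/* in: argument, or pointer to an argument block */
};

/* Debugger side (modelled): runs when the target hits the breakpoint. */
static void debugger_on_breakpoint(struct regs *regs)
{
	assert(regs);
	switch (regs->r0) {
	case SYS_TIME:	/* seconds since the Unix epoch; r1 is unused */
		regs->r0 = (uintptr_t)time(NULL);
		break;
	default:	/* model choice: report unknown operations as -1 */
		regs->r0 = (uintptr_t)-1;
		break;
	}
}

/*
 * Target side. On real AArch64 hardware the body would load the two
 * registers and execute `hlt #0xf000`; here the modelled debugger is
 * simply called directly.
 */
intptr_t smh_trap(uintptr_t op, void *arg)
{
	struct regs regs = { .r0 = op, .r1 = (uintptr_t)arg };

	debugger_on_breakpoint(&regs);
	return (intptr_t)regs.r0;
}
```

Calling `smh_trap(SYS_TIME, NULL)` hands back the host's idea of the current time in the same way a system call would hand back a kernel result.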
So what do you get? Well, the thing that I wanted most is serial, because I didn't have any. So first, I looked at SYS_WRITEC, and SYS_WRITEC is basically putchar. So we can implement puts here, and
so we're going to take in a string, and we're going to loop over all the characters in the string, and for each character, we're going to trap, or execute our breakpoint instruction, and we're going to pass SYS_WRITEC for our opcode, and we're also going to pass a pointer to the character, and you may know that putchar actually just takes the character,
and so this has kind of an unfortunate performance implication, because we have one breakpoint and one memory access per character in the string, and over JTAG, this is not very performant. If you've ever used a 300 baud modem, you know that it's very slow. This is even slower,
so this is really not useful if you actually want to use your serial output. So we can do better, though. They also have something called SYS_WRITE0. This is basically puts, so our puts implementation gets very simple. We're just going to trap with SYS_WRITE0, and now we get one breakpoint per string,
but we still have to do one memory access per character, and the problem is that we don't want to read off the end of the string. We have to make sure that we don't go past the null terminator, so the debugger has to read a character, and then see, is it the null terminator, and if it's not, you read another character, and you keep doing this, and we really don't want to go off the end,
but the problem is that for JTAG, setting up a read is a pretty intensive process. There's a lot of overhead, and it can be still pretty slow, so this is faster. We're about ten times as fast, but it's still slow, really not usable, but we can do even better.
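The cost difference between the two console calls described so far can be sketched with a small host-side model in C. This is an illustration under stated assumptions, not U-Boot or debugger code: `smh_trap` is a stand-in for the breakpoint, `debugger_read_byte` models one debugger read of target memory, and the counters and helper names are hypothetical. Only the operation numbers (0x03 and 0x04) come from the ARM semihosting specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Operation numbers from the ARM semihosting specification. */
enum { SYS_WRITEC = 0x03, SYS_WRITE0 = 0x04 };

/* Host-side model standing in for the debugger. */
static char console[256];	/* what the "debugger" has printed */
static size_t console_len;
static unsigned traps;		/* breakpoints taken */
static unsigned mem_reads;	/* target memory reads issued by the debugger */

/* Model of one debugger read of a single byte of target memory. */
static char debugger_read_byte(const char *addr)
{
	mem_reads++;
	return *addr;
}

static void console_put(char c)
{
	assert(console_len < sizeof(console) - 1);
	console[console_len++] = c;
	console[console_len] = '\0';
}

/* Stand-in for the trap; on real hardware this is a breakpoint instruction. */
static long smh_trap(unsigned long op, const void *arg)
{
	const char *p = arg;
	char c;

	traps++;
	switch (op) {
	case SYS_WRITEC:	/* arg points at a single character */
		console_put(debugger_read_byte(p));
		return 0;
	case SYS_WRITE0:	/* arg points at a NUL-terminated string */
		while ((c = debugger_read_byte(p++)) != '\0')
			console_put(c);
		return 0;
	default:
		return -1;
	}
}

/* puts built on SYS_WRITEC: one trap and one read per character. */
void puts_writec(const char *s)
{
	for (; *s; s++)
		smh_trap(SYS_WRITEC, s);
}

/* puts built on SYS_WRITE0: one trap per string, one read per byte. */
void puts_write0(const char *s)
{
	smh_trap(SYS_WRITE0, s);
}

/* Reset the model's counters and captured output between experiments. */
void model_reset(void)
{
	traps = mem_reads = 0;
	console_len = 0;
	console[0] = '\0';
}
```

For the string "hello", `puts_writec` takes five traps and five reads in this model, while `puts_write0` takes one trap and six reads (five characters plus the terminator the debugger has to look at before it can stop).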
So we're going to use SYS_WRITE, which is basically the write system call, and for this one, because we have multiple parameters (the previous ones only had one parameter, so it just went in the argument), we're going to fill in our arguments inside of a struct, and we're going to take the file descriptor and the buffer and the length of the buffer,
and we're going to fill this in with standard out, and our string, and the length of our string, and then we're going to trap, and we're going to pass a pointer to our struct, and this is generally how we pass multiple arguments to semi-hosting, because there's only one argument register, so they will take a pointer to the struct,
and so now we get one breakpoint per string, and two memory accesses per string, and this is reasonably fast. We can do stuff with this, and it's not glacially slow. So this is the kind of implementation I ended up using, and if you've been paying attention, you'll note that SYS_WRITE kind of implies the existence of SYS_OPEN,
and you can open any file on your host system, which is pretty convenient, and you can do all the standard stuff, like seeking it, and reading it, and closing it. We don't get stat, but we do get the file length, which is mostly what we want, because usually we just want to open it, find out how long it is, and then read the whole thing.
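The argument-block style and the open, length, read, close sequence can be sketched in C as follows. This is a host-runnable model, not the U-Boot implementation: `smh_trap` is a stand-in for the breakpoint, the host "file" named Image and its contents are made up, handle 1 is assumed to be standard out, and `smh_puts` and `smh_load` are hypothetical helper names. The operation numbers, the field order of each argument block, and the "bytes not transferred" return convention of `SYS_READ` and `SYS_WRITE` follow the ARM semihosting specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Operation numbers from the ARM semihosting specification. */
enum {
	SYS_OPEN = 0x01, SYS_CLOSE = 0x02, SYS_WRITE = 0x05,
	SYS_READ = 0x06, SYS_FLEN = 0x0c,
};

/* Host-side model: one made-up file and the last console write. */
static const char host_name[] = "Image";
static const char host_data[] = "kernel-bytes";
static size_t host_pos;
static char console[128];

/* Stand-in for the trap; arg points at a block of word-sized fields. */
static intptr_t smh_trap(uintptr_t op, void *arg)
{
	uintptr_t *p = arg;
	size_t n, len = sizeof(host_data) - 1;

	switch (op) {
	case SYS_OPEN:	/* {name, mode, name length} -> handle or -1 */
		if (strcmp((const char *)p[0], host_name) != 0)
			return -1;
		host_pos = 0;
		return 3;
	case SYS_FLEN:	/* {handle} -> length or -1 */
		return p[0] == 3 ? (intptr_t)len : -1;
	case SYS_READ:	/* {handle, buffer, length} -> bytes NOT read */
		if (p[0] != 3)
			return -1;
		n = p[2] < len - host_pos ? p[2] : len - host_pos;
		memcpy((void *)p[1], host_data + host_pos, n);
		host_pos += n;
		return (intptr_t)(p[2] - n);
	case SYS_WRITE:	/* {handle, buffer, length} -> bytes NOT written */
		if (p[0] != 1 || p[2] >= sizeof(console))
			return (intptr_t)p[2];
		memcpy(console, (const void *)p[1], p[2]);
		console[p[2]] = '\0';
		return 0;
	case SYS_CLOSE:	/* {handle} -> 0 or -1 */
		return p[0] == 3 ? 0 : -1;
	default:
		return -1;
	}
}

/* puts via SYS_WRITE: one trap, with all arguments in one block. */
int smh_puts(const char *s)
{
	uintptr_t args[3] = { 1 /* stdout, assumed */, (uintptr_t)s, strlen(s) };

	return smh_trap(SYS_WRITE, args) == 0 ? 0 : -1;
}

/* Load a whole host file into buf; returns its length, or -1 on failure. */
long smh_load(const char *name, void *buf, size_t size)
{
	uintptr_t open_args[3] = { (uintptr_t)name, 1 /* "rb" */, strlen(name) };
	uintptr_t io_args[3];
	intptr_t fd, len, left;

	fd = smh_trap(SYS_OPEN, open_args);
	if (fd < 0)
		return -1;
	io_args[0] = (uintptr_t)fd;
	len = smh_trap(SYS_FLEN, io_args);
	if (len < 0 || (size_t)len > size) {
		smh_trap(SYS_CLOSE, io_args);
		return -1;
	}
	io_args[1] = (uintptr_t)buf;
	io_args[2] = (uintptr_t)len;
	left = smh_trap(SYS_READ, io_args);
	smh_trap(SYS_CLOSE, io_args);
	return left == 0 ? (long)len : -1;
}
```

The load helper mirrors the pattern from the talk: open the file, ask how long it is, then read the whole thing in one request.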
So in U-Boot, you may classically do something like this. If you want to load your Linux and then boot it, you're going to load it from mmc 0, at a particular address, and then you're going to give it a file name, and then you'll boot it, and so we can replace this with load hostfs, which is a file system on your debugger host,
and that Linux image will get read from the directory that you're running your debugger from, and it's the same structure, because under the hood, it's using the same API, and there's a dash, because there's only one hostfs, and we don't need to have
multiple debugger support, and there's a special file called :tt, which I think stands for teletype, and this is your standard in and standard out, and almost everybody uses this except QEMU, because QEMU doesn't have this huge overhead for memory accesses, so they don't actually care if you can use
your console with read and write, and so you just use SYS_WRITE0 with them, and it works. So one classic problem with booting over JTAG is that your regular boot process is going to look something like: load SPL, and SPL is going to initialize DRAM, and then SPL is going to load
regular U-Boot into DRAM and execute it, and when you do this with JTAG, instead you have to load SPL over JTAG, and SPL is going to run and initialize DRAM, and at some point you have to load U-Boot into DRAM over JTAG, but we don't really know when, and so a really classic way
to do this is you just pick a time, and you wait that long, and then you load U-Boot, but this is kind of awful, because if you have any kind of variance in how long DRAM initialization takes, especially if you're doing other
hardware initialization, you have to just wait a lot longer, and in the average case, you're going to be doing nothing, and this can really drive you nuts as a developer, because you might be waiting like 20 seconds, because sometimes it takes 20 seconds, but most of the time it doesn't. So you can also re-implement DRAM initialization in Tcl,
and this is a really common thing for vendors to do, because it's very simple for them: they just write all the registers, and it happens over JTAG. And this avoids the whole timing problem, because we know exactly when DRAM has been initialized, but it's a totally different process from normal. You have to specify your parameters
in a different format, in a different language, it's not going to be tested as much, and it probably won't initialize things in the same way, so it can be more buggy, and it's kind of worrisome, especially when your regular U-Boot works fine and maybe this doesn't work so well. But semi-hosting makes this really simple,
because SPL can be loaded over JTAG, and then it will initialize DRAM, and then it says to your host, "please load U-Boot at this address," and your host will do that, and then it continues on its way. It's extremely simple to use, and it solves this whole timing problem,
which can be very annoying. So what else do you get? Well, we get some error handling. SYS_ERRNO is practically essential to find out why something failed. SYS_ISERROR is not. The idea of SYS_ISERROR is that you will pass in a return code, and SYS_ISERROR will tell you if it's an error or not, but the problem is that some of these semi-hosting commands have
different semantics for the return code, and most of the time the semantics are that negative numbers are errors, so effectively you're doing this whole big semi-hosting call just to compare to zero. So I don't really know why this is in here, and there's actually several functions that are kind of like that. For example, SYS_TIME will get you the real time, which
can be helpful if your device doesn't have an RTC or you don't want to initialize it, but SYS_ELAPSED will get you the number of ticks that your program has been running, so maybe you would use this for benchmarking, but the overhead of doing semi-hosting is a lot
larger than the amount of precision that you're going to get, so I'm not really sure why you'd use that one either. There's some libc emulation. You can pass in a command line. In U-Boot, we don't really need this because we have the environment and we have the device tree, and those are kind of classic ways to pass in parameters, but if you're not using U-Boot and you don't have
this sort of system set up, you can get command line parameters pretty easily. There's also where it thinks the heap is and where it should malloc stuff, but usually you know this when you compile. You say this address range is going to be where I'm going to stick my heap, so also
I'm not really sure why that's in there, and as you may have noticed, you can write files, so of course you can mess things up, especially on Unix where you can open up a lot of files that aren't really files and do some fun stuff with them, but you can also just run arbitrary commands and you can move files too, so you have to really trust this stuff that you're
going to run because as far as I know, no one does sandboxing. They just implement all this stuff. Maybe they shouldn't, but that's how it is. So if you've ever used semi-hosting before, you may be familiar with this problem. Breakpoints are actually invalid instructions,
and your program will crash unless there is a debugger attached, and the debugger will handle it for you, and you won't end up executing it, so typically you would have to have two programs, one with semi-hosting enabled and one with semi-hosting not enabled, and the one with semi-hosting enabled you'd have to run with a debugger, but we can get around this using a
pretty simple trick. This one is from Tom Verbeure, and the idea is that in your synchronous abort handler, you first check to make sure that we have an invalid instruction, and otherwise you panic, which probably involves printing out the registers or doing something, complaining loudly on the serial, which you might not have. Then we need to check to make sure our
instruction, whose address is held in ELR, is the semi-hosting ARM64 HLT instruction, which is the special breakpoint, and the lower bits of the PC are actually not the PC on ARM because
they have stuff like are you in Thumb mode or not, so we need to mask those off. You could probably just AND with ~3, and if we actually find out that it was supposed to be a semi-hosting instruction, we're going to disable semi-hosting, which on your processor can do whatever it wants, but on U-Boot it just sets a global variable
that says we don't have semi-hosting, don't try it again, and then we pretend that we get a failure. Negative one is almost always a failure, and then we advance the PC by 4 bytes. So if you want to use semi-hosting in U-Boot, you can enable these configs. The
first one enables semi-hosting of any kind and also enables this command. The second one, CONFIG_SEMIHOSTING_SERIAL, will get you some serial input and output, and you'll probably want CONFIG_SERIAL_PUTS because normally U-Boot will print a character at a time, and puts will group those characters into strings and print them all at once. And if you
want to have this fallback, you will need to enable CONFIG_SEMIHOSTING_FALLBACK. And if you want to use it in SPL, then you can enable the SPL versions. There's no SPL serial version because U-Boot always enables the serial device in SPL that it's using in the regular U-Boot. And these are the things that I worked on adding, and I also
worked on CONFIG_SEMIHOSTING a lot, but the basic support was already there. These were pretty recently added, so it's either in the January release or maybe the March release. I'm not sure. And if you want to know more about how to enable this, we have a
documentation link. And of course, you're also going to need a debugger. So I like to use OpenOCD, maybe because I'm a masochist. And OpenOCD is a debug server for JTAG. So the idea is you launch OpenOCD, and it connects to your debug probe, and then you
can tell the debug probe to do things like start or stop your processor, and you can also attach GDB to it like it's a running process. So this is pretty simple for OpenOCD. You just halt the processor, you enable semi-hosting, and then you resume it. And typically what you would do is in between this enabling semi-hosting and resuming, you would load your program and then
resume at a particular address. And this you could stick in a script and just run and automate the whole thing. So there's a couple of downsides to OpenOCD. You can kind of think of this as like a wish list or things that annoy me but not enough that I fix them. One of them is that it uses the
same terminal for regular logging messages (like, you know, "I attached a debugger" and that sort of thing) as for semi-hosting output, so they can get intermixed. So you have to watch out for that. The serial is cooked, which means that when you type something, nothing happens until you hit enter and
then everything happens. And this is kind of okay because if you're editing a command line, it's generally really slow if like you hit backspace and then you have to go to U-Boot and U-Boot interprets the backspace and echoes it back and then it gets displayed on your terminal. So cooked is nice here. The problem is that OpenOCD is
single-threaded, so while it is waiting for your input it is not doing anything else. So if you unplug the device or you hit Ctrl-C in your debugger, it won't notice until you hit enter. So this can be kind of fun especially because even if you know about it, you might forget. And this
single-threaded thing also ties into there's no sandboxing. So ideally you would do something like fork off another process and maybe unshare some stuff or put it in a chroot and then that would be where you would run all your semi-hosting stuff. Like it would open the file and you could limit it to just a few files but there's no sandboxing. So your whole
system is there. Once again, you have to trust your stuff. So should you use semi-hosting? I would say not unless you have to, especially not the serial stuff. But it's good to have. If you have to use it, it's nice. And sometimes it's convenient. If you're doing emulation, it can be really simple because you don't have to emulate an MMC device. You don't have
to write a driver for an MMC device. You just call your semi-hosting instruction and you can load the file right into where you want it. And you don't have to do any hardware. And if you're already using JTAG boot, this can be really nice to solve some of your sequencing stuff. But I wouldn't recommend it in general. So I'd like to thank a couple people. Tom Verbeure
wrote a blog post on this stuff that got me thinking about the whole thing. Andre Przywara did the initial semi-hosting. And he also worked with me when I was upstreaming my stuff. So I'm grateful for that. And of course, Tom Rini and Simon Glass, who reviewed and merged all of this code
and a lot of other patches over the years. And of course, Marek, who put me up to this talk. And SECO, who employed me while I was writing the code. And if you're interested in this, there's that blog post I was talking about. There is the RISC-V software spec, which is just the ARM software spec, but they use a different instruction and different
registers. And of course, the ARM software spec. And this link may die because ARM has a tendency to rearrange things. But for now, it works. Thank you. Does anyone have a question? Questions? Yes, but
only for debug prints. And I haven't looked into it that closely. I
think the whole stopping Linux to do a breakpoint is kind of invasive, because Linux tends not to like that. Because your interrupts for that core will just not happen while it's stuck on the debugger. And you can kind of break your devices that expect there to be an interrupt that gets handled in a reasonable manner. So typically, when you stop the processor in Linux, your
eMMC will just break. So generally, I've only seen it for debug prints, and usually only if like you can't get to the real serial console. Yeah. Okay, since we have a couple minutes, I
have one more slide. So normally, when you print something, this is what the console gets: it'll get, like, "hello\n".
And it'll normally print this like "hello\r\n", inserting the \r. And it'll do it one character at a time. But as we've established earlier, this is glacially slow on semi-hosted hardware. So what I initially did was this: I printed out "hello\n", and
then I added the \r. But this will actually break things because they expect it to be \r\n and not \n\r, even though functionally they're the same. So I ended up having to do it the other way. So if you're implementing this stuff, be aware of that. Although, if
you are doing this on, like, a microcontroller, you can probably just put "hello\r\n" in your strings. And maybe that's better.
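One way to get the carriage return ahead of the newline without falling back to one write per character is to expand newlines into a small scratch buffer and flush it in chunks. The following C sketch is an assumption about how this could be done, not U-Boot's actual code: `console_write` stands in for a single SYS_WRITE-style call, and `puts_crlf` and the 32-byte chunk size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the host console and a count of write calls issued. */
static char console[128];
static size_t console_len;
static unsigned writes;

/* Stand-in for one SYS_WRITE-style call on the console. */
static void console_write(const char *buf, size_t len)
{
	assert(console_len + len < sizeof(console));
	memcpy(console + console_len, buf, len);
	console_len += len;
	console[console_len] = '\0';
	writes++;
}

/* Print s, expanding each newline to CR LF, using as few writes as possible. */
void puts_crlf(const char *s)
{
	char chunk[32];
	size_t used = 0;

	for (; *s; s++) {
		/* Keep room for a two-byte CR LF pair so it is never split. */
		if (used > sizeof(chunk) - 2) {
			console_write(chunk, used);
			used = 0;
		}
		if (*s == '\n')
			chunk[used++] = '\r';	/* CR goes before the LF */
		chunk[used++] = *s;
	}
	if (used)
		console_write(chunk, used);
}
```

A short string such as "hello\n" goes out as a single write containing "hello\r\n"; longer strings are split only at chunk boundaries.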