
Self-Hosting (Almost) All The Way Down


Formal Metadata

Title
Self-Hosting (Almost) All The Way Down
Subtitle
A FPGA-based Fedora-capable computer that can rebuild its own bitstream
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
I will demonstrate an FPGA-based 64-bit RISC-V computer, capable of booting and running the riscv64 port of Fedora. Using Free/Libre packages available as part of the Fedora repositories, this machine is capable of recompiling not only its own software (e.g., kernel, glibc, gcc), but also its own gateware (i.e., FPGA bitstream), completely from source code, all the way down to (but not including) the physical (FPGA) silicon. Modern hardware development shares many similarities with software: design and specification in a programmatic hardware description language (HDL), and compilation of said sources into either photolithographic masks etched into silicon for Application Specific Integrated Circuits (ASICs), or into configuration data (a bitstream) for a Field Programmable Gate Array (FPGA). Hardware vulnerabilities (accidental or intentional) can be inserted during any of these lifecycle stages: as part of the design in HDL sources; during compilation, where buggy or malicious toolchains generate malfunctioning designs from clean HDL sources; or during ASIC fabrication, where masks are altered to etch backdoors or Trojans directly into the silicon. Once fabricated, ASICs are difficult, expensive, and impractical to check for vulnerabilities, which can be as bad as a privilege escalation backdoor allowing for a total system compromise, even in the absence of any software exploits available to the attacker. Let's begin by mitigating against ASIC fabrication-time backdoor insertion by using soft-IP-core hardware blocks on FPGAs, which are fabricated in the absence of any knowledge of the final design details, and also consist of a regular grid of identical, generic configurable blocks, making them easier to inspect for defects.
Having settled on FPGAs for hardware designs requiring enhanced assurance, we can mitigate against HDL source and toolchain vulnerability insertion by insisting on openly available sources to both, and on the ability of the system to be self-hosting, i.e., to rebuild everything, from source, without relying on assistance from any external "black box" or proprietary components. I will demonstrate a Fedora-capable RISC-V computer based on the Rocket CPU, using LiteX for the rest of its chipset, deployed on a Lattice ECP5 FPGA board, with the bitstream generated from sources by a fully Free/Libre toolchain consisting of Yosys, Trellis, and NextPnR. Most importantly, the computer will be capable of (slowly) rebuilding its own bitstream, by being capable of directly executing the Yosys/Trellis/NextPnR toolchain.
Transcript: English (auto-generated)
exchange and showing things. All right. Thank you. Okay. Ready to go. All right. I don't mind. No, it's nice.
Yeah. So, it's time for the- Yeah, exactly. Okay. Thank you. Good morning everyone. Thank you for being here. My name is Gabriel Somlo. I'm going to try to get the introductions over with quickly. I work for Carnegie Mellon's CERT, which is the sort of OG CERT the US government started back in the 80s after the Morris worm, because they suddenly realized computers were going to
be a thing they were going to have to care about. The cool thing about that is I get to indulge in my paranoia and OCD in a professional capacity, which is much much better than it sounds probably the way I make it sound. I'm going to probably sit down during this presentation every once in a while when I need to work the demo a little bit. So, don't think that's weird.
So, with all of that out of the way, we're going to talk about self-hosting and why that's important and how it impacts things like hardware and the ability to trust it. And then further into that sort of distinctions
between ASICs, application-specific integrated circuits, dedicated silicon, versus programmable FPGAs, and what the threat models are, and the trade-offs, and how much you can trust each one of those, and what you're gaining and losing when you're switching between them. And then next will be a demo of what probably is
the slowest, most memory-constrained computer that's capable of running Fedora Linux that you've seen recently. It will be on a 50 megahertz Rocket Chip soft-core CPU running on an FPGA. It's going to have 512 megs of RAM
in this particular incarnation. It is using, like I said, Rocket Chip and LiteX on an FPGA, with free, open toolchains, Yosys, Trellis, and nextpnr, being used to build a bitstream for the FPGA. And then this computer,
when it runs Fedora, you can install Yosys, Trellis, and nextpnr on the computer that was built using those tools, and run those tools on the computer to rebuild the bitstream for its own motherboard. So, it's basically a self-contained, self-hosting thing, which is really exciting. So, let's start with this whole idea of self-hosting.
Most of you are probably familiar with what that means. The joke is, well, no, it's not me hosting my own content on Google Drive or somewhere in the Cloud, but rather it's a term of art in the field of compiler design and it means a compiler is written in its own language and it can compile its own sources.
Then there's a related concept of bootstrapping. If you have a self-hosting compiler that built its own sources, it's chicken and egg: which one was there first? Well, there had to be a third-party trusted compiler that was originally used to build the first binary of our own compiler before we could rebuild it.
At some point, we reach stability, where the next iteration of the binary we build out of the source isn't significantly different from the one we already used, and that basically means we've achieved self-hosting; the process for getting there is called bootstrapping. One interesting thing about self-hosting compilers is that they suffer from an attack that Ken Thompson pointed out.
Ken Thompson being one of the designers of the Unix operating system, among many other glorious achievements. He pointed out that a compromised binary of a self-hosting compiler could be created that attacks clean, otherwise benign, trustworthy source code
and builds malicious binaries. One scenario he described was an attack against the login program: if you build the login program with the compromised compiler, it'll have a backdoor root password that will allow somebody to log in without knowing the actual system root password. The other thing this malicious behavior does
is insert itself into any subsequent iterations when it detects that the compiler's own sources are being built using it. So, it's a self-perpetuating attack that isn't actually present in the source code. The only way to get rid of it would be to re-bootstrap the entire compiler, because presumably we do trust the sources, and the sources are clean, and there's
no malicious behavior specified in the code itself. One way not necessarily to get rid of the problem, but to test whether we have been subjected to one of these attacks, is David A. Wheeler's PhD dissertation on Diverse Double-Compiling, and in the example here,
we'll be using CC as our suspect compiler and TC as the third-party compiler, and it's not necessarily T for trusted, it's T for third-party. The heuristic here is that we pick the third-party compiler in a way that gives us a high degree of confidence that it is not
in collusion with the suspect compiler. So, the people who put it out aren't the same group; think maybe GCC on one hand and MSVC, Microsoft, on the other, or something very diverse; that's where the word diversity comes from. The way this works is that if we
compile the sources of CC with both our own suspect binary and with a third-party binary, if everyone's innocent and no one's trying to screw us over, what should happen is we should be obtaining
binaries reflecting the sources of CC that are functionally identical, because these are diverse different compilers, they would produce different code, the code generation would be different. So, the binaries aren't bit by bit identical, but they should be doing the same thing because they're implementing the same source code. Then if that is true, then the next move would be to take the sources to CC
again and rebuild them with our two intermediate compilers that we obtain, and if you control for the initial conditions, if you have the same initial condition, same random number generator seed and everything, and identical input pumped into functionally identical binaries, the result should be bit by bit identical.
If that's true, then we can breathe a sigh of relief and say, okay, we are very unlikely to be subject to a trusting trust attack, and that degree of confidence is sort of equivalent to our heuristic ability to pick a third-party compiler that isn't in collusion with our suspect compiler.
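The procedure just described can be sketched in shell-style pseudocode. This is only a sketch, not runnable as-is; cc is the suspect compiler binary, tc the third-party binary, and cc-source/ the suspect compiler's source tree, all illustrative names:

```
# Stage 1: compile CC's sources with both compilers.
cc  cc-source/ -o cc1    # suspect binary compiles its own sources
tc  cc-source/ -o cc2    # third-party binary compiles the same sources
# cc1 and cc2 will differ bit-for-bit (different code generators),
# but should be functionally identical if nobody is cheating.

# Stage 2: compile CC's sources again, using the stage-1 outputs.
cc1 cc-source/ -o cc1a
cc2 cc-source/ -o cc2a

# With identical inputs and controlled initial conditions, functionally
# identical compilers must produce bit-for-bit identical output:
cmp cc1a cc2a && echo "no evidence of a trusting-trust attack"
```

If the final comparison fails, either the build was not reproducible or one of the two compilers is inserting something its source code does not specify.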
By the way, the highlighted box on the bottom here is basically the process of bootstrapping CC using the third-party TC compiler. So, back to self-hosting. If you have a self-hosting compiler and source code to everything,
the binary of the compiler when it operates, when it runs, it runs on top of, I don't know, a C library and the kernel, and basically a software stack. It's an application on top of that, but it's an application that can compile all of the things it needs to run itself. If you have source to everything and you've compiled everything
from sources that you otherwise trust, then you have a self-hosting software stack built around your compiler, and the applications are a bonus, all the stuff you actually want to use the computer for. If you build, from source, the stack of software with the C compiler at the top,
system libraries, kernel, and whatever you have underneath that, the software is a self-hosting software stack. Examples of that we have in the wild: there's the Linux ecosystem, there's the BSD ecosystem. Those all fit this idea.
Now, there's a holy war going on with whether hardware will respect your freedom or not, and some people are claiming that hardware should be completely immutable and never upgradable with firmware or anything like that
in order for it to be completely respecting of your freedom and no binary blobs, and different people say, well, I mean, you may actually be able to put free firmware blobs on your proprietary firmware blob-enabled hardware of today, if you just reverse engineer it and so on. But anyway, the idea is in order
to trust the computer, it's not enough to just have a self-hosting software stack. We need to understand what hardware does, and hardware as we've learned in recent years isn't really hard at all. It's very, very mushy, very complicated. It does all sorts of things that scare us, and we need to take a closer look at it.
So, software talks to an instruction set architecture and a bunch of registers that are mapped somehow, and that's basically where software talks to the hardware, that's the demarcation here, and then there's all sorts of layers underneath, microarchitecture, whatever.
It all ends up with this register transfer level, which is combinational and sequential logic, basically a bunch of gates, a bunch of flip-flops, a clock, and so on. It's not my word for it, it's just a word I picked up from the wild, and I don't know exactly who to attribute it to,
but these layers of the hardware stack are typically referred to as gateware, and it's the stuff you write in something like Verilog, or VHDL, or Migen, or Chisel, and so on. Then obviously, all of this has to run on actual physical hardware, which could be dedicated circuits,
application-specific integrated circuits, optimized silicon, or programmable FPGAs. So, if we have free software toolchains, HDL compilers for making gateware out of sources,
which we do, thanks to the group who put out Yosys, Claire Wolf, and gatecat, who made the Trellis and nextpnr place-and-route software. So anyway, if we have those things, those are software that can be built by the self-hosting C compiler, which can compile the software stack.
Now, this thing can take source code, HDL sources, and build all the layers of gateware, which then support all the operation of the software stack. So, you have a self-hosting software-plus-gateware stack. Unfortunately, that for now leaves out the actual physical layer,
the silicon versus the FPGA. So, this is as far down the layers of abstraction we can go with self-hosting, that I'm personally currently aware of. So, being a relative latecomer to developing hardware,
I'm a software person, have been my entire career, took a couple of classes at the university where I work, learned Verilog, learned a bit of digital design, and it surprised me to realize that essentially designing gateware is sitting down in front of a development environment and writing a program in
some functional slash declarative syntax like Verilog and VHDL. You basically write a program and then hit the compile button, and it compiles your code into an ever more elaborate graph, a netlist of building blocks and eventually gates,
and then you have a choice of building a binary blob, which is bitstream for the FPGA, and it's basically a binary blob just like a binary blob comes out of an actual program you write for software. The difference being software will tell some CPU a sequence of steps of what to do, whereas bitstream will tell an FPGA what to be.
It sits there and it acts out the configuration that is being compiled into a binary blob. But other than that, it looks like software development to me, and I probably am pissing off a bunch of people for saying that. Now, the interesting thing is if you don't want the FPGA bitstream,
but rather would like optimized silicon, then you're further elaborating your gates and your RTL into a very complicated graph of transistors, which then get laid out and made into masks, and there is an entire very, very expensive, very, very involved process of actually etching this and carving it into stone, so to speak.
We have the saying of, well, is it the dog that wags the tail, or is it the tail that wags the dog? Well, in terms of that, making actual silicon is one stage in a compilation pipeline, like a software development compilation pipeline, just like a five-megaton tail
is wagging a tiny little Chihuahua dog, basically. But if you look at it from a software guy's perspective, it's just one stage of the compilation pipeline. Just figured I'll share that with you. So, now, we have the option of doing a CPU, and this slide is specifically from
the perspective of we're going to make a CPU, and the choices are putting a CPU in dedicated silicon versus putting a CPU in an FPGA. With the dedicated silicon, obviously, you have high performance, lower area, high clock speeds.
The problem with that is, from the perspective of the hardware attack surface, one thing we don't control is the foundry, the chip foundry where we're sending those masks to be made, right? There are documented attacks that have been done. The University of Michigan group presented this A2 Trojan at IEEE Security and Privacy a few years ago,
and what they did was, if you have access to these masks, then you can tell where things are and you can add things. These chips have billions of transistors, but if you carefully understand how the whole thing works, you can put in twenty transistors and a capacitor,
and the transistors are wired such that when the CPU, because this is a CPU, remember, is executing a sequence of unprivileged instructions, depending on how you wired those transistors in, they incrementally charge the capacitor a little bit at a time, until at the end of the sequence the charged capacitor will flip a bit in a register.
If that register is your CPU privilege flag, as in ring or whatever, your kernel mode versus user mode, then you have a baked into silicon privilege escalation vulnerability that relies not at all on any vulnerabilities in software. So, if you theoretically have perfect software, you'd still be able to basically do a buffer, I mean, not a buffer overflow, privilege escalation attack on
a CPU that's been compromised like that. As opposed to FPGAs, which you're asking the Foundry, the manufacturing facility, to make you a regular grid of basic configurable blocks. It looks like snap circuits for grown-up engineers.
Most importantly, the foundry has absolutely no idea what this FPGA will be used for, whether it's ever going to be used for a CPU, or where on this regular grid of identical blocks the register will be that holds the crown jewels, like the privilege ring flags or anything like that.
So, pre-gaming an attack in this scenario is qualitatively harder for the hardware manufacturing facility, because they don't know what you're going to be using it for, and where your things are going to be put on it by the place-and-route software. So, the price you pay for not letting them know where
your privilege flag is going to be by using soft core CPU is basically performance, a huge performance loss, but that's essentially the trade-off. So, if we've decided to use FPGAs because we're paranoid, and we're trying to deny the silicon Foundry knowledge of what we're going to be doing,
the rest of the attack surface is the HDL toolchain, if we don't trust it; but we do, because it's part of the self-hosting stack and we have source code to it. And then there could be design defects, like bugs in the sources to the CPU. Kind of like, I don't know,
Spectre and Meltdown and you'll never know whether those are intentional or just somebody getting away with trying to optimize things, plausible deniability all the way down. But if you have source code to everything, you can always just edit the source code and rebuild things, and you have a self-hosting environment which will allow you to rebuild every part of it as necessary.
Which is what brings me to this slide: freedom and independence from any sort of black-box, closed, non-free dependencies. You can trust the computer that runs as a self-hosting gateware-plus-software stack to the same extent you can
trust the cumulative set of source code. Now, a lot of people are going to say, well, no one ever reads that much source code, then it's impossible to understand. I agree. I don't want to read any of those sources myself, but the cool thing about it is, if I ever down the road have a question about, hey, this computer did something weird,
I could do a vertical dive into the software layers, the RTL, the sources to the gateware, the source code to whatever it is that did the weird thing; I actually have enough brainpower to do one debugging session through it. But in order for me to be able to do that, I need to have source code to everything, even with the knowledge that I'm not going to read most of it.
So, that's my perspective on this, my ability to trust my own computer. I hope I'm doing okay with time. Are we talking about 15 minutes here? Perfect. All right. So, I am going to now show you a Fedora-capable computer built on this LambdaConcept board.
So, if you download the PDF from the conference site, the links are clickable. It'll take you to the place where I ordered it from. It's a commercially available board. Hopefully, they'll make more, because it was sold out the last time I checked. It uses LiteX and the Rocket Chip CPU.
It uses Yosys, Trellis, and nextpnr for the toolchain, OpenSBI for the firmware, and then I downloaded the latest incarnation, based on Rawhide, Fedora 37, of Fedora's RISC-V 64-bit port. Thank you, David Abdurachmanov. He's the guy, the one-man show
behind building most of the stuff, and it's really, really appreciated. If you have LiteX and all its dependencies installed; and there's going to be a link in the slide deck to more detailed build instructions for this.
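For reference, the kind of litex-boards invocation being described might look like the sketch below. The target script name and the exact flag spellings are assumptions on my part, not verbatim from the talk, so treat this as a shape, not a recipe:

```
# Illustrative only: build a Rocket-based LiteX SoC for an ECP5 board
# using the open Yosys/Trellis/nextpnr flow. Flag names are approximate.
#   --yosys-flow3 : extra optimization passes in the Yosys step
#   --csr-csv     : save the generated register map to a CSV file
python3 -m litex_boards.targets.lambdaconcept_ecpix5 \
    --cpu-type rocket --sys-clk-freq 50e6 \
    --with-ethernet --with-sdcard \
    --yosys-flow3 --csr-csv csr.csv --build
```

The CSV register map matters later: it is what you consult when hand-writing the device tree, since LiteX places peripherals at generated addresses.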
But it's pretty much a stock LiteX build. You install LiteX according to all the recipes that are available online, and then you run this command line, which says: we're going to build it with the Rocket Chip CPU, at 50 megahertz; I want Ethernet,
I want SD card support, I want to use the flow3 optimization in the Yosys component of the toolchain, I want strict timing, and I want the register map saved to a CSV file. Now, this is all a little bit clunky still at this point, because you're going to have to manually
build the device tree table for it. LiteX doesn't build a device tree table for Rocket Chip-based designs automatically, and it's one of the things on my to-do list to teach it how to do that. But once you have the generic register map
and you know what the addresses are for all the devices, we have to add a chosen bootargs line, which contains the kernel command line for booting Fedora. The black font is the standard cut and
paste from what Fedora already uses, modulo this root, which is going to be on the SD card. The other thing we need to do is set enforcing to zero, because once we rsync stuff from the image to the SD card, the labels are all wrong and SELinux is going to scream at us. So, we set enforcing to zero, and
then the default is to boot into graphical mode, so we have to tell it to use the run level three equivalent, which is the multi-user target in systemd. And then, last but not least, systemd is really impatient, because it's used to running on, I don't know, five gigahertz,
20-core systems; this thing's 50 megahertz, so systemd will give up on starting services way before the thing actually has a chance to start all that stuff. So, we need to increase the systemd timeout. Now, enforcing we could get rid of: it takes about a day to relabel the whole SD card on the 50 megahertz system, but then you can drop this part of the command line,
because it'll actually work properly with SELinux. You can set this as the default, so then you can get rid of the multi-user target, but this should stay, because it affects both the initrd version of systemd and the one that actually boots from the real root.
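Put together, the chosen node being described might look like the following hypothetical device tree fragment. The console device, partition, and timeout value are illustrative assumptions, not the speaker's exact values:

```shell
# Write a hypothetical "chosen" node carrying the kernel command line:
# SD-card root, SELinux permissive, text-mode boot, generous systemd timeout.
cat > chosen.dtsi <<'EOF'
/ {
    chosen {
        bootargs = "console=liteuart root=/dev/mmcblk0p2 ro enforcing=0 systemd.unit=multi-user.target systemd.default_timeout_start_sec=600";
    };
};
EOF
grep -c 'enforcing=0' chosen.dtsi
```

This fragment would then be included in the hand-written device tree source before compiling it to a blob with dtc.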
Now, once we have a device tree source ready to go, we make a binary blob out of it, and then we build OpenSBI; that's also pretty much stock: you get OpenSBI to build itself using a built-in DTB right now. So, the other thing that LiteX should eventually be made to do is to build a DTB into the actual bitstream and then have
OpenSBI just take that, like it does on most normal computers. For now, we just have to build the device tree binary into the OpenSBI blob. Then, we put that on the first partition of the SD card, which has to be
VFAT, and there has to be a boot.json file which lists the OpenSBI blob and its load address, which is the very first address in memory, and then the Linux kernel image and the initrd image. The Linux kernel and initrd images would normally come from Fedora,
but I had to do some customizations. The stock Fedora kernel has two problems that I'm dealing with right now. One of them is that it lacks IRQ support for the LiteX UART. That stuff is making its way upstream right now; it's somewhere in Greg KH's tty-next tree,
and it's been accepted, but hasn't made it into mainline yet. The other thing is that between the previous major release of the RISC-V port of Fedora, which was based on Fedora 33, and the current one, a bunch of additional config flags have been turned on in the stock Fedora kernel configuration. I found two, and I'm working on finding a third one which, if enabled, will cause the kernel to crash when it boots on
this computer. Either David will tell me we can get away with not enabling it because it was enabled by mistake, or, if it has to be enabled, then I either have to find a percolating patch for some kernel bug that's already been found, or I have to find it and submit a patch myself.
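For reference, a boot.json in the spirit of the one described earlier might look like this. The file names and load addresses are illustrative only; the one thing taken from the talk is that the OpenSBI blob goes at the very first address in memory.

```json
{
    "opensbi.bin": "0x80000000",
    "Image":       "0x80200000",
    "initrd.img":  "0x88000000"
}
```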
But anyway, that's work in progress. Right now, I've been building a custom kernel, and I'm doing that on a RISC-V Fedora machine running in QEMU, for reasons of speed, and because I need something that can build the kernel before I can boot this machine for the first time. Then down here at the bottom,
there's a clickable URL with all of this in much more detail, so you can actually reproduce it. All right. Now I'm going to sit down and actually try to work this demo for you. I recorded an asciinema cast of my terminal. Let me try to maximize this
so it fills my screen. Here I'm sending the bitstream with OpenOCD.
So, this is the ECP5 bitstream that I built. I'm sending it to the ECPIX-5 board from LambdaConcept. This is LiteX; this is basically where I tell the LiteX BIOS to boot from the SD card.
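Pushing the bitstream over JTAG looks something like the following. This is a sketch: the OpenOCD config file name for the ECPIX-5 is an assumption, and the SVF file is whatever the build produced.

```shell
# Play the SVF bitstream into the ECP5 over JTAG
# (board config file name "ecpix5.cfg" is hypothetical)
openocd -f ecpix5.cfg -c "init; svf top.svf; exit"
```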
I'm going to try to zoom in so you can actually read the screen and see what it's doing. So, it's starting to boot. It loaded boot.json, it's loading the RAM disk, and if I fast-forward, it's going to load the actual kernel image,
and then it's going to start booting, and this is what it looks like. It takes a very long time, this whole video, if you have time to watch it at normal speed is four hours long. Well, if it's a 50 megahertz computer, what do you want? So, anyway, let's see.
If I fast-forward through this creatively, you'll see systemd actually booting here, and a bunch of OK services being started. Let's see, at some point,
it failed to mount /var/lib/nfs or whatever, but we don't care about NFS on this computer. Oh, it also failed to start firewalld. But other than that, it seems to be pretty happy. At some point, it starts the console.
Let me pause this again and zoom in properly, so you can see what it looks like. So, this is a boot prompt for Fedora in text mode. If you don't have IRQ support in your UART, it trips over itself: getty would basically interrupt
itself before it serviced its own soft interrupt. So, it needs actual IRQ support in order not to crash when it starts. The other cool thing is that it actually works on the network. If I nmap from my normal Fedora machine,
it'll actually find this. So, 192.168.2.229 on my home network is where this Fedora machine grabbed a DHCP lease and talked to dnsmasq and everything.
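The scan from the other machine was presumably just an ordinary ping sweep, something like this (a guess at the exact invocation; the subnet is the one mentioned in the talk):

```shell
# Ping-scan the home subnet to find the board's DHCP lease
nmap -sn 192.168.2.0/24
```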
My attempt to log into this machine took about 20 or 30 minutes, because here's the thing, right? You type login, and it starts the login program, and then it starts bash. In order for all of that to work, it needs to be loaded into RAM and
linked against glibc and all that stuff. That takes longer than the timeout the first couple of times, until it's actually managed to pull enough of it into RAM to let you log in. So, there's a couple of attempts,
and I'm trying to log in both at the console in this window and over SSH. See here, I actually just succeeded, because it says "last login" something-something; that means I'm actually going to get a shell eventually.
Once I do get a shell, I can start exploring: cat /proc/cpuinfo looks like this, /proc/interrupts looks like that. I have the UART, I have eth0, I have my SD card, and this is part of the CPU.
/boot.json, this is the file that told the LiteX BIOS what to load into memory. What else do I have here? This is the actual source; I just copied it over to the SD card. So, this is the source to the device tree file. There's my bootargs, console,
all the stuff we talked about on the previous slides. This is the CPU node. Let's see what else is going on here. My devices. But all of this I had to edit by hand, and I promise I'm going to teach LiteX, I'm going to submit a patch, to
generate this programmatically, so that I don't have to modify the device tree file every time I rebuild. So, long story short, I'm going to fast-forward over a lot of this stuff. Once I'm able to log in from everywhere, the next thing is that systemd-resolved
is not enjoying itself on this machine, so I had to disable it and stop it, and add 8.8.8.8 and 8.8.4.4 to an actual hard-coded /etc/resolv.conf. At that point, my DNS resolution started working.
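Concretely, the workaround amounts to something like this (a sketch, run as root; 8.8.8.8 and 8.8.4.4 are Google's public resolvers, as mentioned in the talk):

```shell
# Stop systemd-resolved and hard-code nameservers instead
systemctl disable --now systemd-resolved
rm -f /etc/resolv.conf
printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' > /etc/resolv.conf
```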
chrony also started working, because it could resolve fedora.pool.ntp.org or whatever the alias is in its config file. Once I have all of this ready to go, I type dnf -y install python3-migen yosys trellis nextpnr,
and it's doing it really slowly. But if you have patience, you can fast-forward, which is a really cool feature of asciinema. How do you pronounce that?
A-S-C-I-I-N-E-M-A, ASCII cinema? I don't know, but you know what I'm talking about, right? You can record your terminal; anyway, that's what I used here. So, we're about 142 minutes into this entire thing and it's installing RPMs.
How many months or years did the GCC build take? Well, we'll get to that; we are definitely going to address that elephant in the room. Well, so here's the thing, right?
So basically, it takes about an hour and change to install all the RPMs. But then, let me pause this thing at some point. What I did to demonstrate that it can actually self-host is I had a very simple Verilog blinky,
which just drives the ECP5 board's LEDs from a counter, and if I zoom in here, essentially, this is what it does: LED0 red, LED1 green, LED2 blue, and LED3 red are basically bits 27,
26, 25, and 24 of the counter, and at a couple of seconds per bit you can actually see it blink. That's the Verilog, and here I'm running the build of this. I have a shell script, and I'm running it manually.
So, Yosys is the first thing; it creates a JSON file. nextpnr will do the place and route, and then ecppack will take what nextpnr produces and spit out an SVF file, which you can shove at the actual board on which this computer currently runs. So, I did that,
and in this other window I have top running. Yosys is using, let's see, percent CPU, where is percent CPU? Right here. So, it uses about 80 percent of the CPU, and then, I don't know, mandb or whatever, cron starts some process, and it drops down to 50 percent,
because now it's splitting the CPU with that other thing. So, I had to kill those while this was running, just to keep it on task and making progress. Run Yosys, run nextpnr.
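The shell script is essentially the standard open-source ECP5 flow. This is a sketch: the nextpnr device and package flags and the file names here are assumptions for the ECPIX-5's LFE5UM5G-85F part, and the exact ecppack invocation may differ by version.

```shell
# Synthesize the Verilog into a JSON netlist
yosys -p 'synth_ecp5 -json blinky.json' blinky.v
# Place and route for the ECP5 part on the ECPIX-5 board
nextpnr-ecp5 --um5g-85k --package CABGA554 --json blinky.json \
             --lpf ecpix5.lpf --textcfg blinky.config
# Pack the routed design into an SVF for JTAG programming
ecppack --svf top.svf blinky.config
```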
This is pretty much, if you've ever run nextpnr before, you'll recognize the output. It succeeds. Then we do ecppack to generate the SVF file, and once that is over, I did an md5sum of the top.svf file,
so that when we run the following demo, when I show you pushing this thing to the actual board and it starts to blink, here's the checksum of the SVF file: BAE0, yada, yada, yada, 618. All right. So, at this point, the job is done. It took about 50 minutes to build the bitstream.
Perfect. And if I pause this creatively here, I am doing an md5sum of the top.svf file, and you'll see BAE0D, yada, yada, 618.
That is actually the thing I built on Fedora running on this board. If I let it run, then you'll see, OK, here it started blinking,
and it blinks exactly like the Verilog I showed you earlier says it should. So, I was capable of building a bitstream for this board on Fedora running on this board. With that, we're going back to the tail end of the slide deck. Right.
So, building the blinky on my Intel laptop takes what, 10 seconds or less. So, 10 seconds, dot, dot, dot, 90 minutes. Building the bitstream for the actual RISC-V Rocket Chip
LiteX design takes half an hour on the laptop, dot, dot, dot, whatever that translates into. You'd be here a very, very, very long time if you waited for this thing to really self-host itself and rebuild its own bitstream. It can do it; we've established that.
The qualitative leap has been made; it's just a quantitative problem now. We can make this thing faster, right? So, the immediate thing is to figure out the Linux config stuff, teach LiteX how to be more, you know, civilized about booting and generating device trees,
and maybe work with U-Boot or something, and actually have a standardized boot process. LiteSATA has a SATA core which works on some FPGAs, but currently not yet on the ECP5. In the medium term,
in order to make this thing a little faster, on my VC707 board I can get eight cores running at 100 or 150 MHz. So, basically two or three times as fast, and eight times as many cores as I can fit on the Lattice chip.
The problem is, if I do that, it's not self-hosting, because I need Vivado to pull that off; I need the Xilinx proprietary tools. So, whatever I can do to encourage or join a future effort to target large Xilinx chips, not just any Xilinx chips, with complete free toolchains:
Count me in, let me know, tell me what I need to do. I don't have a lot of money, but I have a lot of determination. I'm a very stubborn individual. All right. Well, thank you. I'll take that as a compliment. No, for real. I got that. We could put in fancier IP blocks.
If it's a larger FPGA, maybe we can get away with some kind of video-card-like thing, or maybe be a PCI master, so we can plug video cards or other cards into this computer. Then, in the long, science-fiction term: what I'm doing right now is taking a sequence of
classes that culminates in taping out an actual ASIC at Carnegie Mellon, which I'm doing in my spare time. I want to understand how ASIC fabrication works, because I want to have something useful to say about it. Right now, it's all high level: oh yeah, you can't trust the fab, but I have no idea what goes on in one of those things,
and I want to know what goes on in one of those things. Then, there's a kid who probably just graduated from their electrical and computer engineering department, Sam Zeloof was his name, and he was famous before he joined CMU, because he made silicon transistors and integrated circuits in his own garage,
probably with 1970s-era technology or whatnot. But it's a start, right? Then, maybe in the future, it would be really cool if I lived long enough to see some nano-assembler, kind of like a 3D printer, in my house, that maybe costs as much as or less than the average American single-family detached home.
Because right now, the way chip fabrication works, you can count on one hand the number of places that actually make these things, and then obviously they have the attention of important people and nation-state actors and all that. It would be nice if we could democratize that a little bit more.
So, if I live long enough to see that, I've either lived a very long life or something cool happened in my lifetime; either way, I win. And with that, thank you. Time for questions, or do we do that offline? Okay, awesome.
That was a good talk. You used the entire 40 minutes. Sweet. Thank you.