eBPF loader deep dive

Formal Metadata

Title
eBPF loader deep dive
Title of Series
Number of Parts
542
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Everyone who has ever used eBPF has interacted with a loader (libbpf, cilium/ebpf, aya), but not many users know what actually happens behind the scenes. I hope to give some insight into what it takes to load eBPF programs into the kernel and how features like BTF, global data and CO-RE actually work.
Transcript: English(auto-generated)
All right. Let's get started again. So welcome back, everyone. The next talk is from Dylan about eBPF Loader Deep Dive.
Yes, hello, everyone. Thank you for attending. Before we start, I have to make a quick confession. I'm only 80% done with my talk. No, but really. Today, I'm going to talk about eBPF loaders.
And while I'll do my best to go as deep as I can within the time constraints, there is, of course, so much more to go through. So let's start with what a loader is, for those of you who don't know yet. The term can be used in multiple contexts.
But for the purpose of this talk, I will refer to a loader as any program that interacts with the kernel via syscalls. Or, what you more commonly see, a program that uses an eBPF loader library to do most of that work for it.
So examples of loaders are ip and tc, which can be used to load XDP programs or TC programs, for example, but also bpftool, which can do the same, or bpftrace, or even your own app if you decide to use a loader library and make something great.
Loader libraries are basically abstractions on the eBPF syscall to make it easier to use. Kind of like libc, but for BPF, which is where the name of the first example comes from: libbpf. But of course, there are many others, like Aya, which we had a talk about earlier today, or BCC, or cilium/ebpf. These are all examples of loader libraries, libraries that load BPF programs into the kernel.
So why do we need loaders? This is the example program we'll be working with today, and it's quite simple. On the left side, I declare a map, which we will be using to store flow data, so packets and bytes per second, for a combination of source address and destination address. And on the right is a bit of logic that checks that we have enough data and interprets it as IPv4. Now there's a handle IPv4 function mentioned here, but it doesn't fit on the slide, so we'll get to that later.
When I compile my program, I get what's called an ELF, an executable and linkable format. Or linking, now that I think about it. Whatever. With a normal C program, if I were to pull any random hello world C program from the internet and compile it like I showed in the command above, we'd get out an executable. And you can use it out of the box, no need for trickery. You make it executable, you execute it, and you get hello world on the command line.
If you take an eBPF program and you try to compile it with commands you found on the internet, you'll get a relocatable. Now if you try to execute it, you'll get an error. So it doesn't work. What you need is a loader.
The executable that we have is like pre-assembled IKEA furniture. But the relocatable we get for eBPF comes in pieces and, perhaps, if you're lucky, with a guide on how to put them together. And this is the job of the loader: putting the pieces together, and providing the guide to make it easy for you to use.
Now, an ELF, as we generated, has the following structure. So we have this large file. We start with an ELF header, which contains information like: this contains eBPF, it's this many bits, for this machine. And it has a bunch of sections. These sections have names, and each of them can have a different format. So the string table has a bunch of strings.
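As a rough illustration of the header fields mentioned above, the following Python sketch decodes the start of a 64-bit little-endian ELF file and reports whether it is a relocatable eBPF object. The helper names `parse_elf_header` and `make_header` are made up for this example, and `make_header` only builds a synthetic header for demonstration, not a complete object file.

```python
import struct

# ELF64 file header layout (little-endian), 64 bytes in total.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

ET_REL = 1      # relocatable object, which is what clang emits for eBPF
ET_EXEC = 2     # ordinary executable
EM_BPF = 247    # machine type used for eBPF objects


def parse_elf_header(data: bytes) -> dict:
    """Decode the header fields a loader would typically check first.

    Only the ELF64 little-endian layout is handled in this sketch.
    """
    if len(data) < ELF64_HEADER.size:
        raise ValueError("too short for an ELF64 header")
    (ident, e_type, e_machine, _version, _entry, _phoff, e_shoff,
     _flags, _ehsize, _phentsize, _phnum, _shentsize, e_shnum,
     e_shstrndx) = ELF64_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "bits": {1: 32, 2: 64}[ident[4]],   # EI_CLASS byte
        "little_endian": ident[5] == 1,      # EI_DATA byte
        "relocatable": e_type == ET_REL,
        "is_bpf": e_machine == EM_BPF,
        "section_header_offset": e_shoff,
        "section_count": e_shnum,
        "section_names_index": e_shstrndx,
    }


def make_header(e_type: int, e_machine: int) -> bytes:
    """Build a synthetic header for the demo (hypothetical helper)."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    return ELF64_HEADER.pack(ident, e_type, e_machine, 1, 0, 0, 64,
                             0, 64, 0, 0, 64, 0, 0)


info = parse_elf_header(make_header(ET_REL, EM_BPF))
print(info["bits"], info["relocatable"], info["is_bpf"])
```

On a real object, the same function could be fed the first 64 bytes of the file; the section table it points to is where the named sections from the slide live.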
Our program sections have a bunch of program code in them, et cetera, et cetera. But those refer to each other. So you have all the arrows, and they point to each other, and they link to each other. But in this form, it's not that usable, because the kernel only understands syscalls and eBPF programs. It doesn't know how to handle such an ELF.
So the BPF syscall looks like this, if you pull up the man page. We have a bunch of commands. Each command has attributes.
And in the kernel, they're defined in a very big union. Every command has its own set of attributes that you can use to ask the kernel to do something for you. I can't go over all of them because of time constraints. But the most important ones are loading your program, creating a map, loading BTF, and of course, interacting with the map, attaching a program somewhere, et cetera.
There are quite a few commands. Each of them does slightly different things. And the loaders, in most cases, provide functions that either call multiple of these to do a batch, like a big, high-level operation, or they provide small wrappers for you to do your low-level operations yourself. There are also links, which is a newer concept. And you can pin your objects to the file system, so they live longer than your program. And we have a few other miscellaneous functions for doing measurements, statistics, iteration, et cetera. But I can't go into those in this talk, unfortunately.
So back to our program. When we write our program, we have a macro here that says SEC. That's quite unique to BPF.
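To make the "command plus attributes" shape of the BPF syscall described above a little more concrete, here is a hedged Python sketch. It lists a subset of the command numbers and packs the leading fields of the map-creation attributes into bytes, without calling into the kernel. The function `pack_map_create_attr`, the map name `flow_map`, and the key and value sizes are assumptions made for illustration; the talk does not give them.

```python
import struct
from enum import IntEnum


class BpfCmd(IntEnum):
    """A subset of the command numbers accepted by the bpf() syscall."""
    MAP_CREATE = 0
    MAP_LOOKUP_ELEM = 1
    MAP_UPDATE_ELEM = 2
    MAP_DELETE_ELEM = 3
    PROG_LOAD = 5
    OBJ_PIN = 6
    OBJ_GET = 7
    BTF_LOAD = 18
    LINK_CREATE = 28


BPF_MAP_TYPE_HASH = 1
BPF_OBJ_NAME_LEN = 16

# Leading fields of the map-creation member of the attribute union:
# map_type, key_size, value_size, max_entries, map_flags,
# inner_map_fd, numa_node (seven 32-bit values), then map_name[16].
MAP_CREATE_ATTR = struct.Struct("<7I16s")


def pack_map_create_attr(map_type, key_size, value_size, max_entries, name):
    """Pack the essential map-creation attributes (hypothetical helper)."""
    encoded = name.encode()
    if len(encoded) >= BPF_OBJ_NAME_LEN:
        raise ValueError("map name must leave room for a trailing NUL")
    # Flags, inner map descriptor and NUMA node are left at zero here.
    return MAP_CREATE_ATTR.pack(map_type, key_size, value_size,
                                max_entries, 0, 0, 0, encoded)


# Assumed layout: 8-byte key (two IPv4 addresses), 16-byte value (two counters).
attr = pack_map_create_attr(BPF_MAP_TYPE_HASH, 8, 16, 1024, "flow_map")
print(BpfCmd.MAP_CREATE.value, len(attr))
```

A loader library hides exactly this kind of byte-level packing behind its small wrappers, and chains several such commands for its high-level operations.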
Every BPF program needs to have this section tag. It tells the compiler to put all of the program code in the specific section that we named. And the name of this section follows a convention, which the loader can use to figure out that this is an XDP program, so it should be interpreted as such.
Now, we can dump this section. If we dump it with llvm-objdump, we get out this, which is hard to read if it's not annotated. But it's a bunch of eBPF instructions, each starting with the opcode, so the actual opcode that tells it if it's add, subtract, whatever. Then the source and destination registers for the opcode to act on. We have offsets for jumps; these are relative. And immediate data to, say, load some data into a register, like a constant value. And sometimes we can use two of them together to represent a 64-bit number. But we'll get to that later.
We can also ask objdump to disassemble this for us, and we'll get the disassembled eBPF program.
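The instruction layout just described (opcode, registers, offset, immediate) can be sketched in a few lines of Python. This is an illustrative decoder for little-endian objects, not any library's implementation; the names `Insn`, `decode` and `imm64` are invented for the example.

```python
import struct
from typing import NamedTuple

# One eBPF instruction is 8 bytes: opcode, register byte, offset, immediate.
BPF_INSN = struct.Struct("<BBhi")


class Insn(NamedTuple):
    opcode: int
    dst_reg: int
    src_reg: int
    off: int
    imm: int


def decode(raw: bytes) -> list:
    """Split a program section into 8-byte eBPF instructions."""
    if len(raw) % BPF_INSN.size:
        raise ValueError("section size is not a multiple of 8 bytes")
    insns = []
    for opcode, regs, off, imm in BPF_INSN.iter_unpack(raw):
        # On little-endian objects the destination register is the low
        # nibble of the register byte and the source register the high one.
        insns.append(Insn(opcode, regs & 0x0F, regs >> 4, off, imm))
    return insns


def imm64(first: Insn, second: Insn) -> int:
    """Combine the immediates of a two-slot 64-bit load into one value."""
    return (first.imm & 0xFFFFFFFF) | ((second.imm & 0xFFFFFFFF) << 32)


# "r0 = 2" followed by "exit", written out as raw bytes.
program = bytes.fromhex("b700000002000000" + "9500000000000000")
for insn in decode(program):
    print(insn)
```

The second helper shows the "two of them together" case: a 64-bit load occupies two instruction slots, and the two 32-bit immediates form the low and high halves of the value.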
So the bytes are on the left side and the actual program on the right side. But you'll notice that there's a call here. One thing that I didn't tell you before is that the handle IPv4 function that we have is marked in such a way that it won't be inlined. So it's a separate program. eBPF can do BPF-to-BPF function calls. And if you do that, the compiler puts out this instruction, a function call instruction, but with minus one. Where do we call to? Well, currently nowhere, because we haven't assembled the pieces of our furniture yet.
What also happens is that the compiler will emit relocation information, which we can again visualize. And it says: all right, we have a certain instruction at this given offset, and you should put a relative address of this other function in here.
Then we can go to the symbol table and look up this name, and it says: oh, that function lives in the .text section, where, for eBPF programs, all of the functions used in function-to-function calls live together.
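The relocation records mentioned above are small fixed-size entries. As a sketch, assuming a 64-bit little-endian object with REL-style entries (an offset plus an info word), they can be decoded like this. `parse_relocations` is a hypothetical helper; the two relocation type numbers are the ones used for 64-bit immediate loads and for call instructions.

```python
import struct

ELF64_REL = struct.Struct("<QQ")   # r_offset, r_info
BPF_INSN_SIZE = 8

R_BPF_64_64 = 1    # applies to a 64-bit immediate load, e.g. a map reference
R_BPF_64_32 = 10   # applies to a 32-bit immediate, e.g. a BPF-to-BPF call


def parse_relocations(section: bytes) -> list:
    """Decode a relocation section into instruction index, symbol and type."""
    relocs = []
    for r_offset, r_info in ELF64_REL.iter_unpack(section):
        relocs.append({
            # The byte offset points at an instruction in the program section.
            "insn_index": r_offset // BPF_INSN_SIZE,
            # The upper half of r_info indexes the symbol table.
            "symbol_index": r_info >> 32,
            # The lower half is the relocation type.
            "type": r_info & 0xFFFFFFFF,
        })
    return relocs


# Synthetic entry: instruction at byte offset 16 refers to symbol number 5.
demo = ELF64_REL.pack(16, (5 << 32) | R_BPF_64_32)
print(parse_relocations(demo))
```

The symbol index is what the loader then looks up in the symbol table to learn the name and the section (here `.text`) of the function being called.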
So we have these two separate pieces of the puzzle and they refer to each other. But the kernel only takes one pointer for our instructions. It expects that every program we give it is one contiguous piece of memory with instructions, and it all should work. So we have some work to do. We, or rather the loader, need to figure out how to lay out our program, so piece all of the puzzle pieces together, find all of these references, and then put in the correct offsets. All of this happens in user space before we even go to the kernel.
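The layout step described above can be sketched as follows. This is a deliberately simplified model, assuming one entry program and one `.text` section that is appended directly after it; real loaders handle many programs and functions. Instructions are plain dictionaries here, and `link_subprogram` is a name invented for the example.

```python
BPF_CALL = 0x85          # opcode of the call instruction
BPF_PSEUDO_CALL = 1      # src_reg value that marks a BPF-to-BPF call


def link_subprogram(main, text, call_index, target_in_text=0):
    """Append .text to the entry program and fix up one call instruction.

    The immediate of a BPF-to-BPF call is relative to the instruction
    that follows the call, counted in instructions.
    """
    call = main[call_index]
    if call["opcode"] != BPF_CALL or call["src_reg"] != BPF_PSEUDO_CALL:
        raise ValueError("relocation does not point at a BPF-to-BPF call")
    # Copy so the caller's lists are left untouched.
    linked = [dict(i) for i in main] + [dict(i) for i in text]
    target = len(main) + target_in_text
    linked[call_index]["imm"] = target - (call_index + 1)
    return linked


def insn(opcode, src_reg=0, imm=0):
    """Tiny constructor for demo instructions (hypothetical helper)."""
    return {"opcode": opcode, "dst_reg": 0, "src_reg": src_reg,
            "off": 0, "imm": imm}


entry = [insn(0xB7), insn(BPF_CALL, BPF_PSEUDO_CALL, -1), insn(0xB7), insn(0x95)]
helper = [insn(0xB7), insn(0xB7), insn(0x95)]
linked = link_subprogram(entry, helper, call_index=1)
print(len(linked), linked[1]["imm"])
```

After linking, the kernel receives a single contiguous instruction array, and the call's placeholder immediate has been replaced by the distance to the appended function.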
Now, the second fun thing is that we can define our map. So again, we have the SEC part, .maps, which puts it in the .maps section. And this is the part of the program that I have been hiding from you until now.
It's also quite simple in terms of eBPF programs. We get an IPv4 header, check that we can use it, and we get a value from the map. If it doesn't exist, we write a new one, and we increment the values every time this happens to account for some information. So keep this program in mind. If we then go look at the instructions again, the disassembled version this time, we see that we have two of these long lines which are zero at the end. These are the 64-bit immediate values that I was talking about. They're this long to leave room for actual memory addresses later, instead of relative jumps. But they're zero, and these should be references to our map. Later on, these will become pointers when the kernel gets its way with them.
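As a sketch of what eventually happens to those zeroed 64-bit immediates: before loading, a loader writes the map's file descriptor into the first slot and marks the instruction so the kernel knows the value is a descriptor rather than a plain number. The function name `patch_map_fd` and the dictionary representation of instructions are assumptions for this example.

```python
BPF_LD_IMM64 = 0x18        # 64-bit immediate load, occupies two slots
BPF_PSEUDO_MAP_FD = 1      # src_reg value: "imm holds a map file descriptor"


def patch_map_fd(insns, index, map_fd):
    """Fill the placeholder of a 64-bit load with a map file descriptor."""
    if index + 1 >= len(insns):
        raise ValueError("a 64-bit load needs two instruction slots")
    first, second = insns[index], insns[index + 1]
    if first["opcode"] != BPF_LD_IMM64:
        raise ValueError("relocation does not point at a 64-bit immediate load")
    first["src_reg"] = BPF_PSEUDO_MAP_FD   # tells the kernel how to read imm
    first["imm"] = map_fd                  # descriptor from map creation
    second["imm"] = 0                      # upper half stays zero
    return insns


program = [
    {"opcode": BPF_LD_IMM64, "dst_reg": 1, "src_reg": 0, "off": 0, "imm": 0},
    {"opcode": 0x00, "dst_reg": 0, "src_reg": 0, "off": 0, "imm": 0},
    {"opcode": 0x85, "dst_reg": 0, "src_reg": 0, "off": 0, "imm": 1},
]
patch_map_fd(program, 0, map_fd=7)
print(program[0])
```

The kernel later swaps that descriptor for the real address of the map, which is why the instruction reserves room for a full 64-bit value.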
And in our case, we again need to figure out what to put in here. So same routine: we have relocation information, and the relocation information points to the instructions that we have. It says you need to plug in the flow map here. We go to the symbol table, and there it says we have a .maps section, and in it lives a flow map. In this case, we handle it slightly differently. We first have to go load this flow map and get a file descriptor, which is our unique identifier for the map, and then we need to actually put that file descriptor into these empty values so the kernel knows where to go.
Creating maps is also a command, so we have the map create command, and it takes these arguments. I cut out a few of the later ones, but these are the essentials: what type, how big are my values, et cetera, and you give it a nice name. There are two ways to define these. We have the new way of doing it, colloquially called BTF maps, on the left, but there's also the old way of doing it using a BPF map definition on the right.
Don't use the old one. If you go into libbpf, in the part of libbpf which is used when building eBPF programs, it will warn you that you shouldn't use it and should go for the left side. But the odd thing is that if you use these newer BTF maps on the left and you go look at what's actually written to your .maps section, it's all zero. It still allocates room for your map, but the bytes will all be zero and there's no information. All the information is instead in the type information of the flow map.
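The difference between the two definition styles can be illustrated with a small sketch. A legacy definition stores five 32-bit values directly in the section bytes, so a loader can read them back; for a BTF-defined map the same bytes are zero and carry nothing. The function `read_legacy_map_def` and the example values are assumptions for illustration only.

```python
import struct

# Legacy map definition: type, key_size, value_size, max_entries, map_flags.
LEGACY_MAP_DEF = struct.Struct("<5I")


def read_legacy_map_def(maps_section: bytes, offset: int = 0) -> dict:
    """Read one legacy-style map definition out of the section bytes."""
    fields = LEGACY_MAP_DEF.unpack_from(maps_section, offset)
    names = ("type", "key_size", "value_size", "max_entries", "map_flags")
    return dict(zip(names, fields))


# Old style: the values live in the section itself (hypothetical numbers).
legacy_bytes = LEGACY_MAP_DEF.pack(1, 8, 16, 1024, 0)
print(read_legacy_map_def(legacy_bytes))

# New style: room is reserved, but every byte is zero, so the loader has
# to consult the type information instead.
btf_style_bytes = bytes(32)
print(read_legacy_map_def(btf_style_bytes))
```

Reading the zeroed bytes yields all-zero fields, which is the reason a loader needs the BTF type information to learn anything about such a map.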
So we have to get into what BTF is. BTF stands for BPF Type Format. It's derived from DWARF, so the actual DWARF debug symbols that are already used for normal C programs, but it's a way more compact version of it, which is only really concerned with type information and not with where a variable lives at which moment. And it's used because eBPF by itself is just too limiting and we want to do more, especially in the verifier. So we have, for example, features like spin locks, which should only be used on maps that have spin lock values in them. Or we have callback functions, so we can define these BPF functions and, instead of calling them, give them to a helper function. But this helper then needs to know that the callback has the correct number of arguments and the correct types. All of this type information we can give to the kernel, and that's why, especially if you want to use these new fancy features, it's important to use the BTF information.
It also allows for flexible map arguments. So for example, if I go back, we have the definition, and one of the things you'll notice is that we have pinning as an attribute here, but you will not find it in the syscall attributes. This is purely something that we communicate to the loader library. And that goes for any loader library, not just libbpf, even though that's the name it currently has. And we can do a lot of different cool things with that.
It also provides debug information for us. So if we go look at loaded programs, they will be annotated with the line information and which file they came from. And perhaps one of the coolest features is Compile Once, Run Everywhere, which allows the loader and/or the kernel to modify our program slightly, so it will run on multiple versions of the kernel, even if the internals have changed. So if we dump this BTF that we have
from our example program, it looks like this. Things to note are the numbers on the left in square brackets. Those are the type IDs. Next to each is its actual kind, so we have pointers, integers, arrays; you can basically represent every C type within BTF this way. There's an optional name, and then there's a lot of information about the specific type. And they refer to each other, so you'll notice a lot of "type ID is something else". You can also visualize it by nesting it. I've done this manually, by the way; there's no command that does this, but this is how you can do it yourself. So we have, for example, the .maps section with our flow map in it, and you can see that we have the type, the key, the value, and we have this very detailed description of exactly how it's structured, at which offsets things live, and the names for them, which are used to check all of these things. And the loader will also use this to infer the actual value and key sizes to give to the kernel.
This BTF lives in a .BTF section, and it's sort of structured like this: we have a header, then types, and then a lot of strings. Each type starts with the same three fields. We have a name offset, so an offset into the strings. We have info, and a size or type, depending on what the info says. This translates into the name and the kind of the BTF type, and then the last part is specific to that kind, so encoding for ints, or a list of fields for a structure, et cetera.
We also have .BTF.ext, the extended version of it, and this contains function information, line information, and optionally CO-RE relocations.
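The header, types and strings layout described above can be walked with a short sketch. It builds a tiny synthetic BTF blob containing a single 32-bit integer type and then decodes the three common fields of that type. The helpers `build_demo_btf` and `parse_first_type` are invented for this example; a real blob contains many types, and the first type in the blob has type ID 1 because ID 0 is reserved for void.

```python
import struct

BTF_MAGIC = 0xEB9F
# magic, version, flags, hdr_len, type_off, type_len, str_off, str_len
BTF_HEADER = struct.Struct("<HBBIIIII")
# The three fields every type starts with: name_off, info, size_or_type
BTF_TYPE = struct.Struct("<III")
BTF_KIND_INT = 1


def read_string(strings: bytes, offset: int) -> str:
    """Read a NUL-terminated name out of the string section."""
    end = strings.index(b"\0", offset)
    return strings[offset:end].decode()


def parse_first_type(blob: bytes) -> dict:
    """Decode the header and the common fields of the first type."""
    (magic, _version, _flags, hdr_len,
     type_off, type_len, str_off, str_len) = BTF_HEADER.unpack_from(blob)
    if magic != BTF_MAGIC:
        raise ValueError("not a BTF blob")
    # Both offsets are relative to the end of the header.
    types = blob[hdr_len + type_off:hdr_len + type_off + type_len]
    strings = blob[hdr_len + str_off:hdr_len + str_off + str_len]
    name_off, info, size_or_type = BTF_TYPE.unpack_from(types)
    return {
        "name": read_string(strings, name_off),
        "kind": (info >> 24) & 0x1F,   # which kind of type this is
        "vlen": info & 0xFFFF,         # number of members, params, ...
        "size_or_type": size_or_type,
    }


def build_demo_btf() -> bytes:
    """Build a blob with one signed 32-bit 'int' (hypothetical helper)."""
    strings = b"\0int\0"
    int_type = BTF_TYPE.pack(1, BTF_KIND_INT << 24, 4)
    int_data = struct.pack("<I", (1 << 24) | 32)   # signed, 32 bits wide
    types = int_type + int_data
    header = BTF_HEADER.pack(BTF_MAGIC, 1, 0, BTF_HEADER.size,
                             0, len(types), len(types), len(strings))
    return header + types + strings


print(parse_first_type(build_demo_btf()))
```

The extra 32-bit word after the integer's three common fields is the kind-specific part mentioned in the talk; for a structure, that part would be a list of members instead.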
So the line information contains a bunch of lines, and it will annotate this instruction as part of this line of your original source program. And the function information labels every one of the BTF functions that you have defined.
Loading the BTF itself is quite simple. You use the BTF load command of the BPF syscall and give it the blob that we have. It needs to be slightly changed, especially for the sizes in the data section types, but it would take more detail to explain exactly why. And you give it a bunch of logging information. Once you have done that, we get a file descriptor for the BTF object, and of course we have all of these type IDs. So when we are loading our map, there are these fields where you can say: this is my BTF object, which contains all of my types, this is the type of my key, and this is the type of my value. That's how we wire everything together. The same goes for programs: we give it the program, the BTF the program uses, and we give it these func information and line information blobs, which will make sure that everything is nice and annotated in the kernel.
So we end up with a sort of hierarchy that looks like this. We start by loading the BTF. We can then load our maps, which use it, and then once we have our map file descriptors, we can load our programs, after we have, of course, assembled all of the pieces of our program. And all of that can happen within one call to a loader library.
And for the last part, CO-RE, which I touched on a little bit earlier: like I said, Compile Once, Run Everywhere.
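The load order described above (BTF first, then maps, then programs) can be sketched with a stand-in for the kernel. Nothing here talks to a real kernel: `FakeKernel` only hands out increasing descriptor numbers and records calls, and `load_object`, the attribute names and the object names are assumptions for illustration. In a real loader the map descriptors are patched into the instructions rather than passed as a separate list.

```python
import itertools


class FakeKernel:
    """Stand-in for the bpf() syscall: hands out descriptors, records calls."""

    def __init__(self):
        self._fds = itertools.count(3)
        self.calls = []

    def bpf(self, cmd, **attr):
        self.calls.append((cmd, attr))
        return next(self._fds)


def load_object(kernel, btf_blob, maps, programs):
    """Load BTF, then the maps that use it, then the programs."""
    btf_fd = kernel.bpf("BTF_LOAD", btf=btf_blob)

    map_fds = {}
    for name, spec in maps.items():
        # Each map names the BTF object and the type IDs of key and value.
        map_fds[name] = kernel.bpf(
            "MAP_CREATE", name=name, btf_fd=btf_fd,
            btf_key_type_id=spec["key_type_id"],
            btf_value_type_id=spec["value_type_id"])

    prog_fds = {}
    for name, spec in programs.items():
        # Programs can only be loaded once the descriptors of the maps
        # they reference are known.
        prog_fds[name] = kernel.bpf(
            "PROG_LOAD", name=name, prog_btf_fd=btf_fd,
            map_fds=[map_fds[m] for m in spec["uses_maps"]])
    return prog_fds


kernel = FakeKernel()
fds = load_object(
    kernel, b"",
    maps={"flow_map": {"key_type_id": 1, "value_type_id": 2}},
    programs={"xdp_prog": {"uses_maps": ["flow_map"]}})
print([cmd for cmd, _ in kernel.calls], fds)
```

The point of the sketch is only the dependency chain: each step consumes a descriptor produced by the step before it, which is why a loader library can offer all of it as one call.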
There's this really good blog post, which I encourage everyone who wants to use the feature to read, and which contains information on how to actually use it. But what it boils down to is that in libbpf there are these macros to make your life easier, and they boil down to a bunch of compiler built-ins. They're basically questions to ask the loader, or the kernel, just before or while loading the program. Like: what is the offset of this field? Does this type even exist? Do I have this enum value?
I have this small program that captures the cookie value of a socket when it closes. Not useful at all, but it does help us illustrate the point. When this macro resolves, it looks like this.
And the important part to notice here is that we do a helper call, and where the arrow starts, we have the socket pointer, and we add an offset, which we get from this built-in function. This offset then gets encoded in the 104 that we see here. This is the offset that we add to the pointer in the actual code. But the compiler will also emit this relocation, which tells us that this might be a piece of the code that we want to tweak, depending on whether the structure changes.
So if we, again, look at this relocation: unfortunately, as far as I'm aware, there is not a good command line tool to visualize or decode this, so I decoded one manually. It looks like this. It says: okay, instruction number two, which is the instruction that we were at, refers to type ID 18, and it has this accessor string. And this accessor string is a bunch of numbers, which are basically indexes, like the number of the field that it tries to access. So the sock, then the second field would be sk common, and then the cookie, and so forth.
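How an accessor string turns into field names and an offset can be sketched like this. The type table below is a toy with made-up struct layouts and byte offsets (real BTF stores member offsets in bits), and `resolve_accessor` is a hypothetical helper. The first number indexes the root type as if it were an array element, which is usually 0; each following number selects a member of the current struct.

```python
def resolve_accessor(types, root_id, accessor):
    """Walk an accessor string such as "0:1:2" through struct definitions.

    Returns the list of field names visited and the total byte offset.
    """
    indexes = [int(part) for part in accessor.split(":")]
    current = types[root_id]
    # The first number indexes the root type like an array element.
    offset = indexes[0] * current["size"]
    names = []
    for index in indexes[1:]:
        if current is None:
            raise ValueError("accessor descends into a type without members")
        member = current["members"][index]
        names.append(member["name"])
        offset += member["offset"]
        # Scalars are not in the table, so this becomes None at a leaf.
        current = types.get(member["type"])
    return names, offset


# Toy type table keyed by type ID; names and offsets are invented.
TYPES = {
    1: {"size": 32, "members": [
        {"name": "id", "offset": 0, "type": 3},
        {"name": "common", "offset": 8, "type": 2},
    ]},
    2: {"size": 24, "members": [
        {"name": "addr", "offset": 0, "type": 3},
        {"name": "port", "offset": 4, "type": 3},
        {"name": "cookie", "offset": 16, "type": 4},
    ]},
}

print(resolve_accessor(TYPES, 1, "0:1:2"))
```

With the names recovered from the local type information, a loader has what it needs to look for the same fields in another set of types.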
Now, the type information that we knew when we compiled the program is included in the .BTF section. But the kernel also has BTF for all of the types it has. So we can do a sort of diff, a comparison, and see that, for example, fields changed position, or that we can't find a certain field. And our loader can do this: it can resolve the relocation and then patch our code, changing this offset value right before we actually load it, which makes it possible to use the program on so many different kernel versions.
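The compare-and-patch step can be sketched under strong simplifications: types are matched purely by struct and field name, offsets are in bytes, and only a field-offset relocation is handled. The functions `field_offset` and `relocate_field`, the struct names, and the target offset of 112 are assumptions for illustration; only the local offset of 104 comes from the example in the talk.

```python
def field_offset(struct_types, root, path):
    """Byte offset of a named field path within one set of type definitions."""
    offset = 0
    current = struct_types[root]
    for name in path:
        member = current[name]   # KeyError means the field cannot be found
        offset += member["offset"]
        current = struct_types.get(member["type"], {})
    return offset


def relocate_field(insn, local_types, target_types, root, path):
    """Rewrite the immediate from the compile-time offset to the target's."""
    expected = field_offset(local_types, root, path)
    if insn["imm"] != expected:
        raise ValueError("instruction does not hold the recorded local offset")
    insn["imm"] = field_offset(target_types, root, path)
    return insn


# Layout the program was compiled against.
LOCAL = {
    "sock": {"common": {"offset": 0, "type": "sock_common"}},
    "sock_common": {"cookie": {"offset": 104, "type": "u64"}},
}
# Layout of the running kernel in this made-up scenario: the field moved.
TARGET = {
    "sock": {"common": {"offset": 0, "type": "sock_common"}},
    "sock_common": {"cookie": {"offset": 112, "type": "u64"}},
}

add_insn = {"opcode": 0x07, "dst_reg": 1, "src_reg": 0, "off": 0, "imm": 104}
relocate_field(add_insn, LOCAL, TARGET, "sock", ["common", "cookie"])
print(add_insn["imm"])
```

If the field is missing from the target types, the lookup fails, which corresponds to the "we can't find a certain field" case; a real loader would then reject the program or apply a fallback.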
I'm out of time. That's everything I can offer you for now. Are there any questions, and thank you. Thank you, any questions?
There's one in the back, all right, okay. It's difficult now. Can you pass this on? Hey, thanks for the great talk. So I haven't dealt that much with BPF,
but since we have those binaries that we cannot really launch, because we have to load them with another ELF, right? At least as I understand it. Would it make any sense to make either a loader that would just work out of the box for those binaries, or to use the binfmt_misc feature of the kernel to be able to load those BPF ELF files, use some kind of generic or general interface, and just load them and run them?
Yeah, I think it does make sense to some extent. There is, for example, the ip tool, which doesn't need anything additional: it takes this ELF and just loads it as best as it can. And there is probably some way to use the interpreter in the ELF itself, just like we do for dynamically linked executables.
As far as I know, no one has tried it so far, but I think it could work, at least for a limited use case, where you would only load something and pin it, and then allow some other application to actually work with it afterwards.
Yeah, thank you. All right, thanks, we are out of time.
If you have more questions, you can find Dylan in the hallway, and yeah, thanks again.