eBPF loader deep dive
Formal Metadata
Number of Parts | 542
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/61800 (DOI)
FOSDEM 2023, part 479 / 542
Transcript: English (auto-generated)
00:06
All right. Let's get started again. So welcome back, everyone. The next talk is from Dylan about eBPF Loader Deep Dive.
00:21
Yes, hello, everyone. Thank you for attending. Before we start, I have to make a quick confession. I'm only 80% done with my talk. No, but really. Today, I'm going to talk about eBPF loaders.
00:40
And while I'll do my best to go as deep as I can within the time constraints, there is, of course, so much more to go through. So let's start with what a loader is, for those of you who don't know yet. The term can be used in multiple contexts.
01:03
But for the purpose of this talk, I will refer to a loader as any program that interacts with the kernel via syscalls. Or, what you more commonly see, a program that uses an eBPF loader library to do most of that work for it.
01:22
So examples of loaders are ip and tc, which can be used to load XDP programs or TC programs, for example, but also bpftool, which can do the same, or bpftrace, or even your own app if you decide to use a loader library
01:41
and make something great. Loader libraries are basically abstractions on top of the BPF syscall to make it easier to use. Kind of like libc, but for BPF, which is where the name of
02:00
the first example comes from: libbpf. But of course, there are many others, like Aya, which we had a talk about earlier today, or BCC, or Cilium eBPF. These are all examples of loader libraries: libraries that load BPF programs into the kernel.
02:21
So why do we need loaders? This is the example program we're working with today, and it's quite simple. On the left side, I declare a map, which we will be using to store flow data, so packets and bytes
02:42
per second, for a combination of source address and destination address. And on the right is a bit of logic that checks that we have enough data, and interprets it as IPv4. Now there's a handle IPv4 function mentioned here, but
03:02
it doesn't fit on the slides, so we'll get to that later. When I compile my program, I get what's called an ELF file; ELF stands for Executable and Linkable Format. With a normal C program, if I were to pull any random
03:24
hello world C program from the internet and compile it, like I show in the command above, we get an executable. And you can use it out of the box, no need for trickery. You make it executable, you execute it, and you get
03:41
hello world on the command line. If you take an eBPF program and try to compile it with commands you found on the internet, you'll get a relocatable object. If you try to execute that, you'll get an error. So it doesn't work. What you need is a loader.
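The difference between the two outputs is visible in the ELF header itself. The following is a minimal sketch, not taken from the talk, that reads the `e_type` and `e_machine` fields of a 64-bit little-endian ELF header. The numeric constants are the standard ELF values (`ET_REL` = 1, `EM_BPF` = 247); the function name `describe_elf` and the synthetic header are assumptions made for illustration, so no real object file is needed.

```python
import struct

ET_REL, ET_EXEC = 1, 2   # ELF object types: relocatable, executable
EM_BPF = 247             # ELF machine number assigned to eBPF


def describe_elf(header: bytes) -> dict:
    """Read the two header fields that tell a loader what it is holding."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2 or header[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # e_type and e_machine directly follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from("<HH", header, 16)
    return {"relocatable": e_type == ET_REL, "bpf": e_machine == EM_BPF}


# Synthetic 64-byte header standing in for a compiled eBPF object file.
fake_bpf_object = (
    b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # e_ident: 64-bit, little-endian
    + struct.pack("<HH", ET_REL, EM_BPF)       # e_type, e_machine
    + bytes(44)                                # remaining header fields, zeroed
)
info = describe_elf(fake_bpf_object)
print(info)
```

A hello world binary would report an executable (or shared object) type and the host machine instead, which is why the operating system can run it directly while the eBPF object needs a loader.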
04:02
The executable that we have is like pre-assembled IKEA furniture. But the relocatable object we get for eBPF is just the pieces and, if you're lucky, a guide on how to put them together. And this is the job of the loader: putting the pieces together, and providing the guide to
04:24
make it easy for you to use it. Now, an ELF file like the one we generated has the following structure. We have this large file. It starts with an ELF header, which contains information like: this contains eBPF.
04:41
And it's this many bits, for this machine. And it has a bunch of sections. These sections have names, and each of them can have a different format. So the string table has a bunch of strings.
05:02
Our program sections have a bunch of program code in them, et cetera, et cetera. But they refer to each other. So you have all the arrows, and they point to each other, and they link to each other. But in this form, it's not that usable, because the
05:22
kernel only understands syscalls and eBPF programs. It doesn't know how to handle such an ELF file. The BPF syscall looks like this, if you pull up the man page. We have a bunch of commands. Each command has attributes.
05:42
And in the kernel, they're defined in a very big union. Every command has its own set of attributes that you can use to ask the kernel to do something for you. I can't go over all of them because of time constraints. But the most important ones are loading your program,
06:01
creating a map, loading BTF, and of course, interacting with the map, attaching the program somewhere, et cetera. There are quite a few commands. Each of them does slightly different things. And the loaders, in most cases, provide functions that either call multiple of these to do a batch, like a
06:22
big operation, a high-level operation, or they provide small wrappers for you to do your low-level operations yourself. There are also links, which is a newer concept. And you can pin your objects to the file system. So they live longer than your program. And we have a few other miscellaneous functions for
06:43
doing measurements, statistics, iteration, et cetera. But I can't go into those in this talk, unfortunately. So back to our program. When we write our program, we have a macro here that says SEC. That's quite unique to BPF.
07:02
Every BPF program needs to have this section tag. It tells the compiler to put all of the program code in the specific section that we named. And the name of this section follows a convention, which the loader can use to work out that
07:21
this is an XDP program, so it should be interpreted as such. Now, we can dump this section. If we dump it with llvm-objdump, we get this, which is hard to read if it's not annotated. But it's a bunch of eBPF instructions,
07:43
each starting with the opcode. So the actual opcode that tells it if it's add, subtract, whatever. Then source and destination registers, which the various opcodes act on. We have offsets for jumps; these are relative. And immediate data to, say, load some data
08:04
into a register, like a constant value. And sometimes we can use two instructions together to represent a 64-bit number. But we'll get to that later. We can also ask llvm-objdump to disassemble this for us, and we'll get the disassembled eBPF program.
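The instruction layout described here (opcode, two registers, a jump offset, an immediate) is a fixed 8-byte encoding, so it can be decoded in a few lines. This is a minimal sketch under the assumption of a little-endian object file; the `Insn` tuple, the helper names, and the hand-written byte string are illustrative and not the bytes from the talk's slides.

```python
import struct
from typing import NamedTuple


class Insn(NamedTuple):
    opcode: int  # operation: add, jump, load, call, ...
    dst: int     # destination register (low 4 bits of the register byte)
    src: int     # source register (high 4 bits of the register byte)
    off: int     # signed 16-bit offset, used by jumps and memory access
    imm: int     # signed 32-bit immediate value


LD_IMM64 = 0x18  # BPF_LD | BPF_IMM | BPF_DW: first half of a 16-byte load


def decode(code: bytes) -> list[Insn]:
    """Split a section's bytes into 8-byte eBPF instructions."""
    insns = []
    for pos in range(0, len(code), 8):
        opcode, regs, off, imm = struct.unpack_from("<BBhi", code, pos)
        insns.append(Insn(opcode, regs & 0x0F, regs >> 4, off, imm))
    return insns


def imm64(first: Insn, second: Insn) -> int:
    """Combine the two 32-bit immediates of a 64-bit load into one value."""
    return (first.imm & 0xFFFFFFFF) | ((second.imm & 0xFFFFFFFF) << 32)


raw = bytes.fromhex(
    "b700000002000000"  # r0 = 2
    "1801000000000000"  # r1 = <64-bit immediate>, low half
    "0000000000000000"  # high half of the same immediate
    "85100000ffffffff"  # call with an immediate of -1 (not yet relocated)
    "9500000000000000"  # exit
)
program = decode(raw)
for insn in program:
    print(insn)
```

The 64-bit load occupies two instruction slots, which is why a disassembler prints it as one long line.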
08:23
So the bytes on the left side and the actual program on the right side. But you'll notice that there's a call here. One thing that I didn't tell you before is that the handleIPV4 function that we have is marked in such a way that it won't be inlined. So it's a separate function.
08:41
And eBPF can do BPF-to-BPF function calls. If you do that, the compiler puts out this instruction, a function call instruction, but with minus one as its target. Where do we call to? Well, currently nowhere, because we haven't assembled the pieces of our furniture yet.
09:02
What also happens is that the compiler will emit relocation information, which we can again visualize. It says: all right, we have a certain instruction at this given offset, and you should put the relative address of this other function in here.
09:24
Then we can go to the symbol table and look up this name, and it says: oh, that function lives in the .text section, which for eBPF programs is where all of the functions used in function-to-function calls live together.
09:41
So we have these two separate pieces of the puzzle and they refer to each other. But the kernel only has one pointer for our instructions. It expects that every program we give it is one contiguous piece of memory with instructions
10:00
and it all should work. So we have some work to do. We, or rather the loader, need to figure out how to lay out our program, so piece the puzzle together, find all of these references, and then put in the correct offsets. All of this happens in user space before we even go to the kernel.
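The layout step just described can be sketched as follows. This is a simplified illustration, not libbpf's actual algorithm: it assumes a single called function that starts at the very beginning of the `.text` bytes, appends it directly after the main program, and writes the relative instruction distance into the call's immediate. The function name `link_call` and the example byte strings are hypothetical.

```python
import struct

INSN_SIZE = 8
CALL = 0x85          # BPF_JMP | BPF_CALL
BPF_PSEUDO_CALL = 1  # src register value that marks a BPF-to-BPF call


def link_call(main: bytes, subprog: bytes, call_index: int) -> bytes:
    """Append subprog after main and point the call at call_index to it."""
    pos = call_index * INSN_SIZE
    if main[pos] != CALL or main[pos + 1] >> 4 != BPF_PSEUDO_CALL:
        raise ValueError("relocation does not point at a BPF-to-BPF call")
    target_index = len(main) // INSN_SIZE        # subprog starts right after main
    relative = target_index - (call_index + 1)   # counted from the next instruction
    patched = bytearray(main + subprog)
    struct.pack_into("<i", patched, pos + 4, relative)
    return bytes(patched)


main = bytes.fromhex(
    "b701000000000000"  # r1 = 0
    "85100000ffffffff"  # call -1, placeholder left by the compiler
    "9500000000000000"  # exit
)
subprog = bytes.fromhex(
    "b700000002000000"  # r0 = 2
    "9500000000000000"  # exit
)
linked = link_call(main, subprog, call_index=1)
print(struct.unpack_from("<i", linked, 1 * INSN_SIZE + 4)[0])
```

The result is one contiguous instruction array, which is the shape the kernel expects for a program.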
10:22
Now, the second fun thing is that we can define our map. So again, we have the SEC part with .maps, which puts it in the .maps section. And this is the part of the program that I have been hiding from you until now.
10:41
It's also quite simple in terms of eBPF programs. We get an IPv4 header, check that we can use it, and we get a value from the map, and if it doesn't exist, we write a new one, and we increment the values every time this happens
11:01
to account for some information. So keep this program in mind, and then if we go look at the instructions again, the disassembled version this time, we see that we have two of these long lines which are zero at the end. These are the 64-bit immediate values
11:20
that I was talking about, and they're this long to leave room for actual memory addresses later, instead of relative jumps. But they're zero, and these should be references to our map. Later on, these will become pointers when the kernel has its way with it.
11:43
And in our case, we again need to figure out what to put in here. So same routine: we have relocation information, and the relocation information points to the instructions that we have. It says you need to plug in the flow map here. We go to the symbol table,
12:00
and there it says we have a .maps section, and that's where the flow map lives. In this case, we handle it slightly differently: we first have to go create this flow map and get a file descriptor, which is our unique identifier for the map, and then we need to put that file descriptor into these empty values
12:22
so the kernel knows where to go. Creating maps is also a command, so we have the BPF_MAP_CREATE command, and it takes these arguments. I cut out a bit of the later ones, but these are the essentials: what type, how big are my values, et cetera,
12:44
give it a nice name. There are two ways to define these maps. We have the new way of doing it, colloquially called BTF maps, on the left, but there's also the old way of doing it, using a BPF map definition, on the right.
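Going back to the map references in the instructions: once the map has been created and the loader holds its file descriptor, the patch step can be sketched like this. It assumes the relocation has already been resolved to an instruction index and that the descriptor came from a `BPF_MAP_CREATE` call that is not shown here. The value `BPF_PSEUDO_MAP_FD` = 1 is the kernel's marker for "this immediate is a map file descriptor"; the function name and the descriptor value 7 are made up for illustration.

```python
import struct

INSN_SIZE = 8
LD_IMM64 = 0x18        # first half of the 16-byte 64-bit immediate load
BPF_PSEUDO_MAP_FD = 1  # src register value: the immediate is a map fd


def patch_map_fd(code: bytes, insn_index: int, map_fd: int) -> bytes:
    """Write a map file descriptor into a zeroed 64-bit immediate load."""
    pos = insn_index * INSN_SIZE
    if code[pos] != LD_IMM64:
        raise ValueError("relocation does not point at a 64-bit load")
    patched = bytearray(code)
    dst = patched[pos + 1] & 0x0F                      # keep destination register
    patched[pos + 1] = dst | (BPF_PSEUDO_MAP_FD << 4)  # mark the load as a map reference
    struct.pack_into("<i", patched, pos + 4, map_fd)   # store the fd in the immediate
    return bytes(patched)


code = bytes.fromhex(
    "1801000000000000"  # r1 = 0 (placeholder for the map), low half
    "0000000000000000"  # high half
)
patched = patch_map_fd(code, insn_index=0, map_fd=7)  # 7 is a hypothetical fd
print(patched.hex())
```

When the program is loaded, the kernel looks up the descriptor and replaces it with the real map pointer.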
13:03
Don't use it. If you go into the part of libbpf which is used while writing eBPF programs, it will warn you that you shouldn't use it and should go for the left side. But the odd thing is that if you use these newer BTF maps on the left
13:21
and you go look at what's actually written to your .maps section, it's all zero. It still allocates room for your map, but the bytes are all zero and carry no information. All information is instead in the type information of the flow map.
13:41
So we have to get into what BTF is. BTF stands for BPF Type Format. It's derived from DWARF, so the actual DWARF debug symbols that are already used for normal C programs, but it's a way more compact version,
14:01
which is really only concerned with type information and not with where a variable lives at which moment. And it is used because eBPF by itself is just too limiting and we want to do more, especially in the verifier. So we have, for example, features like spinlocks,
14:20
which should only be used on maps that have spinlock values in them, or we have callback functions, where we define BPF functions but, instead of calling them, give them to a helper function. But this helper then needs to know that the callback has the correct number of arguments
14:40
and the correct types. All of this type information we can give to the kernel, and that's why, especially if you want to use these new fancy features, it's important to use the BTF information. It also allows for flexible map arguments. So for example, if I go back, we have the definition,
15:01
and one of the things you'll notice is that we have pinning as an attribute here, but you will not find it in the syscall attributes. This is purely something that we communicate to the loader library, and not just to libbpf,
15:20
but that's just the name that it has currently. And we can do a lot of different cool things with that. BTF also provides debug information for us. So if we go look at loaded programs, they will be annotated with the line information and the file it came from. And perhaps one of the coolest features
15:41
is compile once, run everywhere (CO-RE), which allows the loader and/or the kernel to modify our program slightly, so it will run on multiple versions of the kernel, even if the internals have changed. So if we dump the BTF that we have
16:02
from our example program, it looks like this. Things to note are the numbers on the left in square brackets. Those are the type IDs. Next to that is the actual kind of type, so we have pointers, integers, arrays; you can basically represent every C type
16:21
within BTF this way. There's an optional name, and then there's a lot of information about the specific type. And they refer to each other, so you'll notice a lot of "type ID is something else". So you can also visualize it by nesting it. I've done this manually. By the way, there's no command that does this,
16:41
but this is how you can do it yourself. So we have, for example, the .maps section with our flow map in it, and you can see that we have the type, the key, the value, and we have this very detailed description of exactly how it's structured, at which offsets which things live, and the names for them, which are used
17:02
to check all of these things. The loader will also use this to infer the actual value and key sizes to give to the kernel. This BTF lives in a .BTF section, and it's sort of structured like this:
17:21
we have this header, then types, and a lot of strings. Each type starts with the same three fields: a name offset, so an offset into the strings; info; and a size or type, depending on what the info says. This translates into the name
17:41
and the type of the BTF information, and then the last part is specific to that kind of type, so an encoding for ints, or a list of fields for a structure, et cetera. We also have .BTF.ext, the extended version of it, and this contains function information, line information, and optionally CO-RE relocations.
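The header, types, and strings layout can be illustrated with a small parser. This is a minimal sketch that follows the layout documented for BTF (a 24-byte header with magic `0xEB9F`, then a type section and a string section addressed relative to the end of the header) and decodes only the three common fields of the first type. The function name `parse_first_type` and the hand-built blob describing a 4-byte signed `int` are assumptions for illustration, not output from a real compiler.

```python
import struct

BTF_MAGIC = 0xEB9F
BTF_KIND_INT = 1


def parse_first_type(blob: bytes) -> dict:
    """Decode the BTF header and the common fields of the first type."""
    (magic, version, flags, hdr_len,
     type_off, type_len, str_off, str_len) = struct.unpack_from("<HBBIIIII", blob, 0)
    if magic != BTF_MAGIC:
        raise ValueError("not a little-endian BTF blob")
    types_start = hdr_len + type_off  # section offsets count from the header's end
    strs_start = hdr_len + str_off
    name_off, info, size_or_type = struct.unpack_from("<III", blob, types_start)
    name_end = blob.index(b"\0", strs_start + name_off)
    return {
        "name": blob[strs_start + name_off:name_end].decode(),
        "kind": (info >> 24) & 0x1F,   # which kind of type this is
        "vlen": info & 0xFFFF,         # number of members, parameters, ...
        "size_or_type": size_or_type,  # byte size for an int, type ID for a pointer
    }


strings = b"\0int\0"
int_type = struct.pack("<III", 1, BTF_KIND_INT << 24, 4)  # name_off, info, size
int_type += struct.pack("<I", (1 << 24) | 32)             # int data: signed, 32 bits
header = struct.pack("<HBBIIIII", BTF_MAGIC, 1, 0, 24,
                     0, len(int_type), len(int_type), len(strings))
first = parse_first_type(header + int_type + strings)
print(first)
```

A full parser would walk the whole type section, using the kind and `vlen` to know how many kind-specific bytes follow each type.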
18:05
So the line information contains a bunch of lines, and it annotates each instruction as being part of a given line of your original source program. And the function information labels every one of the BPF functions
18:20
that you have defined. Loading the BTF itself is quite simple. You use the BPF_BTF_LOAD command of the BPF syscall and give it the blob that we have. It needs to be changed slightly first, especially the sizes in the data section types,
18:41
but explaining exactly why would take more detail than we have time for. You also give it a bunch of logging information. Once loaded, we get a file descriptor for the BTF object, and of course we have all of these type IDs, so when we are loading our map, there are these fields where you can say,
19:01
this is my BTF object, which contains all of my types, and this is the type of my key, this is the type of my value. That's how we wire everything together. The same goes for programs: we give it the program, the BTF the program uses, and these func info and line info blobs,
19:24
which will make sure that everything is nice and annotated in the kernel. So we end up with a sort of hierarchy that looks like this. We start by loading the BTF. We can then load our maps, which use it, and then once we have our map file descriptors,
19:40
we can load our programs, after we have, of course, assembled all of the pieces of our program. And all of that can happen within one call to a loader library. And for the last part, CO-RE, which I touched on a little bit earlier: like I said, compile once, run everywhere.
20:01
There's a really good blog post, which I encourage everyone who wants to use the feature to read, and which contains information on how to actually use it. But what it boils down to is that in libbpf there are these macros to make your life easier,
20:20
and they boil down to a bunch of compiler built-ins. They're basically questions to ask the loader, or the kernel, just before or while loading the program. Like: what is the offset of this field? Does this type even exist?
20:41
Do I have this enum value? I have this small program that captures the cookie value of a socket when it closes. Not useful at all, but it does help to illustrate the point. When this macro resolves, it looks like this.
21:04
And the important part to notice here is that we do a helper call, and where the arrow starts, we have the socket pointer, and we add an offset to it, which we get from this built-in function. This offset then
21:23
gets encoded in the 104 that we see here. This is the offset that we add to the pointer in the actual code. But the compiler will also emit this relocation, which tells us that this is a piece of the code that we might want to tweak if the structure changes.
21:41
So if we, again, look at this relocation: unfortunately, as far as I'm aware, there is no good command-line tool to visualize or decode this, so I decoded one manually. It looks like this. It says: okay, instruction number two, which is the instruction that we were looking at, refers to type ID 18,
22:04
and it has this accessor string. This accessor string is a bunch of numbers, which are basically indices, like the number of the field that it tries to access. So first the socket, then the second field would be SK common, and then cookie, and so forth.
22:23
Now, this type information that we knew when we created the program is included in the BTF section. But the kernel also has BTF for all of its own types. So we can do a sort of diff, a comparison, and see that, for example, fields changed position,
22:42
or that we can't find a certain field. Our loader can do this: resolve it, see the difference, and then patch our code, changing this offset value right before we actually load it, which makes it possible to use it on so many different kernel versions.
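The compare-and-patch step can be sketched in miniature. Everything in this example is hypothetical: the two struct layouts (`conn` and `conn_common`) are invented stand-ins for the compile-time and running-kernel BTF, the accessor string `"0:0:2"` is made up, and real loaders also check type compatibility rather than matching on names alone. The sketch only shows the idea of resolving the accessor against the local types, finding the same fields by name in the target types, and rewriting the immediate.

```python
import struct

# Each struct is a list of (field_name, byte_offset, nested_struct_or_None).
LOCAL_TYPES = {   # layout the program was compiled against (hypothetical)
    "conn": [("common", 0, "conn_common"), ("state", 136, None)],
    "conn_common": [("a", 0, None), ("b", 8, None), ("cookie", 104, None)],
}
TARGET_TYPES = {  # layout of the running kernel (hypothetical)
    "conn": [("extra", 0, None), ("common", 8, "conn_common")],
    "conn_common": [("a", 0, None), ("cookie", 8, None), ("b", 16, None)],
}


def field_path(types: dict, root: str, accessor: str) -> list[str]:
    """Turn an accessor string into field names using the local types."""
    indices = [int(part) for part in accessor.split(":")]
    names, current = [], root
    for index in indices[1:]:  # the first number indexes the pointer itself
        name, _, nested = types[current][index]
        names.append(name)
        current = nested
    return names


def offset_of(types: dict, root: str, names: list[str]) -> int:
    """Sum the byte offsets of a chain of fields, looked up by name."""
    total, current = 0, root
    for wanted in names:
        for name, offset, nested in types[current]:
            if name == wanted:
                total, current = total + offset, nested
                break
        else:
            raise LookupError(f"field {wanted!r} not found")
    return total


names = field_path(LOCAL_TYPES, "conn", "0:0:2")
new_offset = offset_of(TARGET_TYPES, "conn", names)

insn = bytes.fromhex("0701000068000000")  # r1 += 104, as emitted by the compiler
patched = bytearray(insn)
struct.pack_into("<i", patched, 4, new_offset)  # rewrite the immediate in place
print(names, offset_of(LOCAL_TYPES, "conn", names), new_offset)
```

If the field cannot be found in the target types, the lookup fails, which mirrors the case in the talk where a relocation cannot be satisfied on a given kernel.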
23:00
I'm out of time. That's everything I can offer you for now. Are there any questions, and thank you. Thank you, any questions?
23:21
There's one in the back, all right, okay. It's difficult now. Can you pass this on? Hey, thanks for the great talk. So I haven't dealt that much with BPF,
23:40
but since we have those binaries that we cannot really launch, because we have to load them from another ELF, right? At least as I understand it. Would it make any sense to either make a loader that would just work out of the box for those binaries, or use the binfmt_misc feature of the kernel
24:03
to be able to load those BPF ELF files, and use some kind of generic or general interface, and just load them and run them? Yeah, I think it does make sense to some extent. There is, for example, the ip tool,
24:22
which doesn't need anything additional: it takes this ELF and just loads it as best it can. And there is probably some way to use the interpreter field in the ELF itself, just like we do for dynamically linked executables.
24:41
As far as I know, no one has tried it so far, but I think it could work, at least for a limited use case, where you would only load something and pin it, and then allow some other application to actually work with it afterwards. Yeah, thank you. All right, thanks, we are out of time.
25:00
If you have more questions, you can find Dylan in the hallway, and yeah, thanks again.