Introducing kubectl-trace
Formal Metadata
Title: Introducing kubectl-trace
Number of Parts: 561
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/44349 (DOI)
Transcript: English (auto-generated)
00:06
So, my name is Alban, I'm in Berlin, I work at Kinvolk, and I love Linux and BPF. At Kinvolk, we care about Kubernetes and low-level Linux development, things like BPF as well. So, I will not introduce eBPF in too much detail.
00:25
How many of you know about eBPF? Okay, almost everybody. I will just introduce it quickly, then I will go through a few demos, and then I will explain how things work.
00:41
So, just a few words about eBPF: these are small programs that run inside the Linux kernel and that can be used for security, for tracing or for networking. In this talk, I will focus only on tracing: how to trace your Kubernetes cluster using BPF.
01:03
And I will talk about concepts like kprobes, uprobes, USDT, and tracepoints. Using eBPF is safe: if I compare it to running other things in the Linux kernel, like kernel modules, it is not like that; eBPF is safe,
01:21
it will not crash your kernel. And it's safe because there are restrictions on what you can do. In the kernel, there is a verifier that will check that your eBPF program will not run indefinitely or do unauthorized memory accesses and so on. And it's fast as well:
01:42
BPF is bytecode, but it will be JIT-compiled to native code, and then it runs about as fast as a normal function call in the kernel. If you want to learn more about BPF, a good pointer to look at is the BCC project, the BPF Compiler Collection.
02:01
If you use Go, you can look at gobpf as well. So I will start the first demo. This is about bpftrace. bpftrace is a tool that runs on one node. You can start it from the command line, and you can type a small one-liner BPF program.
02:22
It has its own language that you can type on the command line, and the program will be compiled and run on the node. So this is not cluster-aware or anything, and I have prepared a demo.
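For instance, a typical bpftrace one-liner looks like this (a hedged illustration, not the program from the demo); it prints the name of every process opening a file on the node and the file being opened:

    bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'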
02:41
Bigger, is it good enough? So here I have, okay, this one is not exactly a one-liner, but it's a small bpftrace program. What it does is use a uprobe to trace other bash processes.
03:05
Every time you type a command in bash, it will capture the function readline, and it will print the return value of readline. So that's the command.
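For reference, the well-known bpftrace one-liner for this looks roughly like the following (a hedged sketch; the program shown in the demo may have differed slightly):

    bpftrace -e 'uretprobe:/bin/bash:readline { printf("%-6d %s\n", pid, str(retval)); }'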
03:20
So if I start this program, so far nothing happens, but if I go to another terminal and type some commands... Yeah, okay, those were some commands.
03:42
And here I see the results. Every command I type here is captured by bpftrace. It does that for all the bash processes on the system: it puts a uprobe on the specific function
04:03
in the bash binary and captures that. So the idea of kubectl trace is to do the same kind of thing, but at the cluster level rather than on a single node. So now we'll do the second demo, on kubectl trace.
04:23
And for that, I have a bit more complex setup. I have a Kubernetes installation. I use Flatcar Linux, the base Linux distribution that we make at Kinvolk. I use a Kubernetes distribution based on Typhoon
04:42
with a special image to have the latest Linux kernel, because I need some BPF features from recent kernels. And I will do a demo with a microservices application that was developed by two companies, Container Solutions and Weaveworks. So I will show you right now
05:02
what this application looks like. The application is a web app where you can buy socks, or pretend to buy socks, and click on the articles you want to buy. And this is running on Kubernetes.
05:22
So let me stop that, I'm sorry. So I have a Kubernetes cluster with a namespace for this application, and I have about a dozen different containers.
05:41
Here, I want to trace what happens in one specific container, the front end. Before this talk, I prepared a few environment variables. So I have the pod that I want to trace in an environment variable; it is running on this node,
06:01
under a cgroup, which is listed here. And now I will copy-paste the command that I prepared before, this one. What it does: I use kubectl trace
06:22
and I specify on which node I will run it. It will attach to a kernel function with a kprobe. The kernel function is do_sys_open. So every time the open system call is called, the BPF program will be executed. In the BPF program, we do something like printf
06:41
to print the program name and the first argument of the open system call. I have a Node.js application, so the name of the program is node. And it's opening some files, but nothing important so far.
07:00
Here, I don't want to trace all the processes on the system, but only the processes that run in a specific pod. So I added a filter here; that's something you can do with bpftrace. Here, I say I only want to print the trace when the current cgroup matches. I will show you, let me stop that.
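Put together, the command was along these lines (a hedged reconstruction; the namespace, label, node name and cgroup path are placeholders, not the exact values from the demo):

    # find the node where the front-end pod runs (namespace and label are assumptions)
    NODE=$(kubectl -n sock-shop get pod -l name=front-end -o jsonpath='{.items[0].spec.nodeName}')

    # run a bpftrace program on that node, filtered on the container's cgroup v2 ID
    kubectl trace run "$NODE" -e 'kprobe:do_sys_open
      /cgroup == cgroupid("/sys/fs/cgroup/unified/kubepods/...")/
      { printf("%s: %s\n", comm, str(arg1)); }'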
07:26
I only show the trace when the current cgroup is the one of the container. So now, I start this again. And if I go to Firefox and refresh this page,
07:44
here I see a lot more files being opened. So node, running in the container, has opened a lot of files. That's it for this demo.
08:02
Thanks. So to sum up, with kubectl trace you can deploy pods with bpftrace and inspect some pods with BPF. That slide was in case the network was not working,
08:21
but the network is working here, so that's the same thing. So how does it work? kubectl trace is a client-side plugin to kubectl. It doesn't run on the Kubernetes cluster; it runs on my laptop. When I type kubectl trace, it executes the plugin. And this plugin will not do any SSH to the server
08:42
or anything like that. It only uses Kubernetes primitives. It will create native Kubernetes resources like ConfigMaps, Jobs, pods and things like that. So the first thing it does is create a ConfigMap with the content of the program.
09:00
Okay. Then it creates a Job that will be deployed on a worker node. The Job runs a trace-runner pod that will run bpftrace, and that will install the BPF program. So I will zoom in on that pod here to see what it does.
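As a side note, while a trace is running you can see these resources with plain kubectl (a hedged sketch; the generated names and labels will differ):

    kubectl get configmaps,jobs,pods | grep trace   # resources created by kubectl trace
    kubectl trace get                               # list the traces kubectl-trace knows about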
09:23
This process will fetch information from the ConfigMap to get the program code. It will get the kernel headers from the host and compile the program into BPF bytecode; it uses LLVM to do that. And then it will install the BPF program
09:42
in the kernel with the bpf() system call. From the bpf() system call, we get a file descriptor representing the program. And then we attach that program to specific hooks, to specific points, with kprobes or uprobes and so on, via the tracefs file system.
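As an aside, the tracefs interface used for attaching probes can also be poked at by hand, without BPF (a hedged illustration; the path is the usual default mount point):

    # add a kprobe named 'myopen' on do_sys_open, look at it, then remove it again
    echo 'p:myopen do_sys_open' > /sys/kernel/debug/tracing/kprobe_events
    cat /sys/kernel/debug/tracing/kprobe_events
    echo '-:myopen' >> /sys/kernel/debug/tracing/kprobe_events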
10:00
And a lot of this is done with libbcc. To be able to do that on the Kubernetes cluster, we need a lot of privileges. For example, we need CAP_SYS_ADMIN, because a lot of BPF operations need this capability. We need access to some volumes, for example the tracefs volume,
10:20
or a volume to get access to the kernel headers. So in Kubernetes, we actually use some pod options like privileged equal true, plus the volumes and so on. If your Kubernetes cluster is configured to use pod security policies, you need to be careful to configure that correctly.
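On the node itself, the host paths those volumes typically point at can be checked like this (a hedged sketch; the exact paths depend on the distribution):

    ls /sys/kernel/debug/tracing        # tracefs, used to attach kprobes and uprobes
    ls /lib/modules/$(uname -r)/build   # kernel headers used to compile the BPF program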
10:41
So you need to configure service accounts and cluster roles and so on; that's the usual RBAC thing, role-based access control. With kubectl trace, you can do different kinds of tracing, in the kernel or in userland. In the kernel, you can use tracepoints, which are statically
11:02
defined in the Linux source code, or kprobes, which are more dynamic: you can take any kernel function and put a probe on it. In user programs, you can use USDT or uprobes. Tracepoints work like this.
11:23
They are statically defined. At a given point in the code, you might have a tracepoint that is executed or not, depending on whether a trace is installed on it at that moment. And then it will execute the BPF program
11:40
that will emit some events. A kprobe is very similar, but the code is patched at runtime, so you can do that on any kernel function. It will replace the first instruction with a jump, save some registers, call into the BPF program, and then return after executing the original instruction.
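You can list what is available to attach to with bpftrace; a small hedged example (the probe names are only illustrations):

    bpftrace -l 'tracepoint:syscalls:sys_enter_open*'   # statically defined tracepoints
    bpftrace -l 'kprobe:do_sys_open'                    # kernel functions reachable via kprobes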
12:06
USDT is user-level, statically defined tracing. Here I gave an example of how to know if one of your programs has that: you can use the readelf command with -n, and you can see some probes defined there.
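For example (a hedged sketch; node is an assumption, and this only shows something if the binary was built with USDT probes):

    BIN="$(which node)"                   # any binary built with USDT probes
    readelf -n "$BIN" | grep -A2 stapsdt  # USDT probes appear as 'stapsdt' ELF notes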
12:21
And uprobes, that's the demo that I did before, reading the command line from another process, from bash. So those are the four main things you can do. When preparing this demo,
12:42
my biggest challenge was how to do this filter here. So if I show it again: how do I do this filter? This was not implemented at the beginning, and I implemented it in bpftrace and in kubectl trace.
13:04
To select the pod, the issue is that BPF programs are installed in the kernel globally; they are not installed for a specific PID or cgroup or container. So I needed a way, in the BPF program, to check what the current context is.
13:23
So I looked at the list of BPF helper functions, and some of them can help us. This one, bpf_get_current_comm, gets the name of the program, the short comm name. That's not really so useful, because you can have the same name for
13:40
multiple programs, and it's easily changeable. The most interesting one is the cgroup ID helper, bpf_get_current_cgroup_id, which exists in recent kernel versions. Looking at its documentation, it returns a 64-bit integer giving the cgroup ID.
14:01
Based on that, I did an implementation in bpftrace, in the language specific to bpftrace, to have a builtin, cgroup, that returns the current cgroup ID based on this BPF helper function. And then another builtin, cgroupid, that does the translation: you give it the path of a cgroup and it returns the ID.
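Used together, they make the pod filter possible; something along these lines (a hedged sketch; the cgroup path is a placeholder):

    # 'cgroup' is the cgroup v2 ID of the current task, 'cgroupid()' resolves a host path to an ID
    bpftrace -e 'kprobe:do_sys_open /cgroup == cgroupid("/sys/fs/cgroup/unified/mypod")/ { printf("%s\n", str(arg1)); }'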
14:25
Another issue is that in the kernel we have two versions of cgroups, cgroup version one and cgroup version two. Both can exist at the same time if you enable both, but that's a bit complicated to manage.
14:44
The BPF helper function only cares about cgroup version two. The problem is that Kubernetes normally uses only cgroup version one, so I needed to make some changes there. First, I configured systemd on the host to use both version one and version two
15:01
with this parameter. Then there is a configuration in Docker to say: don't manage cgroups yourself, ask systemd to do it. And because systemd is configured to do both version one and two, it will do that. There are similar options you can give to the kubelet or to containerd.
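For reference, the Docker and kubelet side of that is the usual systemd cgroup driver setting (a hedged sketch; the systemd boot parameter itself was on the slide and is not reproduced here):

    cat /etc/docker/daemon.json
    # { "exec-opts": ["native.cgroupdriver=systemd"] }   <- ask systemd to manage cgroups
    # and the matching kubelet flag: --cgroup-driver=systemd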
15:21
After doing that, the containers run with version one and two at the same time, so I could filter on the pod with a cgroup version two ID. To finish this talk, I will give just three short ideas
15:41
about things I would like to have in the future in this project, that maybe I could implement or you could implement. First, improve the user interface: the way I did it here, I give the node name, I give the full cgroup path. Ideally, there would be a way for kubectl trace
16:02
to get that automatically. It should not be too complicated to implement this kind of automation. Another idea is to have aggregation. Usually when we do a deployment on Kubernetes, we have several replicas, different pods,
16:20
running potentially on different nodes. It would be good to gather the results from those different pods and aggregate them somehow. To finish: at the moment, kubectl trace needs to have privileges on the cluster.
16:41
But sometimes it's not good that every user has complete access to the cluster. It would be nice if a user could inspect their own pods without being an admin on the cluster. That's the idea, but I don't have a clear idea how to implement it; that would require a lot of thinking, I think.
17:03
Okay, thank you. Are there any questions? We've got time for a couple of questions, if there are any.
17:22
Great talk, thanks. Do you need the kernel headers to run bpftrace? Yes, so I think it depends what kind of probe you use. If you use kprobes, yes, you need kernel headers to know what you're inspecting in the kernel. With tracepoints, I'm not sure.
17:40
I think it should be possible without them, but I've not tested it. Anything else? Doesn't look like it. Thank you.