What we learned from reading 100+ Kubernetes Post-Mortems
Formal metadata
Number of parts | 56
License | CC Attribution 3.0 Unported: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/67159 (DOI)
Berlin Buzzwords 2022, 45 / 56
Transcript: English (automatically generated)
00:07
So, actually, I'm very excited to be here, because the story behind this talk began a long time ago. One day my dear friend Niv, who is also a DevOps engineer in the company
00:21
that we work for, so he's also a colleague, asked me to join him at a meetup. And I said, yeah, sure, what's the meetup? And he said, I don't remember, something about Flux, definitely DevOps. And, well, first of all, he had me at DevOps. I mean, come on. But there was still a small problem, because this is
00:45
clearly a DevOps meetup, and I'm a developer. So I told Niv, I don't think I should be there. Like, I'm not a real DevOps, you know. I do work a lot with Kubernetes and I do a lot of stuff that's related to DevOps, but I'm a full-stack developer, I shouldn't be there. And Niv got completely angry at me and
01:06
he told me that it's complete nonsense, that I should be there: tomorrow, 8 a.m., be there. And I said, yeah, okay, let's do it. And we went. And it was the best
01:20
meetup that I have ever been to. Why? Two reasons. First of all, it was the first time that I realized how much I love DevOps. And the second reason is that it was the first time that I ever said out loud why I believe that every developer should practice DevOps, at least a little bit. But now that
01:43
I've said it, we can really start talking. So, hi, my name is Noaa Barki. I am a developer advocate and a full-stack developer. I've been a full-stack developer for about seven years. I'm also a tech writer and one of the leaders of the GitHub Israel community, which is the largest GitHub community in
02:01
the whole universe. And I work at an amazing company called Datree, where we help developers and DevOps engineers prevent Kubernetes misconfigurations from ever reaching production. Now, why am I telling you all of this? It's because part of my job at Datree is not only to understand how Kubernetes works and what the best practices are, but also to learn
02:22
how you can blow up your own cluster. So before we launched Datree, we wanted to learn as much as possible about the common misconfigurations and pitfalls in the Kubernetes area, so we read more than 100 Kubernetes failure stories. And this is exactly what we're going to talk about, and how you can prevent them from ever happening to you. But let's go back to the
02:45
meetup story. So we went to the meetup, and obviously I was the only developer there. I remember that everybody seemed so grown up, like they had figured out everything in life, they had solved all the puzzles. And they started with a 40-minute session about Flux, which was very interesting.
03:04
I'm way more a fan of Argo CD, but you know, potato, potahto. And after the session, the organizer said, pizza's in the back, guys, let's take a short break and then we'll have a panel. So I looked at Niv, and apparently they
03:21
had a cloud-native experts panel where people could ask them whatever they wanted. And you know how it usually goes with panels, right? People are too embarrassed to ask anything. So after three, four, five minutes of awkward silence, one guy raised his hand and said that they had started to use the JFrog registry in the company that he works for, and
03:42
that he's very frustrated with the developers in his organization, because not only do they not know how to use the registry, they get so mad all the time because things keep failing for them. And they talked about it, and then another guy raised his hand and said, yeah, I have the same problem with security. What should I do? What are the best
04:02
practices for shifting left? And they talked about it, and they said that those developers, they don't understand what to do. And then one guy shared that he doesn't know where to put the Kubernetes resources. If he puts all of them in the application repository, then he's very
04:21
afraid that all those developers will ruin everything. And they talked about those developers who don't care and those developers who don't understand. And I was sitting there and I was like, how can they say that? But I was too embarrassed to say something, you know, because I'm one of those developers. So after three,
04:43
four minutes, I decided that, no, no, this is not a bug. It's a feature that I am the only developer here. So I raised my hand and I said, hi, my name is Noaa. I am a full-stack developer. I guess that
05:03
I'm the only one here, but may I speak in the name of my people? No, I didn't say that. I didn't say that. I just asked permission to speak, and everybody turned around to see whose voice was speaking. Everybody looked at me, and I said, well, you say that we don't
05:22
understand. You say that we don't care and that we might ruin everything. And you're right. You're right. You're totally right. But first of all, give me some credit here. I mean, come on. I have lots of things to do. And you forget the most important thing. You forget
05:40
that we're different personas. We work with different tools and we have different goals. I wake up every morning to be the best feature machine that this world has ever seen. I have code to write, tests to run, bugs to fix, tons of pull requests to review. I also need to worry about best practices, maintenance, stability and security.
06:02
And now I also need to manage and look after your YAML files that you put in my repository. What the heck is Terraform, and why is the memory limit so important? How do you expect us to work together in the same pipeline, on the same technology, when I don't even
06:29
ask me the question that I fear the most? He said, so what do you suggest? And I hesitated. I smiled and I said, well, if you think
06:42
about it, Kubernetes sort of brought everybody to the table, all of us. And now we need to play with the same technology. The DevOps and the developers and the QA and the security managers and the IT, everybody needs to work on the same technology, at the same table. But the absurd thing is that
07:04
the only persona, the one persona that actually knows how to operate Kubernetes is DevOps. So there it goes: in the blink of an eye, DevOps becomes a bottleneck. And I see a lot of people who talk about shift left: shift-left testing, shift-left security. I even saw some articles about how
07:23
to shift left data management, which is awesome. But the one thing that nobody talks about is how to shift left responsibility. Because if we take security, for instance, now this is my job. This is my responsibility as a DevOps engineer to scan my images. This is my responsibility to
07:42
make sure that I don't have vulnerabilities in my images, but nobody tells me how exactly I'm supposed to do it. So let's talk about how to shift left responsibilities. And the first thing that you need to do is to delegate the knowledge. You should learn what the best practices in the industry are and teach others in the organization. Educate
08:04
the developers. What is Kubernetes? What are its main components, and why are specific configurations so important for their own work? For instance, don't just tell the developers, don't just ask them to use digest SHAs instead of tags.
08:21
No, explain to them that digest SHAs are basically hash identifiers and that they work pretty much the same way as Git commits. And if you use digest SHAs instead of tags, which are mutable, you prevent images in production from changing unexpectedly, because attackers can override a tag,
08:40
for instance. This way they will understand why it's important for their work, and you explain it to them in terms they will understand. So this is very important, but you're probably thinking to yourself, is she crazy? I'm not Udemy here. I have a job. So yeah, it doesn't mean that everybody
09:01
needs to understand everything about Kubernetes. No, do it wisely. Choose your champions. Pick those developers who are most interested in infrastructure code and in DevOps technology, and delegate your knowledge to them. Think about front-end and back-end developers. Every organization has those
09:20
front-end developers who will never do back-end development. And on the other hand, you have those back-end developers who will never do front-end development, because it's only pixels and styling and CSS and blah, we don't want it. But every once in a while you have those true full-stack developers who do both. This is the kind of developer that you're looking for, but with DevOps technology,
09:44
with Kubernetes, because they belong to both tribes, and if you delegate your knowledge to them, they can become your ambassadors and they will pass the knowledge on to the rest of the developers. And when they gain your trust, and it will take time, it's a process, believe me, then grant them
10:04
permissions. Allow them to educate the rest of the developers. So when a developer has a question about, I don't know, CI/CD, he or she will go to your champion instead of to you. But let's talk about how to actually learn and share knowledge.
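Coming back to the digest example from a moment ago, here is one way a champion might show the idea to other developers. This is a minimal sketch in Python, assuming the pod spec has already been parsed into a dictionary. The function names and the registry host are hypothetical, and the check only recognizes sha256 digests.

```python
import re
from typing import List

# A digest-pinned reference ends in "@sha256:" followed by 64 hex characters.
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image: str) -> bool:
    """Return True when the image reference ends in a sha256 digest."""
    return DIGEST_SUFFIX.search(image) is not None


def images_without_digest(pod_spec: dict) -> List[str]:
    """List the container images in a pod spec that are not pinned by digest."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [
        c.get("image", "")
        for c in containers
        if not is_pinned_by_digest(c.get("image", ""))
    ]
```

A reference such as registry.example.com/app:1.2.3 would be reported, because the tag can later point at a different image, while the same name followed by @sha256: and the full digest would pass.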
10:23
And there are many ways to do it. When it comes to sharing knowledge, you can have internal meetups, you can share newsletters, white papers, emails; there are many ways to do it. And when it comes to learning about best practices,
10:40
trust me, I've been there, I've done that: the most efficient way to do it is by learning from other companies' failure stories. And based on my research, I would like to welcome you to my very own private game show, What's the Mistake? Are you ready? Let's do it. Okay, yeah, this is the energy. Okay, so the game goes like this.
11:02
I'm going to show you two Kubernetes manifests. Each time I'm going to point to a specific key which is configured differently in each manifest. You will have to look very carefully and tell me which one you would deploy. Left or right? Are you ready? Let's do it.
11:23
Okay, this is a cron job configuration. Pay attention to the concurrency policy. Which one would you deploy? Left or right? Okay, I really needed to add some music, like
11:44
What? Intense? Yeah, I liked it. Intense? Depends. Depends on what? Well, that's the answer to everything, right? Especially in DevOps. Okay, but let's talk about it.
12:01
The right answer is right. Why? Because whenever a cron job fails, unless we set the concurrency policy to Forbid, the failed cron job will not replace the previous one. It will just create a new one, and a new one, and you will keep spawning jobs on your cluster. And yeah, it depends. That's the beauty of Kubernetes. It's like a cockpit,
12:24
like a pilot's cockpit. You have so many buttons and so many options. It's not that Kubernetes is wrong; it's the fact that you can use it everywhere, and sometimes the default configuration is not the one that suits you. This is something that happened to Target. They had one failing cron job that created
12:44
thousands of pods that were constantly restarting. And not only did it take their cluster down, it also cost them a lot of money. That said, let's move forward to the next question. This is another cron job configuration. And once again,
13:02
pay attention to the concurrency policy. Which one would it be? Left? Nice. Thank you. All right. Think about it. No pressure. Who here says left? Who here says right?
13:29
No right answer? Okay. And the right answer is right again. You see here, let me go here.
13:40
Here on the left side, the concurrency policy isn't part of the cron job spec. So we end up with a cron job basically without any limits. And this is something that happened to Zalando, which is an online fashion company with over 6,000 employees. They used the correct configuration; however, they placed it incorrectly in their YAML, which of course took the API server
14:06
down. And yeah, it cost them a lot, a lot of money. Let's move forward to the last and almost the most important question, just to make sure that you're with me and awake after lunch. Pay attention to the containers. Which one would it be? Left or right?
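To make the two cron job questions concrete, here is a minimal sketch in Python, assuming the CronJob manifest has already been parsed into a dictionary. The function name and the decision to accept only Forbid or Replace are illustrative assumptions. The field path spec.concurrencyPolicy and its default value Allow come from the Kubernetes CronJob API.

```python
from typing import List

# Assumption for this sketch: the team wants overlapping runs prevented.
ACCEPTED_POLICIES = {"Forbid", "Replace"}


def check_cronjob_concurrency(cronjob: dict) -> List[str]:
    """Report a missing, permissive, or misplaced concurrencyPolicy."""
    problems = []
    spec = cronjob.get("spec", {})
    policy = spec.get("concurrencyPolicy")

    if policy is None:
        # An unset field means Kubernetes falls back to the default, Allow.
        problems.append("spec.concurrencyPolicy is not set, so the default Allow applies")
    elif policy not in ACCEPTED_POLICIES:
        problems.append(f"spec.concurrencyPolicy is {policy!r}, expected Forbid or Replace")

    # Look for the key at deeper levels, where it has no effect on the CronJob.
    job_spec = spec.get("jobTemplate", {}).get("spec", {})
    pod_spec = job_spec.get("template", {}).get("spec", {})
    nested = (
        ("spec.jobTemplate.spec", job_spec),
        ("spec.jobTemplate.spec.template.spec", pod_spec),
    )
    for path, node in nested:
        if "concurrencyPolicy" in node:
            problems.append(f"concurrencyPolicy found under {path}, it belongs directly under spec")
    return problems
```

A manifest with the right value at the wrong indentation level produces two findings here: the policy is effectively unset, and the key sits in a place where it is not read as the CronJob's concurrency policy.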
14:33
This is a short one. Who here says left? Who here says right? And of course, obviously, we always want to make sure that we set the memory limit. And you all
14:46
know that, but Blue Matador, unfortunately, didn't. They were a small startup company back then, and one of their pods hosted a Sumo Logic application whose containers were memory hogs. And without any limits, nothing stops those pods from taking up all the memory
15:04
in the cluster, which obviously led them to out-of-memory issues. But Blue Matador, Target and Zalando aren't the only companies that suffered from these pretty innocent mistakes. I'm talking about big companies: Google, Spotify, Datadog, Airbnb,
15:24
Toyota, Tesla, who's not there, and Docker and Microsoft, and a lot of other companies that shared their own Kubernetes failure stories. Nobody is immune to misconfigurations, trust me. This became a hobby of mine. Now, first of all, I highly encourage you
15:43
to read about other companies' failure stories. Not only will it inspire you to think about other use cases and about what the best practices in the industry are, it will ultimately force you to ask yourself the question: how can I make sure it will never happen to me? What is the stability and the security that I want
16:05
to achieve for my production, and how do I get there? How can I make sure that I won't become one of those failure stories? And the answer is enforcement, policy enforcement. Now, before we talk about policy enforcement in practice, I want us to talk about the most
16:22
important thing, which I see a lot of companies tend to forget: to start small, to do it in small steps. Remember the guy from the meetup who said that they had started to use the JFrog registry and that he was very frustrated? Remember that guy? His problem, his issue, was that they
16:42
dropped gigantic restrictions on everybody in the organization in one day. Don't do that. Start small. Pick one team, have a meeting, make sure everybody understands the scope of the restrictions: why the policy enforcement is needed, why we do it, when we want to do it,
17:04
and get agreement from everybody. Do it gradually, then add another team and another team and another team, and this is the way to actually have effective policy enforcement in your organization. But let's talk real business. In Israel we say, let's talk dugri.
17:21
So let's talk real business. Now, I believe in two things. I believe in shift left and I believe in GitOps. I believe that the sooner you find a mistake, the less likely it is to take your production down, and that every Kubernetes resource should be handled exactly the same as your source code: kept in your Git repository and validated in the CI. And furthermore, if you validate
17:47
your resources in the CI with tools that can also be used as a local testing library, it will greatly help you nurture the DevOps culture in your organization, because developers are used to local testing. This is actually part of their policy. Every developer runs local
18:04
tests on his or her machine before submitting a pull request, and guess what? They expect those tests, at least those tests, to be run again in the CI. So allowing the developers to do the same with infrastructure code will allow you to delegate more responsibilities and will liberate you
18:23
from the constant need to fence every Kubernetes resource off from any possible misconfiguration. So the way I see it, we should automatically validate our resources on every code change in the CI. Now, there are usually three types of misconfigurations. The first one
18:45
is what I like to call syntax errors, which covers all the mistakes that happen because we accidentally submitted an invalid YAML file or an invalid Kubernetes resource with an incorrect schema. And I know that it may sound very basic, but you will be surprised to know
19:02
how many companies share their stories because they accidentally submitted an invalid YAML file. My favorite story is about Skyscanner, who accidentally deleted one of the curly braces from their Helm chart and basically corrupted all the namespaces and created a
19:21
new namespace that nobody used, and they had five hours of production downtime. So yeah, it happens. The next type of misconfiguration is what I like to call knowledge issues. Because, as I said, when it comes to Kubernetes we have a lot of different personas working on the same technology,
19:42
and usually we lack the knowledge of how to actually use Kubernetes. What are the best practices? Memory limit, liveness probe, readiness probe, making sure that every cron job has a deadline or that the schedule is valid. There are many best practices that we should follow, and there are many default behaviors that we may or may not want to use, and I see a lot of misconfigurations that
20:06
happen because many personas don't know what the best practices are. But learning about best practices is not enough, because you also want to make sure that you are aligned with your teammates, that you follow the internal best practices in your
20:24
organization, for things like using a private registry for images or a specific memory limit or CPU amount for your production clusters. So let's talk about each type of misconfiguration and
20:41
how we can prevent it from happening, starting with validating syntax errors. So first of all, we want to verify that our file format is valid. Whether we use JSON, XML, YAML, a Dockerfile, whatever file format you need, you need to make sure that it's valid,
21:01
and I highly recommend that you use yq, which is a portable CLI YAML processor. I use it all the time. It's open source, so it's free, and it's very, very easy to use. But once we've verified that our file syntax is correct, we also want to make sure that whatever
21:23
is written in that file is also correct, so we want to make sure that our Kubernetes resource follows the correct schema, right? Now, the built-in option is to use kubectl with the dry-run flag, which basically tells the API server to only validate the resource,
21:41
not to apply it. Now, there are a couple of issues with this approach. I usually prefer the built-in option, but in this case not so much. Let me explain why. First of all, I need everybody in the organization to have kubectl access, which the developers usually don't have.
22:00
But let's say that they have. To work with dry run, you need to pick one of two strategies. The first one is the client strategy, which is not very helpful, because it only prints out the resource that would be submitted. This is not really what we're looking for. The second strategy is the server strategy, which is exactly
22:25
what we're looking for, but it requires a cluster connection, and we certainly don't want to give everybody in the organization, all the developers, cluster permissions and a cluster connection. So as an alternative, I highly encourage you to use kubeconform, which we use at Datree and which is another
22:41
open source tool. It basically allows you to do the same thing, but it pulls all the Kubernetes schemas from the Kubernetes API into a GitHub repository, and it just sends a request to that repository instead of to your cluster. So, a much better option, highly recommended.
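The two steps just described, checking the file format first and the resource structure second, can be illustrated without any cluster connection. What follows is a toy sketch in Python, not how yq or kubeconform work internally. It assumes a manifest written as JSON so that only the standard library is needed, and it checks just a few top-level fields rather than a full Kubernetes schema. The function names are hypothetical.

```python
import json
from typing import List

# Top-level fields that every Kubernetes object carries, with the expected type.
REQUIRED_TOP_LEVEL = {"apiVersion": str, "kind": str, "metadata": dict}


def structural_errors(resource) -> List[str]:
    """Second step: check the parsed resource against a tiny structural 'schema'."""
    if not isinstance(resource, dict):
        return ["resource must be a mapping"]
    errors = []
    for field, expected in REQUIRED_TOP_LEVEL.items():
        if field not in resource:
            errors.append(f"missing required field: {field}")
        elif not isinstance(resource[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    metadata = resource.get("metadata")
    if isinstance(metadata, dict) and not (metadata.get("name") or metadata.get("generateName")):
        errors.append("metadata.name (or metadata.generateName) is required")
    return errors


def validate_manifest_text(text: str) -> List[str]:
    """First step: the file must parse. Only then is the structure checked."""
    try:
        resource = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    return structural_errors(resource)
```

Keeping the two steps separate means a broken file is reported as a syntax error rather than as a confusing list of missing fields.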
23:00
But let's talk about best practices. We talked about them a little bit; let's dive in. So first you need to define the policies and the rules that you want to enforce, whether it's about a liveness probe or a readiness probe, or making sure that your network policy is set according to your needs, or cron jobs and their schedule, or a CPU limit, set or not set, I don't
23:27
know what you prefer, which side you're on. But the policies and the rules that you define really depend on the requirements of your workload. The real question is how you will distribute those policies. Where in the pipeline will you enforce them? Because the place that you
23:46
decide is the most suitable for you might affect your entire organization. So this is a very important question, because, for instance, if you enforce your policies in the CI, it will affect all the developers. So think about it: where would be the most suitable place
24:02
for you? Now, the good news is that now you have policies and rules that you want to enforce, and you have developer champions that you can trust and delegate responsibilities to, and let's say that, yeah, happily ever after. Not really, no, because this is only the beginning.
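Defining policies as small, named rules is what makes it possible to run the same checks locally and in the CI. As a rough illustration, here is a minimal sketch in Python, assuming a Deployment manifest that has already been parsed into a dictionary. The three rules are taken from the examples mentioned in the talk, while the rule table and the function name are hypothetical.

```python
from typing import Callable, Dict, List

# Each rule takes one container definition and returns True when it passes.
RULES: Dict[str, Callable[[dict], bool]] = {
    "memory limit": lambda c: "memory" in c.get("resources", {}).get("limits", {}),
    "liveness probe": lambda c: "livenessProbe" in c,
    "readiness probe": lambda c: "readinessProbe" in c,
}


def policy_violations(deployment: dict) -> List[str]:
    """Run every rule against every container in a Deployment's pod template."""
    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    violations = []
    for container in containers:
        name = container.get("name", "<unnamed>")
        for rule_name, passes in RULES.items():
            if not passes(container):
                violations.append(f"container {name!r} has no {rule_name}")
    return violations
```

Because the rules live in one table, enabling or disabling a rule is a single change, and a CI step could fail the build whenever the returned list is not empty.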
24:26
You also need to manage all your policies, and by that I mean having a place, an intuitive environment, where you can dynamically adjust your policies. And I see a lot of people who tend to think that Git is the place. And Git is great, like, I'm really into Git, but Git is
24:44
there for version control, maybe implementation, but not for management, because Git won't provide you with everything that you need. For instance, Git won't provide you with a way to control which policies are actually being used in practice. Git won't provide you with a way to grant permissions
25:01
over who can delete or create a policy. And when you have dozens of repositories and dozens of policies, now you have another nightmare: controlling your versions. Not only do you need to make sure that everybody uses the same version of Kubernetes, you also need to worry about people not using the same version of your policies. So you need to find a way in which you
25:22
can control, review and monitor all your policies. And another thing that is very important is to provide guidelines along with your policies. Tell people, tell the developers, what to do when one of the policies actually fails. I mean, after putting so much effort into creating, defining
25:45
and enforcing your policies, you certainly don't want to make any developer feel frustrated because they don't know how to add a liveness or readiness probe. So it's really important to provide guidelines. Tell them why it happened, why it failed, how to fix it, and I promise you they won't make
26:03
the same mistake again. And this is why having centralized policy management is crucial. Now, there are many open source tools that you can use. This is the good news. They are all open source, so you can use them today. You can use Gatekeeper, you can use Conftest, you can use
26:23
Gator, you can use Kyverno. There are many open source tools out there, but the tool that I want to talk about is Datree, to show you what we believe is the right way to control and do policy enforcement. So what is Datree? Datree is an open source centralized
26:41
policy management solution, and it was built with the shift-left mindset to help you, the DevOps engineer, delegate more responsibilities to the developers. So if you think about it, we have the CI/CD pipeline, right? On the one hand you have the DevOps. You are the cluster admins, you know what the best practices are, what your organization needs, what the
27:03
requirements of your workloads are; you know what to do. And on the other hand we have the developers, who actually need to follow these best practices in development. So Datree sits right there in the middle. It allows DevOps to implement, control and review all the policies in a centralized location, and on the other hand it gives the developers a way to scan
27:26
and validate their resources in development, locally on their machine or in the CI. So if the DevOps modify one of the policies, Datree will propagate those changes along the whole pipeline. So how does it work? We have a CLI, an open source project with
27:46
thousands of stars. I highly encourage you to join our community. And it kind of combines everything that we just talked about. You can run datree test... what just happened? You can run datree test with the path of all the files that you want to test, and Datree will run
28:04
checks to see that the file is a valid YAML file, that it passes all the policy checks and the best practices in the industry, which I'll talk about later, and that the Kubernetes resource is correct, that the schema is correct.
28:26
Now, we know how much effort it takes to create policies and to define them according to your workloads, and we don't think that you should waste your time on that. We've got your back. We already provide built-in rules and policies for Kubernetes, Argo CD and other CNCF projects.
28:46
To install it, you basically need to run a curl command on your local machine, and then you can scan all your resources using the datree test command. We also have a kubectl plugin to scan your cluster, and as I said, you can integrate Datree in the CI, just like that. We provide a management
29:08
solution where you can review and monitor all your policies and rules, using code or in the SaaS, and yeah, you can see all your policy executions. And to sum up, I really hope that this session
29:28
inspired you to start thinking about the policies and the rules that you want to enforce in your organization, and about how you will delegate more responsibilities to the developers, because DevOps is not a role. Thank you very much.