KubeVirt scale test by creating 400 VMIs on a single node


Formal Metadata

Title
KubeVirt scale test by creating 400 VMIs on a single node
Series Title
Number of Parts
287
Author
Contributors
License
CC Attribution 2.0 Belgium:
You may use, change, and reproduce the work or content in unchanged or changed form for any legal purpose, and distribute and make it publicly accessible, provided you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
KubeVirt's performance and scalability are determined by several factors. As the number of VMs per node gets larger, using more powerful nodes (i.e. with more CPUs and RAM), the scalability of KubeVirt's control plane becomes a bottleneck, slowing down the VMI creation process. This talk will cover the motivations and concepts around general benchmarking of the KubeVirt control plane, as well as explaining the journey to running a density test with hundreds of VMs per node. In addition, I'll provide some performance metrics comparing VM build time in various scenarios. Participants will gain a high-level knowledge of the ongoing KubeVirt sig-scale community performance assessment and the single-node scalability characteristics of KubeVirt.
Transcript: English (auto-generated)
Hi everyone, welcome to my presentation. I'm Marcelo Amaral from IBM Tokyo, and today I'm going to present a scalability test of KubeVirt, creating 500 VMIs on a single node. I will give a short introduction, describe the goals, the background, and the experiments, and then conclude with some final remarks.

Okay, so what's the motivation for this work? There is a trend of using more powerful nodes for Kubernetes, and powerful here means nodes with more CPU and memory. In addition to that, there is the advent of composable systems, which have a lot of compute power and storage that can be disaggregated and flexibly allocated to workloads. Because of that, there is a requirement to increase the number of pods, VMIs, and VMs running on those nodes, in order to fully utilize the amount of resources available on the node. However, creating a large number of pods per node imposes some performance challenges on the control plane, both for the Kubernetes control plane and for the KubeVirt control plane.

Okay, some previous work has shown that by properly configuring the Kubernetes control plane it's possible to run 500 pods per node in an efficient way. So the question here is: if it's possible to create 500 pods per node, is it also possible to create 500 VMs per node? What's the impact on the VM creation latency, and what's the impact on the Kubernetes control plane when creating 500 VMs on only one node? What's the impact on the KubeVirt control plane, and what needs to be configured to be able to create 500 VMIs?
I'm going to go through all of those questions in this presentation. The goal of this presentation is to measure the Kubernetes control plane performance and also to show how Kubernetes can handle a large number of requests, which here means creating 500 VMs per node. In order to do that, I defined a burst density test that creates a batch of VMs on a specific node. The focus here will be the performance of the control plane, so the data plane analysis, that is, the performance of the VM itself, will not be part of this study.

Okay, just a glimpse of the KubeVirt control plane background: KubeVirt is an
add-on for Kubernetes that allows the user to run VMs alongside containers. It's basically composed of the virt-api component, which exposes the REST endpoints used to interact with the CRDs that KubeVirt creates, for example VMIs and VMs. It's also composed of a controller, the virt-controller, which has the core logic that makes all the components work: the controller is responsible for watching the VM and VMI CRDs, creating the corresponding pod, and getting the VM running in the cluster. The virt-controller is also responsible for creating the virt-launcher, which is actually where the VM runs: the libvirt daemon runs inside the virt-launcher, creates the libvirt domain, and starts the VM. And the virt-handler is a daemon that runs on all worker nodes; basically, it manages VMIs the way the kubelet manages pods, so it manages the whole lifecycle of a VMI.
Okay, so what are the experiments and how are they configured? In order to create a large number of VMs, we need a small operating system image that allocates as few resources as possible, so that we can pack a lot of VMs into one node. To further minimize resource usage, I set the resource requests and limits as low as possible, just enough resources to create the VM, that is, to create the libvirt domain and start it. Booting the operating system is not necessary, especially because booting would not introduce any load on the KubeVirt control plane.

Okay, so what's the task that we run? We run a burst test that creates a batch of VMs and waits for them to be created.
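A minimal VMI manifest along the lines described, with tiny resource requests and no expectation of a full boot, might look like the following sketch. The image, sizes, and field values are illustrative assumptions, not the exact manifest used in the talk:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  generateName: density-test-
spec:
  domain:
    devices:
      disks:
        - name: rootdisk
          disk: {}
    resources:
      requests:
        cpu: 10m        # just enough to create and start the libvirt domain
        memory: 30Mi
      limits:
        cpu: 100m
        memory: 60Mi
  volumes:
    - name: rootdisk
      containerDisk:
        image: quay.io/example/tiny-os:latest   # hypothetical tiny OS image
```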
I varied the batch size over 50, 100, 200, 300, and 400 VMIs. I also tested 500, but it was not possible to create them all, since that introduced too much load in the system and it stopped working properly. Between experiments I also have a cooling interval of 30 minutes to let the garbage collector work; I will talk more about that later. And I create the VMIs at a rate of 20 requests per second.

The tool I used is kube-burner. I extended kube-burner to create KubeVirt objects, that is, VMI objects, and to collect some detailed latency information that I will describe later.
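For illustration, a kube-burner job for such a burst test could be sketched roughly as below. The values mirror the talk's setup, but the template path and exact schema details are assumptions and may differ between kube-burner versions:

```yaml
jobs:
  - name: kubevirt-density
    jobType: create
    jobIterations: 400              # batch size: 50, 100, 200, 300, or 400
    qps: 20                         # the 20 requests/s creation rate
    burst: 20
    namespace: vmi-density
    waitWhenFinished: true          # wait for the whole batch before finishing
    objects:
      - objectTemplate: templates/vmi.yaml   # hypothetical VMI template path
        replicas: 1
```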
Regarding the configuration of the system: in order to create more than 400 VMIs per node, the virt-handler maximum-devices setting has to be configured. Before this experiment, this value was actually hard-coded to 110, which is the default maximum number of pods per node, but now it's configurable, and by default it's 1,000. Also, when creating many VMIs on only one node there is some slowdown, and the default timeout for interacting with QEMU was 240 seconds; I increased it to 900 seconds to be able to create 500 VMIs in the system.

I also increased the virt-controller queries-per-second (QPS) and burst configuration for the clients that make requests to the Kubernetes API; by default these are only 5 and 10, and I increased them to 200 and 400. I will show the comparison between the default and the custom rate limiter configuration. For the Kubernetes configuration, I also needed to increase the kubelet max pods, which I increased to 1,000, and I configured the kube-API QPS and burst, increasing them to 50 and 100, which especially impacts the kubelet.

As for the cluster configuration, the experiments were run on IBM Cloud bare-metal nodes, which are the nodes used in the KubeVirt CI/CD system to run performance tests. Each node has 48 CPUs, so it's a large node onto which we can pack more VMIs.
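The knobs just described map roughly onto the KubeVirt CR and the kubelet configuration. The sketch below is reconstructed from memory of those APIs, so field names and placement may differ between versions; treat it as illustrative rather than authoritative:

```yaml
# KubeVirt CR: raise the virt-controller client rate limits (defaults: 5/10).
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    controllerConfiguration:
      restClient:
        rateLimiter:
          tokenBucketRateLimiter:
            qps: 200
            burst: 400
---
# KubeletConfiguration: allow more than the default 110 pods per node and
# raise the kubelet's own API client limits.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 1000
kubeAPIQPS: 50
kubeAPIBurst: 100
```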
Okay, so as I mentioned, I'm running two different scenarios: one with the default rate limiter, which has only 5 QPS, and one where I increased it with a custom rate limiter to 200 QPS. In both scenarios I vary the number of VMIs over 50, 100, 200, 300, and 400, with some interval between each run.

Okay, so regarding the results: the most important metric here for understanding the performance
is the VMI latency, that is, how long it takes to create the VMI: from creating the VMI object up to the VMI being ready, where ready means libvirt has created the domain and started it. We can also see the latency breakdown here: when a VMI is being created it goes through many phases and conditions. For example, when the VMI is created, the virt-controller requests the creation of a pod, the pod is scheduled, and then the VMI will be marked as scheduled as well; there is also some synchronization between the pod phases and the VMI, which affects the latency. We can see here that the rate limiter impacts the default and the custom scenarios differently, most especially in the scenario with 400 VMIs, where we can see the performance difference clearly.
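As an aside, the p99 figures discussed here can be computed from per-VMI latency samples with a simple nearest-rank percentile. This is a generic sketch in Python, not kube-burner's actual measurement code, and the sample numbers are made up:

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of the data is at or below it."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-VMI creation latencies in seconds (creation -> Ready).
latencies = [12.0, 14.5, 13.1, 60.2, 15.0, 13.8, 14.1, 55.9, 13.3, 14.7]
p99 = percentile(latencies, 99)   # dominated by the slowest stragglers
```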
Now for the scenarios, considering the p99 VM creation latency breakdown: when creating 50 VMIs, increasing the rate limiter improved the latency by 32%, and for the other sizes the improvement was around 20%. Only the scenario with 300 VMIs got worse performance, and I will discuss later what happened there. In the scenario with 400 VMIs, we actually got a 50% improvement in the VMI latency.

Additionally, we can see in the breakdown where the latencies come from:
we can see that the most important point for the latency is when the pod is initialized. After the pod is created and initialized, that's the time spent creating the VM domain, waiting for the virt-handler to send the start request, all of that process. And after the libvirt domain is created, the VMI also needs to start and be recognized by the controllers as running. So we can see that the VMI-ready latency is also high here; those are the most important latencies we can see in the VMI creation.
Okay, so regarding what happens when we increase the rate limiter QPS in the virt-controller: it's increasing the throughput, so there is a higher REST request rate. We can see a higher REST request rate here when we increase the QPS, because the client is now able to make more requests to the kube-API. It's also speeding up the processing time in the work queue, so items are processed faster, improving the overall performance of the KubeVirt components.
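The limiter being tuned is a client-side token bucket (in client-go terms, QPS is the refill rate and burst the bucket capacity). A minimal Python sketch, illustrative rather than the real Go implementation, shows why the default 5 QPS / 10 burst throttles a controller that suddenly needs to issue hundreds of requests:

```python
class TokenBucket:
    """Toy token-bucket rate limiter: `qps` tokens/s refill, `burst` capacity."""

    def __init__(self, qps, burst):
        self.qps = qps
        self.burst = burst
        self.tokens = burst      # bucket starts full
        self.last = 0.0

    def allow(self, now):
        """Return True if a request may proceed at time `now` (seconds)."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With the default qps=5/burst=10, only ~14 of 100 attempts in the first
# second get through (the initial burst of 10 plus ~5 refilled tokens).
bucket = TokenBucket(qps=5, burst=10)
granted = sum(bucket.allow(t / 100) for t in range(100))
```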
Okay, so now, what happened in the scenario with 300 VMIs and the custom rate limiter? It shows some slowdown in processing events: we can see a big spike here in the virt-handler work queue. Also, there are a lot of retries, especially in the virt-controller for VMIs, and in the virt-handler there are a lot of retries as well. So something was happening that was not working properly compared to the other scenarios.
And to try to understand that further, we can also see some big spikes in the virt-api CPU usage, which happened in this scenario but not in the others. Also, the Kubernetes work queue shows a higher add rate for the admission quota controller, so it's doing more requests to both KubeVirt and the Kubernetes API.

Okay, so what's the main reason for that? It's impact from the previous execution. We can see that the scenario that was creating 200 VMIs
got a lot of storage operation errors, and those errors are related to unmounting when deleting the VMs. In fact, I had problems: many VMI objects were stuck in the system and I needed to force-delete them. We can also see why the force delete was needed: the CRD finalizers were not being removed from the objects, and we can see that the Kubernetes CRD finalizers had some slowdown, especially in the work queue. We can see that it was taking 20 seconds to process a CRD finalizer, and the longest took more than two minutes, for example, which is actually very slow. This slowed down the deletion of the CRD objects, and because of that it was also impacting the next experiments.
Finally, regarding the overall resource usage of the cluster, I want to say that the resources are being used properly, as expected, except in the scenario with 500 VMIs, where we can see a spike in the CPU usage, which is actually also expected, because we are overloading the node.
Okay, so the final considerations of this work. We demonstrated how to configure Kubernetes and KubeVirt to be able to create more than 110 VMIs, more specifically to create 500 VMIs per node. We also showed that the resources on the node are not the only limitation: the control plane performance can also be a limitation when we want to create more VMs per node, because it can be heavily overloaded. We also demonstrated that increasing the virt-controller queries per second and burst improves the performance. We also described that there is a high latency between the pod becoming ready and the VMI becoming ready, which is basically related to the performance of creating the VMI domain in libvirt and how the controller reconciles and makes the VM state ready. And we also showed that previous experiments impact the performance of subsequent executions.
Questions?
Okay, can you hear me? I'm not on the main audio; let me check this. Okay, it should be fine now. Okay, I'm seeing one question here: it's about the 30-minute cooling interval between the experiments.
So, basically, when we delete everything, all the VMIs and the Kubernetes objects, it takes some time for the garbage collector to remove these objects from etcd, for example, and also to decrease the heap memory usage once the garbage collection runs. When the garbage collector is actually triggered depends on many factors. Because of that, just to be safe, to avoid as much as possible one experiment interfering with the performance of another, I chose 30 minutes. But as you saw, it actually happened that in one scenario a previous execution had some failures and impacted the performance of the subsequent experiment. So that was the main reason for that. Okay.
I'm seeing another question here, about how much overhead running the libvirt daemon adds per VM. Okay, well, this is interesting. The overhead that's most important here, I would say, is the memory. In my case, each VMI, and I think I have it in the slides, each VMI was using 200-something megabytes of memory in addition to the memory utilization of the VM itself, just for the overhead: libvirt and the other containers that run alongside the VM. For the CPU itself, I don't remember the exact amount of CPU overhead, but it's short-lived, because libvirt only uses more CPU when it's actually creating the domain and starting the domain; after that, the CPU usage is mostly related to the VM itself. But the memory overhead is actually something that must be taken into consideration. Yeah. Any other question?
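To put that figure in perspective, here is a quick back-of-the-envelope calculation using the speaker's rough number (about 200 MiB per VMI; an approximation, not a measured constant):

```python
per_vmi_overhead_mib = 200   # speaker's approximate per-VMI overhead
vmis = 400                   # largest batch that completed successfully

# Memory consumed by the virtualization stack (virt-launcher, libvirt, ...)
# alone, on top of the guests' own memory allocations.
total_overhead_gib = per_vmi_overhead_mib * vmis / 1024
print(f"~{total_overhead_gib:.1f} GiB overhead at {vmis} VMIs")
```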
Okay, so regarding the VMI creation latency that someone is asking about: it depends, so it depends on how many VMs are being created at the same time. For example, in my experiments I had what I would call a high load of 20 VMI creation requests per second. That's a lot of VMs being created per second, so when I create only 50 VMs, it takes around one minute to create each VM in the worst case. However, when you create more VMs, they take more time to be created; it's not like pods, where the creation time is roughly constant. The VMI creation time actually increases linearly with the number of VMIs to be created.