
Do you really see what’s happening on your NFV infrastructure?


Formal Metadata

Title
Do you really see what’s happening on your NFV infrastructure?
Subtitle
(and what can you do about it?)
Number of Parts
490
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
As CoSPs accelerate their adoption of SDN and NFV technologies, the increased need for metrics, performance measurement and benchmarking becomes a focus, to ensure the continued delivery of “best in class” services. As NFV environments have grown in size and complexity, the tools required to gain greater visibility into the NFVI need to continue to evolve to meet the requirements for manageability, serviceability and resiliency. Using collectd as a metrics collection tool, OPNFV Barometer monitors the performance of the NFVI resources and can expose these insights via open, industry-standard interfaces to analytics or MANO components for potential enforcement or corrective actions. Barometer works with related open source technologies and communities (collectd, DPDK, OpenStack, Prometheus, SAF, etc.) to provide numerous metrics and events that address various use cases such as service healing, power optimization and ensuring application QoS.
Transcript: English (auto-generated)
Hello everyone. I'm Emma Foley, this is Christoph Gepke, and we're going to talk to you today about how to tell what's really going on in your NFV infrastructure, why you need
to do it, and what you can do about it if you can't see what's going on. Legal requirements have been met. First I'm going to do an introduction and talk briefly about barometer and barometers, and then Christoph is going to talk about
collectd, and I will talk about Barometer again and how the two projects relate to each other, and then Christoph will talk about potential use cases, and I will switch over to plans, upcoming features, and open the floor to questions. So why do I need to know what's going on in my
infrastructure? Well, as telcos and enterprises move towards a cloud-based IT infrastructure, they start moving their workloads from fixed function network appliances to commodity hardware in order to reduce costs. But
as they move to general-purpose hardware, they become more and more reliant on the data center, and they become more vulnerable to the costs associated with data center downtime. Those costs are not just financial, although even a minute of data center downtime is very, very costly. The cost
also comes in terms of additional complexity required and service availability. So as they move from fixed function network appliances to the NFV environment, the tooling required to actually maintain, host, and
orchestrate this becomes more and more complex. At the same time, the requirements customers have for maintaining service assurance, QoS, and the same levels of availability remain constant; they need to be met or exceeded. This requires more and more complex tooling and more metrics to be
available in the environment. So that's additional complexity for deploying, additional hardship when actually maintaining the level of performance required, and then even more additional complexity in monitoring what you have
going on now. Because it is vital to monitor the systems, because there are many different things that can have an effect on performance, and many different things as you move up in complexity that can actually cause
downtime. And you move from not only having to monitor the platform itself, but also having to monitor the applications running on top of it, because you don't want something like OVS or DPDK or OpenStack or Kubernetes to go down, because that would be disastrous.
This is where barometer comes in, and first off, a barometer, as in the scientific instrument, is a device for measuring atmospheric pressure. It is usually used for short-term weather forecasts, and another use that many people aren't aware of is that it can be used to
measure altitude or height above sea level. Now when scientists were designing a barometer, they probably didn't expect this to actually be a use of it. And the same way when the barometer project was created, there were a lot of use cases that have since emerged that we did not foresee at the
time. So Barometer itself is part of OPNFV, and I'll explain briefly what that is, because Barometer's relationship to OPNFV dictates the activities that the project actually undertakes. So OPNFV is the Open Platform for Network Function Virtualization. It's a Linux Foundation networking project, and it tries to ease the adoption of NFV. It does this by developing more NFV-friendly features in upstream projects, and then providing tooling to deploy, test, and integrate these same features. So that is what Barometer does. Barometer is concerned with collecting metrics to help you monitor the NFV infrastructure, and exposing these metrics to higher-level fault management systems that can actually introspect, analyze, and automate the management and fault detection in your data center. So like I said, Barometer does testing, integration, deployment, and upstream development on metrics collection. And that's what Christoph is going to talk to you about: the upstream project which Barometer actually contributes to. I will try very briefly to explain what that project is, and Christoph will give you some more useful information.
Yes, collectd is a pretty major piece of software. It is kind of a veteran in deployments across the industry, very widely deployed. It has been around for about 16 years, and during those years collectd was continuously evolving and adapting to industry needs. It is written in C, especially the core daemon, it doesn't have any dependencies, and it is built with a small footprint in mind. It's open source, mostly MIT-licensed; some older plugins are still GPL. As it doesn't have any dependencies, it is platform independent and runs on most of the available operating systems.
It gives you the ability to collect multiple metrics and events with the plugins included in the collectd repository; there are over 140 of them of various types. Some of them read telemetry from various places, either from applications, from the platform hardware, or from many other sources. It is also able to write this telemetry northbound in multiple ways, either to a file as CSV or to time series databases like InfluxDB or others. There are also binding plugins: in case the plugins that are there are not enough for you, you can write Python scripts or Java applications and feed them into the collectd core daemon, which will dispatch their data for further integration with your analytics stack.
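As a rough illustration of that Python binding, a minimal read plugin might look like the sketch below. The plugin name, the metric name, and the constant value are hypothetical placeholders, not something from the talk.

```python
# Minimal collectd Python read plugin (sketch). It would be loaded through
# collectd's python plugin, e.g. with `Import "my_reader"` in collectd.conf.
import collectd

def read(data=None):
    # In a real plugin this value would come from an application or the platform.
    vl = collectd.Values(type='gauge',
                         plugin='my_reader',          # hypothetical plugin name
                         type_instance='demo_value')  # hypothetical metric name
    vl.dispatch(values=[42.0])

collectd.register_read(read)
```

Once registered, the daemon calls the read callback on every collection interval, and the dispatched value flows through the same write plugins as any built-in metric.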
There are also modules for logging, handling notifications, aggregation, thresholding, and filtering metrics. An interesting plugin is the network plugin, because it is able to read and write data over the network with a collectd-specific protocol. So collectd can be treated as a client that produces the data, but also as a server that receives it, does something with it, for example aggregates it, and forwards it somewhere further.
So we know that collectd gives us the ability to collect metrics, but which metrics are actually interesting for you? Why would you choose collectd? There are standards bodies like ETSI or CNTT that are working on documenting specifications listing the set of metrics and capabilities that you are particularly interested in across the NFV architecture. Today we are focused mostly on the NFVI, so the platform telemetry and part of the traffic telemetry, but there are also possibilities to scrape application telemetry directly from the VNFs with some of the plugins, for example DPDK telemetry, to push all this data to telemetry databases for an analytics engine, and to close the loop by providing feedback back to the MANO systems so they can make decisions about corrective actions.
So what else is available in collectd to monitor the NFVI? There are plugins like mcelog, PCIe errors, or logparser that can provide you specific counters about, for example, memory errors that are happening on your DIMMs, surfaced through the platform's RAS features, which are basically features built for reliability, availability, and serviceability, and which help your platform serve you longer even if failures occur. Intel RDT allows you to monitor, per process ID or per core, your utilization of the last-level cache or memory bandwidth. The virt plugin provides you insights into the libvirt domains, so the compute, storage, or networking inside the VMs.
There are integrations for OVS and DPDK that allow you to see what is happening on your network with packet processing counters, including the errors and drop rates that are occurring there. There are Python-based plugins that allow you to write this telemetry to OpenStack for consumption. You can also push this data to Kafka, to AMQP, to Prometheus, or, for example, to the VNF Event Stream, which is a project in ONAP. You can monitor the health of the storage or the power consumption, something closer to the platform, in case you are, for example, selling cloud resources to someone. You may be interested in out-of-band telemetry, which is provided via Redfish or IPMI. And there are also PMU counters that may be interesting for you, which monitor the low-level counters in the processor and may be useful in some cases, like branch misses, mispredictions, or cache misses. So now let's get back for a moment to Barometer.
This one's working again. Okay, you may have noticed that I like asking questions. So how does Barometer relate to CollectD? Well, CollectD helps us to collect metrics,
and that's the core of what we want to do. Because no matter what you're going to do with those metrics, no matter how you want to manage your NFV environment, you still need those metrics to be available and easy to access in whatever format you want and whatever higher-level management or automation that you use.
So CollectD helps us collect the metrics. And if this project didn't exist, basically we'd have a lot more work to do in Barometer. So it's only fair that we try to give back to the CollectD community. We do this not only by upstreaming our own features,
but also by helping the CollectD community in general: onboarding new contributors and reviewing pull requests. Barometer itself also provides a load of testing and deployment tooling, which feeds back into the upstream CollectD CI and provides validation information to developers on their pull requests, and which also assists at release time to actually validate the CollectD releases and make sure everything is working.
So if I want to play around with Barometer or CollectD and take advantage of all these new NFV features there, what can we do? Barometer takes care of some deployment tooling as well that makes it easier to install and integrate CollectD into whatever system you have.
You could also install CollectD from a package manager and configure it yourself, but this gets a little bit tedious after one or two servers. So what we've done in Barometer is we've containerized CollectD, and we've written a bunch of Ansible playbooks
to automatically configure all the plugins that we think are relevant. You can also put in your own. So this one-click installer will let you install CollectD as is, or install alongside InfluxDB and Grafana, or alongside Prometheus.
And these are a few examples of how the metrics can actually be consumed, and I will talk about some of the pros and cons of these reference deployments. So first up is InfluxDB and Grafana. This is a very simple architecture.
You dispatch the metrics from CollectD via its network plugin, and these are sent to the time series database InfluxDB. From here, you can grab the metrics for whatever offline analysis you want to do, hook it into any existing tooling you have that talks to Influx,
or create your own tooling around that and pull those metrics. Or very simply, you can right out of the box get some nice graphs from Grafana, and if you're running Grafana 4.0 or above, you get some basic alerting as well.
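As a small sketch of that offline-analysis path, the InfluxDB side can be queried with the standard Python client. The database name, measurement name, and field below are assumptions that depend on how collectd data is ingested into InfluxDB, not values from the talk.

```python
# Sketch: pull recent collectd CPU samples out of InfluxDB for offline analysis.
# Assumes collectd data lands in a database called "collectd" with a "cpu_value"
# measurement holding a "value" field (typical, but configuration-dependent).
from influxdb import InfluxDBClient

client = InfluxDBClient(host='influxdb-host', port=8086, database='collectd')
result = client.query(
    'SELECT mean("value") FROM "cpu_value" '
    'WHERE time > now() - 1h GROUP BY time(5m), "host"'
)
for series, points in result.items():
    print(series, list(points)[:3])  # peek at the first few aggregated points
```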
So Prometheus is very popular, especially when you're talking about Kubernetes and cloud-native infrastructures. But there is a slight problem when you try to deploy Prometheus with CollectD, in that CollectD has a push model for metrics, and Prometheus has a pull model. So CollectD, as it doesn't have any in-built storage,
has to put those metrics somewhere until Prometheus pulls them. So there's two plugins that do this. There is the write Prometheus plugin, and there is a CollectD exporter. Both of them work in the same way, in that they create a small little web server which hosts the metrics until Prometheus comes along and scrapes that remote endpoint.
As your infrastructure scales, this becomes a little bit problematic, because Prometheus is scraping a whole bunch of remote endpoints, and that takes a non-zero amount of time. Eventually, the time it takes for Prometheus to scrape the metrics from all the hosts grows long enough that, in that time, CollectD will have created more metrics, and those will overwrite the existing ones. So with a larger infrastructure, you end up with missing data. And another issue with this is that the timestamp recorded by Prometheus
is the actual scrape time. And this may not be the same as the collection time for the metric. So if it's a small deployment, you can probably get over that, because it's a small variation. But as you scale up, the differences become more and more profound.
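To make the pull model concrete, here is a hedged sketch of what "hosting the metrics until Prometheus scrapes them" looks like from the outside. The hostname is a placeholder, and port 9103 with a /metrics path is the usual default for write_prometheus and the CollectD exporter, so check your own configuration.

```python
# Sketch: fetch the text-format exposition page that collectd's write_prometheus
# plugin (or the collectd exporter) serves, just as Prometheus would scrape it.
import requests

resp = requests.get("http://collectd-host:9103/metrics", timeout=5)
for line in resp.text.splitlines():
    # exported metric names are prefixed with "collectd_"
    if line.startswith("collectd_"):
        print(line)
```

Whatever timestamp Prometheus attaches to these samples is the scrape time, which is exactly the mismatch described above.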
And this actually limits the rate at which you can collect metrics, so there are a lot of trade-offs that have to be made. I'm pretty sure there's something else to say about that, but I'll figure that out. So the issues here would be that the metric collection time is not being preserved, and the latency involved means you have to trade off as you scale up. I thought I remembered for a second what else was wrong. Oh, yeah. Normally, when this happens, you would just deploy more instances of your application.
But Prometheus explicitly operates in a single server mode. So there's always only one Prometheus instance. If you want high availability, you deploy two or more Prometheus instances. But you can't share the data between them.
Each Prometheus instance will be scraping all the endpoints, and that doesn't really solve the problem of latency. So that's where the service telemetry framework comes in. This is pretty new, in that up until Wednesday, it was called SAF. But we had to change the name. So in this case, you still have CollectD running.
You still have the same plugins configured, but instead of exposing a local endpoint for Prometheus to scrape remotely, all the metrics are dispatched over AMQ and then received on the other side in the STF application,
which is hosted at the moment on OpenShift. The metrics then are pulled off the AMQ bus, by an application called Smart Gateway, which exposes the metrics on a local scrape endpoint to Prometheus. And Smart Gateway also takes care of the issue
with the write time versus the scrape time. So then the metrics are available in Prometheus the same way as they were before. And this also takes into account events. And those are available through Elasticsearch. And this looks complicated. It actually ends up not being very complicated
because all the orchestration for that is taken care of by the Service Assurance Orchestrator, which actually deploys all of these for you. So with that, we have the metrics available.
And you can make them available to whatever other system you want, with maybe a little bit of effort, maybe a lot of effort, but there's a lot of reference implementations and a lot of choices. I like to use the STF acronym here. So you can use these metrics to stop your system from being stressed to failure,
or you can use them to see the future. Or Kristof can tell you some ways that we're actually using them. Okay, so let's start with the first one that was actually being used pretty recently in November.
So during KubeCon in San Diego, they deployed a full open source 5G network. They made a call from one city to another, and as part of the virtual central office there, Barometer was actually used for the monitoring. So here we can see a Grafana dashboard that shows us some statistics from the system.
So, pretty cool, big stuff. Now let's get to something simpler. This is actually just a proof of concept. Here, on one server, we have two vBNG instances running in a hot-standby setup: one is actively processing the traffic, and the other is waiting in standby mode. They are deployed on two separate NUMA nodes, so they have different memory regions. The resiliency part here is that if the memory of the active vBNG instance is getting corrupted, its service just gets degraded. The telemetry is scraped from the mcelog plugin and dispatched to Prometheus. We watch whether the count of corrected memory errors is increasing too fast, because memory usually generates corrected memory errors in small amounts over time, but as they appear more and more often, it becomes more probable that we will hit an uncorrected memory error that could crash our platform. So before that actually happens, we can detect the increase in corrected memory errors and do something about it. In this proof of concept, we simply trigger a remediation action which moves the traffic from one vBNG instance to the other, just to simulate high availability.
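As a rough sketch of what such a remediation trigger could look like, the watcher below queries Prometheus for the rate of corrected memory errors and calls a failover hook once it crosses a threshold. The metric name, threshold, and failover routine are hypothetical stand-ins, not the actual proof-of-concept code.

```python
# Sketch: poll Prometheus for the corrected-memory-error rate reported via the
# mcelog plugin and trigger a failover when it starts rising too fast.
import time
import requests

PROM_URL = "http://prometheus:9090/api/v1/query"
# Hypothetical metric name; the real one depends on how mcelog data is exported.
QUERY = 'rate(collectd_mcelog_errors_total{type="corrected"}[5m])'
THRESHOLD = 1.0  # corrected errors per second; tune for your platform

def corrected_error_rate() -> float:
    reply = requests.get(PROM_URL, params={"query": QUERY}, timeout=5).json()
    samples = reply["data"]["result"]
    return max((float(s["value"][1]) for s in samples), default=0.0)

def fail_over_to_standby() -> None:
    print("Corrected-error rate too high: moving traffic to the standby vBNG")
    # ...call out to whatever actually moves the traffic in your environment...

while True:
    if corrected_error_rate() > THRESHOLD:
        fail_over_to_standby()
        break
    time.sleep(30)
```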
That was one of the first proofs of concept to show that, by monitoring the platform telemetry, it is possible to prevent outages or shorten any service interruptions as much as possible. But there is more than just memory being corrupted. You can also watch, for example, the temperature headroom to prevent any CPU throttling, or you can watch the last-level cache occupancy or memory bandwidth to prevent any noisy neighbor from impacting or affecting your workloads. You can combine all of those metrics into higher-level indicators of the health of your platform or compute node, which leads us to the second proof of concept, which does exactly that. Here we have two compute nodes managed by Kubernetes. We scrape RDT, PMU, IPMI, and other platform metrics and push them to a Kafka stack for streaming analytics, which calculates a host health indicator
and provides this information to Prometheus. Now let's take a look at the new component we see here, the Telemetry Aware Scheduler. This is an extension to the default Kubernetes scheduler that makes it aware of the telemetry to help it with scheduling decisions. You can feed it policies which monitor particular metrics, and you can say: if the platform is healthy, for example, you can deploy something new there; if there are some minor issues or the resources are becoming saturated, then you can keep what's already there but not schedule anything new; or, if there are any critical issues, you can evacuate everything and reschedule it on healthier nodes.
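As a toy illustration of that three-level policy idea (not the actual Telemetry Aware Scheduler policy format, which is expressed as Kubernetes custom resources), a scheduler-side check might look something like the following; the metric names and thresholds are invented for the example.

```python
# Sketch: classify a node as healthy / saturated / critical from a few platform
# metrics, mirroring the "deploy / don't schedule / evacuate" policy described above.
from typing import Dict

def node_health(metrics: Dict[str, float]) -> str:
    if metrics.get("corrected_mem_errors_per_s", 0.0) > 5.0 or \
       metrics.get("temperature_headroom_c", 100.0) < 2.0:
        return "critical"   # evacuate workloads and reschedule them elsewhere
    if metrics.get("llc_occupancy_ratio", 0.0) > 0.9 or \
       metrics.get("mem_bandwidth_ratio", 0.0) > 0.9:
        return "saturated"  # keep what's there, but don't schedule anything new
    return "healthy"        # fine to schedule new workloads here

# Example: values as they might be scraped from Prometheus for one node
print(node_health({"llc_occupancy_ratio": 0.95, "temperature_headroom_c": 10.0}))
```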
So by monitoring these metrics, you can perform those actions, do some service healing, and improve platform resiliency. Then I can quickly and briefly tell you about the power-saving demo. At the bottom of all of the slides there is a link where you can find more detailed information about these demos. So here we have a Kubernetes cluster that was running vCMTS pods. They were using poll mode drivers, so they were eating 100% CPU all of the time. The platform telemetry was pushed to InfluxDB and then monitored by an analytics engine that was previously trained to find the correlation between platform telemetry, CPU core frequencies for performance, and packet drop rates. And let's see the results. The red line here is the actual traffic pattern from one of the operators; from peak to peak it's a 24-hour period. The blue line shows us the power consumption. On the top, we are using the default Linux "performance" power governor, so it is keeping all of the cores always in turbo. In the middle, we see the "ondemand" governor, but due to the 100% CPU utilization it also keeps the power consumption very high. And at the bottom, we can see the possible energy savings due to the lower core frequencies being managed by this analytics engine. And as we don't have much time, I will just skip that.
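For a flavor of the mechanism behind that bottom curve, a controller could cap core frequencies through the Linux cpufreq sysfs interface, as in the sketch below. The frequency values and the predicted-traffic input are invented, and the real demo used a trained analytics engine rather than this simple rule.

```python
# Sketch: cap the maximum CPU frequency during predicted low-traffic periods,
# using the cpufreq sysfs interface (requires root; values are in kHz).
import glob

def set_max_freq_khz(khz: int) -> None:
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(khz))

LOW_CAP, HIGH_CAP = 1_200_000, 3_000_000  # example caps in kHz

predicted_low_traffic = True  # would come from the analytics engine in the demo
set_max_freq_khz(LOW_CAP if predicted_low_traffic else HIGH_CAP)
```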
In summary, there are many positive changes you can make by monitoring the platform telemetry, from service healing through energy optimization to the quality of your service. There is also the possibility, with Intel threat detection based on PMU metrics, for example, to find out whether someone is trying to attack your platform. There are also OPNFV projects that are utilizing Barometer and collectd, for example VSPERF, Bottlenecks, and Yardstick, which use them in their testing phases. And as the use cases are still growing, the software still needs to adapt and evolve to match them, which leads us to the next plans, for example in collectd and in Barometer. We don't have much time left, so I'll race through this. Up next for Barometer: in the next six months, we hope to help contribute to the collectd 5.11 release, particularly a new DPDK telemetry plugin, which will use the new telemetry API in DPDK and supersede the existing DPDK stats plugins that are available. We want to get the capabilities plugin merged, which provides some static system information. A Redfish plugin and an mdevents plugin are also in flight, as well as a bunch of bug fixes.
We are hoping to do more work on our collectd CI to run more validation tests and help verify collectd patches and releases in an automated fashion. And, as always, documentation updates. There are also a bunch of metrics requests and collaboration requests from VSPERF, from the MANO API working group, and from CNTT, the Common NFVI Telco Taskforce. They want to provide a bunch of new recommendations and unify efforts across a bunch of different projects in the Linux Foundation and outside. So if you want to get in touch, we have a weekly Barometer meeting, Tuesdays at 5 p.m. UTC,
and biweekly CollectD meetings, Mondays at 3 p.m. Information about both of these is available in the mailing list archives, and you can get in touch by contacting the relevant mailing lists for both projects. And if you want to try out some of what you've seen,
the best place to go is GitHub for Barometer and CollectD source code, as well as the service telemetry framework. To get started with documentation on CollectD, their wiki is pretty comprehensive and lays out all the configuration instructions for each individual plugin.
If you want to dive into plugin development in CollectD, we put together a plugin development guide in Barometer, which is focused solely on getting your first plugin up and running. And if you want to get involved by contributing features or share your own requests or requirements,
that information is on the OPNFV wiki on Barometer's page. All of these links will be up on the schedule later, so you can find them there. If you want to contribute to CollectD, there are a bunch of different things you can do.
From starting with simple testing, to contributing changes, both features and bug fixes, to providing code reviews; there is more information at that link.
And if you just want more information, you are welcome to comment on pull requests asking for clarification, or catch us on IRC in #collectd. And if you still want to get involved, there's actually a CollectD meetup
happening later this month in Munich. This is the second one, and we're going to be discussing things like features, testing strategies, upstream processes, release processes, and discussing architecture and requirements for CollectD 6.0,
which would be the next major release of it, and would represent a lot of efforts to actually make CollectD more cloudy. So things like an API for submitting and querying metrics, and for dynamic reconfiguration of CollectD, because at the moment, it's pretty static in its configuration,
and also features like adding labels to metrics. So that they'd be a little bit closer in functionality to other Collectors that are available. Information on that schedule on Etherpad,
and meetup information on the mailing list. And before I finish, I would like to note that it was not just us helping with this work. Usually when you get someone up presenting, it's easy to forget. There's actually a lot of people
also contributing to the projects as well. So I'd like to thank these people that helped with various demos, development, and I suppose requirements, and driving the projects. And does anybody have any questions?
I had cause to look at CollectD recently, and every time you add a new data source or a new plugin, it seems somewhat limited in scalability. You're kind of bound to waiting for a new CollectD release to add a new plugin to get that new piece of functionality. Are there any plans to make that more dynamic? Yeah, that's part of the discussion for 6.0, as it may require a major re-architecture of the CollectD internals. So there are plans to make it more dynamically reloadable as part of that work.
Did that answer your question? Right. The question actually was that there is an issue with CollectD: if you want to change the configuration, you have to restart it. Are there any plans to change that? So yeah, the answer is yes.
You're presenting an approach that stacks extra layers on top of CollectD, so my question is: do you have any benchmarks for how this scales, and what's your opinion on how to size a system for this? So the question was that CollectD produces a lot of metrics, this takes up a lot of disk space, and how does this actually scale?
And what we've presented here shows additional layers of complexity. And do we have any benchmarks? Benchmarks for scaling or guides for scaling. We occasionally run benchmarks in terms of storage.
Mostly the metrics are dispatched to remote locations. And things like Prometheus aren't designed for long-term storage of the metrics. So typically they will be aggregated and archived to reduce the amount of metrics
you have to actually store. Unfortunately, the time's up. If anyone has any more questions, feel free to come up afterwards. Thank you.