Barometer: Taking the pressure off of assurance and resource contention scenarios for NFVI
Formal Metadata

Title: Barometer: Taking the pressure off of assurance and resource contention scenarios for NFVI
Series: FOSDEM 2018 (talk 239 of 644)
Author: Emma Foley
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/41168 (DOI)
Language: English
Transcript: English (auto-generated)
00:06
So next speaker up is Emma Foley from Intel, who'll be presenting on Barometer, which is an OPNFV project. Hi folks, I'm going to be presenting on Barometer, which is an OPNFV project.
00:24
So instead of actually just telling you a lot of information, I'm going to answer a lot of questions, and at the end you can ask some questions as well. So first up is, what is Service Assurance and why do we need it?
00:45
So basically, as we become more and more reliant on the Internet, data centers have played a bigger and bigger part in our lives. And as we move from traditional network deployments, so fixed-function network appliances, to NFV, data centers have become more and more important.
01:05
So as telcos and enterprises make this transition, we end up with a lot of tooling, a lot of infrastructure that's becoming more and more complicated. And industries are going to have to meet or exceed the expectations that customers have for service assurance, QoS, and SLAs.
01:35
They're going to need additional tooling, additional metrics available to actually monitor their systems for malfunctions and misbehaviors that can cause downtime.
01:47
Unfortunately, existing solutions may not actually be enough here because as the tooling gets more complicated, you need to be able to monitor not only the platform, but also software applications as well,
02:02
and relay these metrics to management and analytics engines that will manage your virtualized infrastructure. So this is where CollectD comes in initially, and I know CollectD has been around for a very long time. However, this is good, because it is widely deployed, and for the industries that are moving across to NFV,
02:28
it's a tool that they're likely already using, which will help ease the adoption and the transition into NFV. So, a bit about CollectD first: it's got a plugin-based architecture, which makes it really flexible and really configurable.
02:44
And these plugins come in a few different types. Read plugins actually access the metrics from your system. Write plugins relay these metrics up to higher-level analytics engines. And notification plugins, which are equivalent to producing events from your system.
03:03
Logging plugins, which are pretty self-explanatory. And also a set of language bindings, so you're not limited to writing these CollectD plugins in C; you can extend it using Perl, Java, or Python if you want to.
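[To make that plugin model concrete, here is a minimal collectd.conf sketch with one plugin of each of the read, logging, and write kinds; the address is a placeholder and the plugin selection is only an example, not Barometer's actual configuration.]

    # Read plugin: sample CPU metrics from the local system
    LoadPlugin cpu

    # Logging plugin: write collectd's own log messages to a file
    LoadPlugin logfile
    <Plugin logfile>
      LogLevel "info"
      File "/var/log/collectd.log"
    </Plugin>

    # Write plugin: relay collected metrics to a remote host
    LoadPlugin network
    <Plugin network>
      Server "192.0.2.10" "25826"   # placeholder address, default collectd port
    </Plugin>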
03:20
CollectD sounds great; however, there are some gaps, and this is where Barometer comes in. First of all, a barometer is an instrument for measuring atmospheric pressure. It's also a project in OPNFV. And for those of you that missed the last session, OPNFV develops and improves NFV features in upstream ecosystems,
03:49
and also provides integration, testing, and installation to produce a reference platform for NFV, which is designed to facilitate the adoption of NFV by industry.
04:04
Barometer is one of these projects, and it is concerned with feature development, primarily in CollectD, to cover the gaps that we've found and make it more suitable for NFV deployments.
04:24
We've produced a lot of plugins to help monitor the platform and make more data available. So not only can you monitor generic compute, networking, and storage, you can also get more in-depth details from your platform.
04:40
These are metrics that were already available on Intel platforms but are now exposed through CollectD. And also metrics from applications like DPDK and OVS, which would not be relevant in traditional deployments; however, they are very, very relevant as we move towards NFV.
05:02
So once those metrics are available in CollectD, they're pretty much useless unless you can actually talk to your management and orchestration and analytics engines and interact with components such as OpenStack, ONAP, Kubernetes, and so on. So along with the read plugins, we've also produced a bunch of write plugins
05:26
to talk to OpenStack via Gnocchi and send notifications to OpenStack through Aodh. We've demonstrated how you can integrate CollectD with cAdvisor, relay all your metrics to Prometheus
05:41
and actually use that platform data and application data in Kubernetes and produce some plugins for those so we can relay the metrics up to ONAP. As well as that, we've done some work on sending these metrics via SNMP so that legacy systems can actually use the metrics.
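[As one concrete example on the write side, the upstream write_prometheus plugin, available in collectd 5.7 and later, exposes everything collectd gathers on an HTTP endpoint for Prometheus to scrape; a minimal sketch:]

    # Expose collected metrics for Prometheus
    LoadPlugin write_prometheus
    <Plugin write_prometheus>
      Port "9103"   # Prometheus then scrapes http://<host>:9103/metrics
    </Plugin>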
06:03
Again, this is to help ease the adoption, so you don't actually have to change your whole toolchain to use NFV. These are supposed to be pretty quick slides with more details on our read plugins: DPDK stats, vSwitch stats, hugepages stats, cache monitoring, additional memory stats.
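[To give a flavour of those read plugins, here is a hedged collectd.conf sketch enabling two of them; the OVSDB address and port are assumptions for a local Open vSwitch instance.]

    # Per-node hugepage usage
    LoadPlugin hugepages

    # Open vSwitch statistics, read over OVSDB
    LoadPlugin ovs_stats
    <Plugin ovs_stats>
      Address "127.0.0.1"   # assumes OVSDB listening on local TCP
      Port "6640"
    </Plugin>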
06:26
Again, the libvirt-based virt plugin is one here, so you can actually monitor your workloads running on virtual machines without installing CollectD on the VMs themselves, which means that you're not interfering with black-box commercial VNFs
06:42
and you still get the same level of metrics as you would have if you had more control over your VNFs. Again, write plugins: SNMP, Gnocchi, and VES. As well as feature development in CollectD,
07:02
Barometer has worked on standardization, making sure that the metrics produced are actually compliant with open standards for metrics collection, so that, again, if you have other tools, you don't have to spend a lot of time writing normalization or translation plugins
07:24
and you can interact and interoperate with different applications. We've also provided installer integration, since Barometer and CollectD wouldn't be much use in OPNFV if you couldn't actually install them. So at the moment we have support for Fuel, Compass, and Apex, as well as Kolla-Ansible in OpenStack,
07:48
and, if you're interested, technically also DevStack support. During the last cycle there was a lot of work done producing a reference container, so if you want to get started with Barometer and CollectD, you can pull down a Docker image from the OPNFV Docker Hub
08:04
and start using it, and this will include all the Barometer features that have been upstreamed. We're working on installer support for that reference container, so that we'll always have the latest and greatest version of CollectD actually on the system just by installing that container.
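[Roughly, getting started with that image looks like the following; the image name and mount path are assumptions based on the OPNFV Docker Hub naming, so check the Barometer wiki for the current ones.]

    # Pull the reference container (image name is an assumption)
    $ docker pull opnfv/barometer
    # Run it with host networking and a hypothetical config mount
    $ docker run -d --name barometer --net=host \
        -v /opt/collectd/etc:/opt/collectd/etc \
        opnfv/barometer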
08:21
That brings me up to a demo. It's a bit of a work in progress at the moment to automate the configuration and deployment of CollectD using Ansible. So what I'm going to show is installing CollectD on compute nodes
08:40
from your master node or your controller node: configuring them, deploying CollectD, and then, on your master node, aggregating your metrics at that one point and storing and displaying them.
09:05
So first of all our Ansible script is going to create CollectD configurations on our compute nodes.
09:29
This is a short demo; it's about four minutes, and it hasn't been sped up. I don't think you can read it anyway.
09:40
So what's happening here is that on our master node we're using Ansible to, first of all, configure CollectD on our compute nodes. What it does is, for each Barometer plugin, it checks if the requirements are met and then enables and configures the appropriate plugins.
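[A minimal sketch of what such a play might look like; the inventory group, template name, and paths here are hypothetical, not the actual Barometer playbooks.]

    # Render a collectd.conf on every compute node
    - hosts: compute                         # hypothetical inventory group
      become: true
      tasks:
        - name: Generate collectd configuration from a template
          template:
            src: collectd.conf.j2            # hypothetical template
            dest: /opt/collectd/etc/collectd.conf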
10:09
So now we're done configuring on four nodes. Just going to check that those configurations exist. As well as enabling the read plugins, this is also configuring the compute nodes
10:21
to send the metrics back to our master node. Now we're going to actually deploy the container. I'll first check that there is actually no container running in case anyone had doubts. So that's CollectD deployed on four different nodes.
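[That forwarding back to the master is done with CollectD's network plugin; roughly, with placeholder addresses:]

    # On each compute node: forward metrics to the master
    LoadPlugin network
    <Plugin network>
      Server "10.0.0.1" "25826"   # placeholder master address
    </Plugin>

    # On the master node: listen for incoming metrics
    LoadPlugin network
    <Plugin network>
      Listen "0.0.0.0" "25826"
    </Plugin>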
10:44
I'm going to check that it's running. So next up we have to set up storage using InfluxDB
11:03
and also we want to set up Grafana so that we can see the metrics that are actually produced in a nice visual dashboard. I'm having trouble reading this from here, so those of you at the back of the room, don't worry.
11:24
So we're using Docker Compose to set up those two containers, Influx and Grafana. Not only does it actually deploy Grafana but it also sets up a load of preconfigured dashboards so you don't have to spend hours going through the metrics that are available and picking what to put on your graphs.
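[Roughly what that Compose file looks like; this is a hedged sketch using stock images, not necessarily the exact file shipped with Barometer.]

    version: '3'
    services:
      influxdb:
        image: influxdb:1.7        # stock image; Barometer's own may differ
        ports:
          - "8086:8086"
      grafana:
        image: grafana/grafana
        ports:
          - "3000:3000"
        depends_on:
          - influxdb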
11:53
Just want to add that this hasn't been sped up and we're about two and a half minutes in. So as you can see there's a lot of metrics coming in
12:01
and we can see what's going on on various different nodes.
12:32
What we're seeing is just the compute usage per node. You can get a cumulative, aggregated version, or you can see per-CPU metrics as well.
12:47
In order to show you that there's actually something happening we're just going to stress the CPU so you can see how the metrics do change and how quickly they're collected and updated. So we can see that activity that we just kicked off.
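[For reference, the kind of command used to generate that load, assuming a tool like stress-ng is installed on the node:]

    $ stress-ng --cpu 4 --timeout 60s   # spin four CPU workers for a minute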
13:24
So that was a four-minute demo on how to set up Barometer. I think that's the first time we've actually shown Barometer being deployed, although it's not the first time we've actually shown Barometer in action. Whether you knew it or not, in all these demos that have been showcased at OpenStack and OPNFV summits,
13:42
anything to do with metrics collection, with Docker, with Vitrage, with OpenStack Watcher, what they were doing underneath was collecting metrics using those Barometer features. So you can look at those later; I think the slides will be put up soon.
14:02
After that, where does Barometer go from here during our next release? More plugins, obviously more plugins. I'm not going to go through them here; there's a list on the OPNFV wiki, the Barometer wiki, of what's actually planned. However, if you have any plugins that you want to see enabled,
14:21
or if you're enabling any plugins yourself, the Barometer team is usually happy to help with reviewing pull requests on CollectD. I'm also going to do some work on CollectD cloudification. This is to address some issues, or some gaps, we saw at the start with the configurability of CollectD
14:41
and actually deploying it over multiple nodes. Namely that if you want to reconfigure CollectD, you have to restart the service. And as you might be collecting metrics at a very high frequency over a lot of nodes, this could obviously take a lot of time
15:01
but also cause a discontinuity in the metrics so you have gaps in your history, which is not ideal. So what we plan to implement is a bit of an abstraction, an API on top of it so that you can configure it on the fly, which is handy in situations where, for example, at peak times you may want to collect metrics
15:23
at a much higher frequency, or if you migrate your workloads and consolidate them into a smaller number of hosts, for example, to conserve power, you may want to increase the intervals so you're not collecting metrics as often,
15:41
or, for certain workloads and certain compute hosts, you may want to change over time the metrics that are actually collected. So that's part of the motivation to make it more configurable, and more dynamically configurable.
16:02
Of course, we're always open to collaborations and would like to see more people consuming Barometer and Barometer features. Basically, the goal in the next release is to enable more services to consume data and telemetry for all kinds of use cases, including orchestration, management, governance and audit, analytics, and so on.
16:25
Does anybody have any questions over here? The question was what are we using as our time series database for CollectD?
16:41
CollectD supports multiple time series databases. You could use Gnocchi, or you could use InfluxDB, or any other database that it actually supports. You're not limited to the ones that I've outlined here.
17:03
How many data points are collected per host per second? That depends on a lot of things. CollectD has over 100 plugins available at the moment. You're only going to want to enable a subset of these plugins. Each plugin would have many different metrics available and it would also monitor or collect metrics
17:22
on many different resources at the same time. For example, for CPU you'd have utilization, free time, interrupts, and a bunch of other things. That would be per CPU core, per host, and that's just one plugin. The frequency at which you collect them really depends on your use case as well.
17:40
I think we've tested down to sub-millisecond intervals. How much overhead does that collection impose? I don't have the answer for that right now.
18:01
If you follow up, I might be able to find out or provide you some tools to find out. Any more questions? Did I speak too fast?
18:29
If I want to run Barometer, can I run it on hosts and containers and VMs and so on? Yes, you can, as long as there's network connectivity between them. You can relay the metrics from any host that's running CollectD
18:42
to a designated CollectD server via its network plugin. So if there are no more questions, I will turn on the light again
19:00
so that we can see everybody. Thank you very much, Emma. If there are more questions, there's the wiki and the mailing list. Thank you very much.