Why favour Icinga over Nagios?
Formal metadata
Title | Why favour Icinga over Nagios?
Series title | FrOSCon 2015
Part | 17
Number of parts | 79
License | CC Attribution 3.0 Unported: You may use, modify, copy, distribute and make publicly available the work or its content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner they have specified.
Identifiers | 10.5446/19596 (DOI)
FrOSCon 2015, part 17 / 79
Transcript: English (automatically generated)
00:08
So, welcome everyone. I think it's time to start. Welcome, and thanks for showing up here. I want to talk about Icinga today, and maybe why you want to favour it
00:22
instead of Nagios, or consider choosing Icinga, migrating to it, whatever. First of all, who am I? I'm a consultant working for a German company called Netways, where we are one of the driving forces behind Icinga. I joined the Icinga team back in 2012, and
00:44
my main motivation there is to bring a bit of organization into Icinga: how everything works, how to plan what features we do, in terms of our project. On my private side, I'm also a packager for Debian. I'm the one who brought Icinga to Debian and Ubuntu,
01:05
with help from a few others. And yeah, find me on Twitter as well. So, first of all, who is using Icinga 1 at the moment, if any? Okay, quite a few. Icinga 2?
01:22
Yeah, yeah, yeah, yeah. Okay. It won't be too deep an intro into what you do with Icinga, but quite a brief overview. So our target, I think it's called DevOps, right? Okay.
02:04
Okay. The target of our efforts is to bring something we call an open source enterprise monitoring solution. That means it should be scalable. It should work in very big environments. It should make it easy for you to use it there, but
02:24
it should be just as easy to use for a small home user. I myself have like two servers: one at home, which is like a file server, and a server at a German data center, just to play around with stuff, a few web apps, WordPress and so on, and
02:44
for a big enterprise company. That's what we want to do with our focus. There are a lot of people involved in Icinga. A few are doing packaging stuff for Red Hat (RPM), even FreeBSD. Not my kind, but if you want to use it...
03:04
But we are always looking for new people to get involved. Even if you just want to contribute: you find something in the documentation that doesn't fit, that you don't understand, that you could write better, send us a pull request. You're welcome.
03:21
Where we started, actually, was way back in 2009, when Nagios development was a bit frozen (and it still is, kind of), and we wanted to make it better. So the idea came up to fork it, to make
03:42
ourselves kind of the developers of Icinga, and a few guys did. I think our main focus back then was to fix bugs, to make small adjustments, just to improve the overall behavior of Nagios, and
04:01
we have come a long way since. I think in 2012 we started working on Icinga 2, and getting to the first version we considered stable took us almost two years, but I think it's quite interesting. And of course we
04:21
not only want to provide you with nice code that checks things, but also a web interface where you can see what your environment is doing, whether a server is working. So in our team we currently have a two-track development approach. We still have the old Icinga stuff,
04:43
which is basically Icinga 1, the web interface called Classic UI, and the Icinga Web 1 interface, which is used mainly on bigger installations and gets quite a bit complicated. On the other side, we have Icinga 2 and Icinga Web 2.
05:04
Both of them are still supported, and I think Icinga 1 will be supported for a few years to come, but maybe you'd like Icinga 2 better. So, maybe a few of you never had to touch something like Nagios, so:
05:25
what do you want to achieve with Icinga monitoring? We want to monitor everything, basically. We want to monitor whether hardware is working, whether a web service is working, whether an SQL database does what it should. And we want to do that at a regular interval, so like every minute: does it work? Does it work? Does it work? And
05:47
the idea is to prefer active checks wherever possible. Active checks means the Icinga core runs a command that reaches out to the target system, verifies that it's working and tells Icinga: yeah, you're fine.
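To make the idea of an active check concrete, here is a minimal Python sketch of the core's side of it: run a command, read its exit code and first output line, and map that to a state. The function name `run_active_check` and the stand-in plugin are hypothetical; this is an illustration of the concept, not how the Icinga core (which is written in C++) implements it.

```python
import subprocess
import sys

# Conventional plugin exit codes used by Nagios-style check plugins.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_active_check(command, timeout=10.0):
    """Run one check command and return (state, first line of its output)."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "UNKNOWN", "check timed out"
    lines = proc.stdout.strip().splitlines()
    output = lines[0] if lines else ""
    # Any exit code outside 0..3 is treated as UNKNOWN.
    return STATES.get(proc.returncode, "UNKNOWN"), output


# Stand-in for a real plugin: a one-liner that reports CRITICAL and exits with 2.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('CRITICAL - service down'); sys.exit(2)",
]
state, output = run_active_check(fake_plugin)
print(state, output)
```

A scheduler would call something like this for every configured check at its interval and store the returned state.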
06:01
So we are gathering all the status and saving it for you, including collecting some performance data, which is like CPU load, disk usage, whatever you might have. We want to notify people
06:22
on every channel you would like. Maybe a mail, maybe an SMS, maybe a Jabber message of some kind. We want to provide a way to set dependencies, so Icinga knows: okay, that's a server in that data center, and if the router there is down, because it's not responding anymore, maybe the whole data center is down, and
06:44
maybe I only want to get notified for that one router that's broken and not for every one of the 3,000 hosts there. So that should be as easy as possible. And in addition we want to support add-ons.
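The router example above (notify for the broken router, not for every host behind it) can be sketched in a few lines of Python. The data layout (`parent_of` as a host-to-parent mapping) and the host names are assumptions made for this illustration, not Icinga's dependency model.

```python
def has_down_ancestor(host, down, parent_of):
    """Return True if any parent further up the chain is itself down."""
    seen = set()
    parent = parent_of.get(host)
    while parent is not None and parent not in seen:
        if parent in down:
            return True
        seen.add(parent)  # guards against accidental cycles in the mapping
        parent = parent_of.get(parent)
    return False


def hosts_to_notify(down, parent_of):
    """Keep only the down hosts that are not explained by a down parent."""
    return sorted(h for h in down if not has_down_ancestor(h, down, parent_of))


# Hypothetical topology: two servers sit behind one data center router.
parent_of = {"web1": "router-dc1", "db1": "router-dc1"}
print(hosts_to_notify({"router-dc1", "web1", "db1"}, parent_of))
```

With the router down, only the router is reported; if only `web1` is down, `web1` itself is reported.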
07:01
So all the data we know, like state changes and performance data of any kind, we want to pass to add-ons for them to save it. We have quite a lot of integrations nowadays. We can forward to Logstash and Graylog. We can send metrics to Graphite, OpenTSDB and InfluxDB, or whatever
07:22
supports something like a Graphite interface for sending metrics, and we want to extend that support in the future. So every tool that makes sense to integrate, let's just do it. Kind of like the Logstash approach, with like 1,000 inputs and 3,000 outputs, but let's see where it goes.
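As an illustration of what "a Graphite interface to send metrics" means on the wire, the sketch below formats performance data in Graphite's plaintext protocol (one `path value timestamp` line per metric). The metric path and the helper names are hypothetical, and the naming scheme Icinga 2 itself uses for its Graphite output is not shown in the talk.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines, host, port=2003):
    """Send already formatted lines to a Carbon plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical metric path; a fixed timestamp keeps the example reproducible.
line = graphite_line("monitoring.web1.load.load1", 0.42, timestamp=1440000000)
print(line, end="")
```

`send_metrics` is only called when a Carbon listener is actually available; port 2003 is the usual plaintext port but should be treated as a deployment detail.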
07:43
So, where we are with Icinga 2 at the moment: we are at version 2.3.8, which is the third major version and currently the eighth bugfix release of that. It was just released in July, and we're already working on 2.3.9 at the moment, some smaller bugs,
08:02
some Windows stuff. And the main feature of Icinga 2 is that it has been completely rewritten from scratch. In our efforts to make Nagios better, we decided maybe it would be better to avoid trademark issues or trademark
08:24
questions in the future, and of course to improve the quality of the overall code. So it's now C++ and some Boost. We could have written it in Ruby or Java, but well, C++ is quite cool.
08:41
What we wanted to keep from Nagios is the ideas behind it, because they're pretty good and easy. In addition, we want to enable users to use Icinga 2 in their environment. So there's a Puppet module, there are Chef recipes, there are Ansible playbooks that set up Icinga 2. Some of them are
09:03
managed by members of the Icinga team. A few of them, I think Ansible and Chef, are made by guys who are just interested in it but can still contribute the code to our Git repository. And of course packages and Vagrant boxes are available, so everyone can just start setting up without worrying about compiling
09:23
a binary or an extensive library like Boost. Hello, computer? Okay.
09:43
Before we say what is better in Icinga: what is good about Nagios? As I said, monitoring is easy. You just have to install it, configure a few hosts, configure a few services, and you have a basic monitoring setup. The stack is so simple. You just need to install
10:01
Nagios, no big dependencies, a simple web interface, and you're done with your basic setup. Everything you need to do now is describe to Icinga what your environment looks like and what should be monitored. And we wanted to keep that. Those active checks I mentioned before are really powerful because they avoid dependencies.
10:26
Many monitoring solutions rely on some component somewhere to send status to the central monitoring system. We want to avoid that wherever possible and want to give you the way
10:42
to do it centrally, or ask the system centrally. Nagios has a pretty huge community, especially in the USA, still today. So if you search for Nagios problems or questions, you will find a lot of answers out there. And of course all the plugins that exist on the internet, for basically a lot of vendor stuff, a lot of basic
11:07
default standard plugins. They are easily usable: just a Bash script, a few lines, some check output and a return code, that's all we need. So, but why go to Icinga?
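The plugin contract mentioned above, "some check output and a return code", is small enough to sketch. The talk mentions Bash; the following is a Python version of the same idea, using the conventional exit codes 0 to 3 and the `text | label=value;warn;crit` performance data layout. The thresholds and the function name `evaluate_load` are made up for the example.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load1, warn, crit):
    """Return (exit_code, output_line) for a one-minute load average."""
    if load1 >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif load1 >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Human-readable text, then performance data after the pipe symbol.
    message = f"{label} - load1 is {load1:.2f} | load1={load1:.2f};{warn};{crit}"
    return code, message


# Hypothetical measurement and thresholds.
code, message = evaluate_load(0.42, warn=4.0, crit=8.0)
print(message)
# A real plugin would now finish with sys.exit(code) so the core sees the state.
```

The value would normally come from the system (for example `os.getloadavg()` on Unix); it is passed in here so the logic stays easy to test.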
11:25
I thought a lot about including that slide here, because I want to avoid bashing Nagios, so that's the only bashing-like part, but it gives you an idea of what the focus of Nagios is at the moment. That's the download area on nagios.org,
11:46
so their open source website, and it tells you pretty much, because Nagios is an open core project, meaning you have a Nagios core that's pretty nice and working, but for all the other stuff they want to do around it,
12:01
you have to pay, or at least they want you to pay for it. And maybe you notice that small PayPal "buy now" button for a student VM. You can get our Vagrant boxes for free, at any time. So let's start
12:22
talking about Icinga. Our main goal, even with Netways as a company behind it: we want to be 100% open source, 100% free software. That's really important for us, to support a community, and
12:40
I think that's quite well received, at least I hope so. So we welcome contributors. We have active community support; a lot of people, even ones not directly from the Icinga project, are talking to users who have questions. There are a lot of channels, just shown on the website.
13:03
So, another problem with Nagios, and that's maybe something you wouldn't really see in your home environment or a small company, is scaling. I started like three years ago and didn't really have much of an idea about what Nagios is like inside.
13:22
And if you start reading the code and understanding it: there were big mistakes made in the very beginning, and if you had a really big setup, you would notice that quite early, because there's only a single loop doing the jobs. So it's like a scheduler that's just running a while loop and
13:42
executing checks, doing notifications, doing status updates, and that costs a lot of CPU time, especially because it only runs on one CPU core at a time. And that has been there for a long time. It works pretty well,
14:00
but there is a limit you might reach at some point. What I personally noticed: if you have an AMD processor like the Opteron, which runs with about 24 cores or more, each core gets slower, and since Nagios can utilize only one CPU core, it gets slower and slower
14:22
the more CPU cores you have. So large installations are pretty difficult with that. Even like 10,000 services, which you might have at a mid-sized company, can give you quite some trouble. And of course the external interfaces we're talking about here
14:45
have pretty nasty problems you wouldn't see in a small environment. So our goal was to go multi-threaded from the start, to be able to do a lot of stuff in parallel, to avoid clogging, to avoid blocking of any kind.
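The difference between a single scheduler loop and doing "a lot of stuff in parallel" can be illustrated with a small Python comparison. The checks here are simulated with a short sleep standing in for waiting on a plugin or the network; the function names are hypothetical, and Icinga 2's real scheduler is C++ and not shown in the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_check(name):
    """Simulated check: the sleep stands in for waiting on a plugin."""
    time.sleep(0.05)
    return name, "OK"


def run_in_single_loop(names):
    # One loop, one check at a time: total time grows with the number of checks.
    return [slow_check(name) for name in names]


def run_in_parallel(names, workers=8):
    # A pool of worker threads overlaps the waiting time of many checks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_check, names))


names = [f"service-{i}" for i in range(16)]
sequential = run_in_single_loop(names)
parallel = run_in_parallel(names)
print(sequential == parallel)
```

Both variants produce the same results in the same order; only the elapsed time differs, because the waiting overlaps in the parallel version.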
15:05
And we wanted to be able to distribute load across a cluster automatically. In our tests we ran like 1 million checks in one second, and a lot...
15:22
Of course I had a benchmark slide in here, and it's a pretty, how should I say, nice statistic, but there is actually no limitation in Icinga 2 on how many checks it can run. The limitation is how many checks can be handled by the server.
15:41
If you have some check that runs an extensive PHP library, or Java, or whatever, you of course wouldn't be able to run thousands of checks every second. But if it's just a simple shell script, like an SNMP notification or whatever, it can be done.
16:02
Who of you has used Livestatus in the past, or knows what it does? Yeah. Basically, modules in Nagios were pretty nasty. There wasn't really a real library to link against; you would have to take a few header files,
16:24
compile a module against them, and yeah, it works. You can do events with it, you can access data with it, but it's not a very good solution for accessing data, because
16:43
it can get really slow if you do a lot of queries, and it's very complicated to install. So our goal was to go modular, and these names are all modules inside Icinga 2 at the moment. There are a few smaller ones as well.
17:04
But the whole idea is to have modules that do their stuff, can access internal information of the cluster, can talk to each other, exchange status data of any kind, and just the module
17:21
residing there and doing what it should do. So we have a module that checks stuff, that runs checks the way you configured them. We have a module that takes care of notifications: if the check result changes, we will send you a notification from that module. We have compatibility outputs
17:40
for all the interfaces. We have a Livestatus module that brings you the same interface the old Livestatus integration brought you, but here inside the core. We can write our perfdata, we have cluster connections, and we can write to those various external tools. And these modules are pretty easy, and
18:03
if you want to, maybe, send events not to a GELF output but to a commercial tool of any kind, you can take that module, change it, send HTTP stuff somewhere, and you're done. And what we are currently working on is the new API.
18:22
That should be done by November and will be included in 2.4. It brings a lot of additional interactivity with the core. So for all the modules, we want to give you a way to enable them easily and just configure them. So
18:43
we ship those modules, only a few are enabled by default, and if you want to use Livestatus: enable it, restart Icinga 2, and you're done. And if you use IDO for status output to a database, the only thing you might have to change is to go to the config file and
19:04
change the database password, or change the host name where the database resides, or whatever. So if we look at a shell for a moment: all you have here is a lot of configuration files, and
19:23
if you have configured Apache on a Debian-based system, Debian or Ubuntu, in the past, that's pretty similar to what we do here. We ship a bunch of default configuration files in that features-available directory, and the only thing enabling does is set a symlink here. So if I run icinga2
19:48
feature enable livestatus, the only thing it does is set that link.
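The "enabling a feature only sets a symlink" mechanism described above can be mimicked in a throwaway directory. This Python sketch assumes the Apache-like layout with `features-available` and `features-enabled` directories; the helper `enable_feature` is hypothetical and only imitates the idea, it is not the code behind the real CLI command.

```python
import os
import tempfile


def enable_feature(config_dir, name):
    """Link features-enabled/<name>.conf to features-available/<name>.conf."""
    available = os.path.join(config_dir, "features-available", f"{name}.conf")
    enabled = os.path.join(config_dir, "features-enabled", f"{name}.conf")
    if not os.path.exists(available):
        raise FileNotFoundError(available)
    if not os.path.islink(enabled):
        # A relative target keeps the link valid if the whole tree is moved.
        os.symlink(os.path.join("..", "features-available", f"{name}.conf"), enabled)
    return enabled


# Build a scratch configuration tree with one placeholder feature file.
config_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(config_dir, "features-available"))
os.mkdir(os.path.join(config_dir, "features-enabled"))
with open(os.path.join(config_dir, "features-available", "livestatus.conf"), "w") as handle:
    handle.write("# placeholder feature configuration\n")

link = enable_feature(config_dir, "livestatus")
print(os.path.islink(link))
```

Calling the helper a second time is harmless, since an existing link is left in place; the daemon still needs a restart to pick up the newly enabled file.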
20:02
You can have a look at the config file. All it does is instantiate a module called Livestatus. If you like, you can add a parameter for where the Unix socket for Livestatus should be, and it will be there after a restart. That's the Vagrant box, yes.
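Once the Livestatus Unix socket exists, a client talks to it with short text queries. The sketch below builds such a query in the Livestatus query language (`GET`, `Columns:`, `Filter:` headers, terminated by an empty line) and shows one way to send it. The socket path is deliberately a parameter because it depends on the installation; the function names are hypothetical.

```python
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query; the blank line at the end terminates it."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {condition}" for condition in filters]
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path, query):
    """Send a query to the Unix socket and return the raw text response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


# All services that are currently CRITICAL (state 2).
query = build_query("services", ["host_name", "description", "state"], ["state = 2"])
print(query, end="")
```

`query_livestatus` is not called here because it needs a running instance with the feature enabled and the configured socket path.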
20:36
So there we have a Livestatus socket, and that's where we want to go:
20:40
make it easy for the user, provide all the stuff, support as much as makes sense, and just allow you to use it without compiling weird modules from someplace. Of course there were packages in the past, but actually we in Icinga 1 had quite a few problems with those modules in the past, because
21:06
every time we tried to change internal lists in Icinga 1, just for adding a new field, because we wanted to provide a new field in the status data, we had problems with those modules because they were breaking.
21:21
Now we have a simple module that provides Livestatus, that can do all the queries, and if we change something, we just have to add one or two lines to Livestatus, and that's all we need to support it for users. So let's talk about clustering.
21:51
I'm not sure how many of you have ever tried to make Nagios highly available. I see nodding. I know that face.
22:04
Nagios was never intended to be highly available. So you have that one core that writes its status stuff pretty well, but when you try to make it even a cold-standby system, or just have a second system that can take over the work when the other system is down, it's pretty hard.
22:23
So you would have to build a lot of tools around it, and not just Pacemaker, but really tools to transport config files, to transport status.dat, which gets updated every 10 seconds or so. That's pretty hard, and I did it a lot for customers in the past, but it is not really fun.
22:45
So we thought about how we can make it better, and not only highly available but also scaling horizontally and vertically, so being able to have a DMZ checker, to have a checker for a
23:02
company, a country, a continent, whatever. So that's what we have. We can split the configuration so it knows: okay, that's checked centrally, that's checked somewhere else. That enables scaling, to avoid all the latency you have to go through to execute checks:
23:24
just run them on site and just transport the status to the central system. And if you go horizontal, you can set up like two or three hosts for your central system. They are highly available, and they distribute checks.
23:41
So if you have two hosts, every host does 50%; if you have more, it gets split across the number of nodes. In Icinga 2 we call it master, satellite and agent. It is basically Icinga 2 everywhere, but your configuration differs. So you have a centralized instance
24:05
that does local checking. You can have another one beside it. You have a remote Icinga somewhere else that does network checks for the whole location, and you can install Icinga 2 on every single host.
24:20
But the Icinga 2 on every single host would just act as an agent, connecting to the central cluster and allowing the cluster to tell it: please run that check and give me the result. Yes.
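The statement that two central hosts each take 50% of the checks, and more hosts split the work further, can be pictured with a simple round-robin assignment. This is only a sketch of the idea under the assumption of equal-cost checks; the talk does not say which algorithm Icinga 2 uses, and the node and check names are invented.

```python
def assign_checks(checks, nodes):
    """Spread checks evenly over the available nodes (round-robin)."""
    ordered_nodes = sorted(nodes)
    assignment = {node: [] for node in ordered_nodes}
    for index, check in enumerate(sorted(checks)):
        node = ordered_nodes[index % len(ordered_nodes)]
        assignment[node].append(check)
    return assignment


checks = ["web1!http", "web1!ssh", "db1!mysql", "db1!ssh"]

# Two masters: each one runs half of the checks.
print(assign_checks(checks, ["master1", "master2"]))

# One master is gone: the remaining node takes over everything.
print(assign_checks(checks, ["master1"]))
```

Recomputing the assignment whenever a node joins or leaves gives both the load sharing and the takeover behaviour described in the talk.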
24:44
Yes. There are multiple ways to do it. You can do it at the central instance; you can put your configs everywhere. There's a tool inside the whole cluster stuff that is called repository,
25:01
which makes the central Icinga 2 instance able to talk to all the systems, collect their configuration, and provide the central instance with the knowledge: where is what, and what objects should I know for my state? But what we always recommend,
25:22
that's the config style shown up here: you can put your configuration on a central instance, and the internal API can replicate that configuration. So if you have this system with two masters,
25:41
the configuration is done on this system, and when you change stuff here and reload Icinga 2 here, this one will tell the other system that a new connection is here. They will compare their configuration state on file, and if the other one hasn't got the change yet, it will just copy the file, reload itself and
26:06
have the new configuration, and that goes down to the agent. We try to keep it as simple as possible. So it's all files, all just configuration on disk, and it's just replicated.
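The replication steps described above (compare the configuration state on file, copy what changed, then reload) can be sketched with plain files. The following Python example compares content hashes between two scratch directories; the function names are hypothetical, and in the real cluster the transfer goes over the internal API rather than a local copy.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def sync_config(source_dir, target_dir):
    """Copy new or changed files; return their names (non-empty means reload)."""
    changed = []
    for name in sorted(os.listdir(source_dir)):
        source = os.path.join(source_dir, name)
        if not os.path.isfile(source):
            continue
        target = os.path.join(target_dir, name)
        if not os.path.exists(target) or file_digest(source) != file_digest(target):
            shutil.copyfile(source, target)
            changed.append(name)
    return changed


# Scratch directories standing in for the two instances.
master_dir = tempfile.mkdtemp()
second_dir = tempfile.mkdtemp()
with open(os.path.join(master_dir, "hosts.conf"), "w") as handle:
    handle.write("# host definitions\n")

first_run = sync_config(master_dir, second_dir)
second_run = sync_config(master_dir, second_dir)
print(first_run, second_run)
```

The second run finds nothing to do, which is the "hasn't got the change yet" test in reverse: identical state on both sides means no copy and no reload.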
26:21
And that makes it pretty easy if you experiment with it a little bit. So, another problem we have always seen in the Nagios sphere of stuff: security is pretty hard.
26:40
You had NRPE in the past, which allowed you to execute plugins on a remote host of any kind, and that is not secure. The problem: shell injections are possible. Its only security is based on an IP address whitelist.
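Shell injection, in general, happens when an argument received over the network is pasted into a command line that a shell then interprets. The Python sketch below shows the usual defence, passing arguments as a list so no shell ever parses them. It is a generic illustration, not NRPE's code; the stand-in plugin simply echoes its first argument.

```python
import subprocess
import sys

# Stand-in plugin: prints its first argument back, nothing more.
PLUGIN = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def run_plugin_safely(argument):
    """Pass the argument as one list element; no shell interprets it."""
    proc = subprocess.run(PLUGIN + [argument], capture_output=True, text=True, check=True)
    return proc.stdout.strip()


# A hostile value that would run a second command if a shell parsed it.
hostile = "localhost; echo injected"
print(run_plugin_safely(hostile))
```

The plugin receives the whole string, semicolon included, as a single literal argument. Had the same value been interpolated into a string and run through a shell, the part after the semicolon would have been executed as its own command.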
27:03
The encryption in NRPE is pretty much non-existent; it's more like a scrambling of the communication with a compiled-in private key. And NSCA is basically the same, but NSCA is the thing that sends a check result to the central instance.
27:24
So, of course you could do checks via SSH or whatever, but that creates other problems. So Icinga 2 clustering forces you to use TLS, and TLS for us means not a scrambled connection with some
27:43
private key, but certificates. I know it's really hard to do certificate stuff with OpenSSL, to create something like a CA, to create certificates. It's not easy. That's why we provide you with CLI commands for it.
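What "TLS with certificates" means for a cluster connection is that both sides must present a certificate signed by a CA the other side trusts. A minimal Python sketch of a listener-side context with that requirement follows; the file paths are parameters and left empty in the example because no real certificates are at hand, and the function name is hypothetical.

```python
import ssl


def make_cluster_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that insists on client certificates."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Peers without a valid certificate are rejected during the handshake.
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # The CA certificate that signed all node certificates.
        context.load_verify_locations(cafile=ca_file)
    if cert_file:
        # This node's own certificate and private key.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


context = make_cluster_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

In a real deployment all three files would be supplied, matching the talk's point that a node only needs the CA certificate plus its own host certificate and key.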
28:04
So if you have that central single server, just create a CA there create certificates You have to bring only CA certificate and the host certificate to the target system and that's all you need it now or
28:20
if you have something like Puppet that already has a CA infrastructure in place, just reuse its certificates; that works. No, the Icinga 2 tool, the daemon itself, is also the CLI tool, and
28:40
it does all that stuff for you internally. Basically, yes. So Icinga 2 has a pki subcommand, which can create a CA,
29:04
create a certificate, and sign a certificate request. And there's something called the node wizard, basically a command-line tool to set up a new node. There's also a way to
29:20
connect to the central server, request a certificate from there, and you're done. That's what the last point, the ticket, is about. A ticket in our world is basically a password that helps to authenticate you when you connect a new client.
29:42
So you enter that password. That password is configured on the central instance, which says: okay, that is valid for me, here's your certificate.
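The ticket idea can be illustrated with a small sketch. It assumes the ticket is derived from the node name and a secret known only to the central instance; the HMAC-SHA256 derivation and the names `make_ticket` and `request_certificate` are assumptions for illustration and may differ from what Icinga 2 actually does.

```python
import hashlib
import hmac
from typing import Optional


def make_ticket(node_name: str, secret: bytes) -> str:
    """Derive a per-node ticket from a secret held by the central instance."""
    return hmac.new(secret, node_name.encode(), hashlib.sha256).hexdigest()


def request_certificate(node_name: str, ticket: str, secret: bytes) -> Optional[str]:
    """Hand out a (placeholder) certificate only if the ticket is valid."""
    expected = make_ticket(node_name, secret)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, ticket):
        return None
    # Placeholder string; a real CA would sign a certificate request here.
    return f"signed-certificate-for:{node_name}"
```

A new node that presents the right ticket for its own name gets a certificate back; a ticket issued for a different name is rejected.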
30:00
Hello, my PowerPoint doesn't like me anymore. Oh, sorry, LibreOffice. Sorry. Sorry, okay.
30:29
I've never seen Impress crash before; that was the first time. So, I
30:47
want to show off a few configuration things we can do. If you don't get what the configuration does here: I will upload the slides afterwards, and
31:03
there are a lot of examples and documentation, but I wanted to show you how I usually do configuration and how it might help you. So let's go back to Nagios first. There it was called configuration tricks before.
31:21
It allowed you to assign an SSH check to a whitelist of host names. It allowed you to make host groups and maybe assign checks to that host group. But it was a lot of work, and there was not a lot of transparency about how it actually works, and I know a lot of people who never understood it. So we brought in logic.
31:50
Let's just say we want to have a service SSH that runs the SSH check, and
32:01
we want to apply that service to every host that is a Linux system, because the host has the attribute os set to Linux, and where some address is configured, but not on hosts that are test systems, maybe.
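The rule just described (Linux host, address configured, not a test system) can be modelled in a few lines. This is a Python illustration of the logic, not Icinga 2 configuration syntax; the `Host` class, the `test_system` attribute and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    address: str = ""
    vars: dict = field(default_factory=dict)


def ssh_applies(host: Host) -> bool:
    """The rule from the talk: Linux, has an address, is not a test system."""
    return (
        host.vars.get("os") == "Linux"
        and bool(host.address)
        and not host.vars.get("test_system", False)
    )


def apply_service(service_name: str, rule, hosts: list) -> list:
    """Return one 'host!service' name for every host the rule matches."""
    return [f"{h.name}!{service_name}" for h in hosts if rule(h)]
```

The point of the design is that a new host only needs the right attributes; the rule decides on its own whether the service is created for it.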
32:25
That goes further and further. So you can do matching on host names, even on array lists of host groups. You can do your own role definitions, you can filter on other
32:41
attributes that describe your environment, and only Icinga 2 has to care about it: okay, I have a new host, let's see what host groups it might have. So we have a lot of config parsing in the background, of course, but it's easy for you to configure it. So let's just imagine a host.
33:03
Sorry about the domain, I couldn't resist. Let's say we have a host somewhere on the internet that's doing stuff for your company, and it has an IP address.
33:20
We want to describe it. So it's a production system, it's a web server, it's in a data center we call number one, for example, the application is the bratwurst shop, amazing, and who is responsible for it? Application support. And you can even
33:43
make it easy for yourself. You can create templates that describe your hosts. So have something like a web server default that just sets production and web server, do all the host-specific stuff in the host, and just include the template on the host.
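As a rough model of the template idea: shared attributes live in a template, host-specific attributes are set on the host, and the host wins on conflicts. The attribute names (`environment`, `role`, `location`) and the helper `build_host` are hypothetical, chosen only to mirror the description above.

```python
# Hypothetical template: everything a default web server has in common.
WEB_SERVER_DEFAULT = {"vars": {"environment": "production", "role": "webserver"}}


def build_host(name: str, address: str, *templates: dict, **host_vars) -> dict:
    """Merge template vars in order, then let host-specific vars override."""
    merged = {}
    for template in templates:
        merged.update(template.get("vars", {}))
    merged.update(host_vars)
    return {"name": name, "address": address, "vars": merged}
```

For example, `build_host("web1", "192.0.2.10", WEB_SERVER_DEFAULT, location="dc1")` yields a host that carries the production and web server attributes from the template plus its own location.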
34:00
Now let's add services. How about monitoring HTTP and HTTPS on every node that is a web server, or whatever fits your company? That's all you need to do to monitor. How about notifications?
34:23
Let's say we want to be notified outside of work hours if a production system goes wrong. So we say we want to apply a notification, that's the name of the notification, to all hosts;
34:42
the host must match that definition, so only production systems. We want to be notified by mail, that's configured in the template here. We want to send a notification to all users that are in data center on call, and we want to be notified only outside of work hours.
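The notification rule can be sketched as one function that answers "who gets a mail for this host right now?". The work hours (09:00 to 17:00 on weekdays), the group name `datacenter-oncall` and the attribute `environment` are assumptions for the example, not values taken from the slides.

```python
from datetime import datetime, time

WORK_START, WORK_END = time(9, 0), time(17, 0)  # assumed office hours


def outside_work_hours(moment: datetime) -> bool:
    """True on weekends and on weekdays before 09:00 or from 17:00 on."""
    is_weekend = moment.weekday() >= 5
    return is_weekend or not (WORK_START <= moment.time() < WORK_END)


def recipients(host_vars: dict, moment: datetime, user_groups: dict) -> list:
    """Return the users to notify, or an empty list if the rule does not match."""
    if host_vars.get("environment") != "production":
        return []
    if not outside_work_hours(moment):
        return []
    return sorted(user_groups.get("datacenter-oncall", []))
```

Everything that decides whether a notification goes out sits in this one rule; the hosts themselves only carry descriptive attributes.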
35:06
If you configured notifications before, you had to adjust a lot of contact groups on single Nagios objects. Here, that's the only object you need in the whole configuration. Everything else is done
35:21
just by resolving the configuration for you. The same goes for services. Let's say we had that bratwurst shop before, and now we want to notify the application support team when it goes wrong, again only production, only after work hours, and now we want to send the mail to app support on call.
35:43
Behind those user groups are users. The user might have an email, a pager number, a Jabber address, whatever, Threema, SecurID, no idea. But the user is
36:02
only there to be defined and described as a user, and the information is used here when defining notifications. Yes, if you have a shell script that can do it.
36:21
That's nothing we really care about, because we give you the means to notify. In the background it runs some kind of command. Whether an SMS is sent via the locally installed GSM modem, or a mail is sent to the local mail server and forwarded to three other mailboxes, we don't care.
36:41
You just need to write a shell script, give it the parameters, and done. Every bit of magic you want to do is in your script. Dependencies should be easy. They never were with Nagios.
37:00
Again, if you have a location, data center number one somewhere, I want to define that every host that matches my rule down here gets a dependency, so the host targeted by that query gets the router as a parent.
37:21
If that router goes down, it's not available for us. That means that all dependent hosts are no longer checked, because, well, we can't check them, and we don't want to get notified for every dependent host. So in theory you should only get a notification that the parent host is broken, but of course, tune it like you want.
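The effect of such a dependency can be sketched like this: a host behind a parent that is down counts as unreachable and produces no notification of its own. The data layout (a `parents` mapping and a `states` mapping) is an assumption for the example, and the parent chain is assumed to contain no cycles.

```python
def reachable(host: str, states: dict, parents: dict) -> bool:
    """A host is reachable when every parent up the chain is UP."""
    parent = parents.get(host)
    while parent is not None:
        if states.get(parent) != "UP":
            return False
        parent = parents.get(parent)
    return True


def hosts_to_notify(states: dict, parents: dict) -> list:
    """Only DOWN hosts that are still reachable produce a notification."""
    return sorted(
        host
        for host, state in states.items()
        if state == "DOWN" and reachable(host, states, parents)
    )
```

With the router down and two web servers behind it also down, only the router ends up in the result, which matches the "one notification for the broken parent" behaviour described above.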
37:44
What is not really easy to explain here is how we run commands. I've seen a lot of users not understanding what we're doing. We wanted to provide you an interface
38:03
to describe how a script, check, notification, whatever, can be executed and what arguments it supports, and give you a simple way to set these arguments. So that's, for example, some pretty vendor check you might have in your environment.
38:21
It supports a lot of arguments, maybe a host name, an SNMP community, a mode for what the check should check on the server, and a warning and a critical limit. So you tell Icinga 2 where the script is, what arguments it has, and
38:41
maybe default values for these arguments, so you can override them later. The only thing you need to do if you define a service like a fancy vendor test: tell Icinga 2 which check command it is and add vars attributes, and
39:04
every attribute that the check command knows about or tries to find gets reused. So I can override the public community you've seen before, I can set the mode switch, the warning switch, whatever. And the magic behind it: all that stuff is shell-escaped.
39:25
So it's not possible to inject any Bash quoting or whatever command you might think of, because we have an interface that tells it exactly which arguments there are, so we can build a proper argument chain.
39:43
Of course, if your script is vulnerable to an SQL injection of any sort, we can't fix that. Sorry again.
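The argument interface described above can be sketched as follows: the command definition lists which flags exist and which variable feeds each of them, defaults can be overridden per service, and the result is an argument list rather than a shell string. The plugin path, the flag letters and the `vendor_*` variable names are hypothetical.

```python
import shlex

# Hypothetical command definition: flag -> name of the variable that fills it.
CHECK_VENDOR = {
    "command": ["/usr/lib/nagios/plugins/check_vendor"],  # assumed path
    "arguments": {
        "-H": "vendor_host",
        "-C": "vendor_community",
        "-m": "vendor_mode",
        "-w": "vendor_warn",
        "-c": "vendor_crit",
    },
    "defaults": {"vendor_community": "public"},
}


def build_argv(command: dict, service_vars: dict) -> list:
    """Build an argument list; each value stays one argv element, whatever it contains."""
    values = {**command["defaults"], **service_vars}
    argv = list(command["command"])
    for flag, var_name in command["arguments"].items():
        if var_name in values:
            argv += [flag, str(values[var_name])]
    return argv


def display_command_line(argv: list) -> str:
    """Render the argv as a safely quoted string, e.g. for logging."""
    return shlex.join(argv)
```

Because the values are passed as separate list elements, something like `"192.0.2.10; rm -rf /"` remains a single (harmless, if useless) host argument instead of becoming a second shell command.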
40:01
Sorry, that's an error, it should be critical. Yes. We're talking about any executable on a Linux system, or basically any executable on Windows or whatever system you're on. It only relies on the so-called Nagios plugin API,
40:23
which is: run a script, it echoes some stuff, and it has an exit code, same as the old Nagios. Yes, yes, yes. Exactly the same plugin API, but a safe way to use the plugins without writing a fancy command line.
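The plugin convention mentioned here (run a program, read its output, map the exit code to a state) is small enough to sketch. Exit codes 0 to 3 mean OK, WARNING, CRITICAL and UNKNOWN, and text after a `|` is performance data. The helper `run_plugin` and the stand-in `FAKE_PLUGIN` are made up for the example.

```python
import subprocess
import sys

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv: list, timeout: int = 10) -> tuple:
    """Run a plugin and return (state, status text, performance data)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # Any exit code outside 0..3 is treated as UNKNOWN.
    state = STATES.get(proc.returncode, "UNKNOWN")
    text, _, perfdata = proc.stdout.strip().partition("|")
    return state, text.strip(), perfdata.strip()


# Stand-in for a real plugin: prints one status line and exits with code 1.
FAKE_PLUGIN = [
    sys.executable,
    "-c",
    "import sys; print('DISK WARNING - 85% used | used=85%;80;90'); sys.exit(1)",
]
```

Running `run_plugin(FAKE_PLUGIN)` returns the WARNING state together with the status text and the performance data string.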
40:47
If you want examples for check commands, or don't want to write your own, there's a template library called the ITL, which we ship in our packages, that provides most of the
41:07
It's a bit small up here. Okay. Yes, so the ITL provides a lot of default check commands for all the default Nagios plugins and for popular stuff like the MySQL health checks from ConSol or whatever.
41:25
So there's a lot in there. There are also contributions for commonly used checks. So we have everything in there that you would need for a basic setup, and you just have to define the services, set the vars, and it runs. Yeah, where to find the ITL library?
41:49
If you have a system, it's installed by the package. It is not meant to be changed by you; that's why it's in /usr/share.
42:03
So if you want to improve a command or change it for yourself, either copy it and rename it, or import it as a template and change stuff in it. Inside, there are the plugin commands, the ones for the Manubulon SNMP plugins, some Windows plugins,
42:24
NSClient++ checks, everything is in there, and Icinga 2 includes this ITL in its main configuration file. You can also disable it if you don't want to use it.
42:40
So, one more thing: Icinga 2 is extensible, and that's the main thing we included in 2.3, where we've gone a bit further with the configuration stuff. So
43:02
you can use functions in there and define your own functions. It's pretty heavy for starters, but just to show an example: instead of assigning a fixed number, like a load of two or so, to that var,
43:23
we give it a function. We define a function for that; that's what the double braces do. You could also write function with brackets, like in other languages. And what's in here is something like script code in the configuration language.
43:41
What it does: every time that check is run, the function gets evaluated. So you can do stuff like: are we in the time period nine to five? Then use 40; if you're outside, use 60.
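The idea of a variable that holds a function instead of a fixed number can be sketched in Python like this. The values 40 and 60 and the nine-to-five period come from the talk; the names `load_warning` and `resolve` are made up, and this is not Icinga 2 configuration syntax.

```python
from datetime import datetime, time


def load_warning(now: datetime) -> int:
    """Stricter threshold from nine to five, more relaxed outside of it."""
    in_office_hours = time(9, 0) <= now.time() < time(17, 0)
    return 40 if in_office_hours else 60


def resolve(value, now: datetime):
    """Evaluate a variable at check time: call it if it is a function."""
    return value(now) if callable(value) else value
```

A fixed value such as `5` passes through `resolve` unchanged, while `load_warning` is re-evaluated on every check run and so follows the clock.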
44:18
Okay Remote
44:22
is easy on one server; combining multiple servers is hard. The idea is to set the parameters centrally. But if you have the knowledge of how many CPU cores that node has, how many CPUs are in there,
44:43
you could set that as a host parameter, maybe from a Puppet fact or any other intelligence you have at your company, and then set the load here to say: okay, that limit multiplied by the CPU cores.
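Scaling a limit by the core count is a one-line calculation; the sketch below assumes a per-core limit of 2 and a host attribute named `cpu_cores`, both of which are illustrative choices rather than values from the slides.

```python
def load_limit(host_vars: dict, per_core_limit: float = 2.0) -> float:
    """Scale a per-core load limit by the number of cores known for the host."""
    cores = int(host_vars.get("cpu_cores", 1))  # fall back to one core
    return per_core_limit * cores
```

The threshold is defined once, centrally, and each host only contributes the fact about itself.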
45:10
Mm-hmm, okay. The question was about checking stuff in a cluster.
45:31
Yes, yes, yes. You can do that stuff. You can combine multiple pieces of status information inside
45:43
Icinga 2. I think there was an example in our blog a few months ago. But it still relies on running multiple checks on multiple endpoints and having an additional check that does the combination.
46:00
But you can do a lot of internal stuff in there. You don't even have to run a check command; you can calculate the result inside that configuration language by comparing stuff from multiple nodes and display a shared result. But it's too complicated for this presentation. There is a lot possible, though. Yes.
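Combining per-node results into one shared result could look roughly like this. The policy (all nodes OK means OK, at least `min_ok` nodes OK means WARNING, otherwise CRITICAL) is an assumption chosen for the example; the talk does not specify one.

```python
def cluster_state(node_states: dict, min_ok: int = 1) -> str:
    """Combine the states of several nodes into a single cluster state."""
    ok_count = sum(1 for state in node_states.values() if state == "OK")
    if ok_count == len(node_states):
        return "OK"
    if ok_count >= min_ok:
        return "WARNING"  # degraded, but enough nodes still serve
    return "CRITICAL"
```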
46:25
Yes, yes. Yeah, you could do that stuff internally, and I've seen that you can do business processes inside Icinga 2, but it's a bit too complicated to explain now.
46:40
I can get the link for you if you give me your email later. So, the last point in my presentation is web interfaces. I said Nagios is open core. They reworked it a bit for Nagios 4, but that's basically the web interface you all know from the past, and
47:06
it's still there. It runs pretty well. We tried to improve it a lot with Icinga 1. That's why we called it the Classic UI. We did a bit with multi-selection and multi-command features, and made it prettier,
47:22
but it still had a lot of limitations. And if you have ever experienced Icinga Web 1: I had pretty much fun making Debian packages for it. But it worked. Yeah, it's pretty complicated. It's a lot of setup, it's JavaScript-heavy, it's a configuration nightmare. We wanted to make it better.
47:50
And one of the biggest drawbacks we ever had was making it easy for users to use it. So with the Classic UI, it was just there, it worked, but
48:04
it was pretty hard to get a user onto our modern web interface. So we thought about Icinga Web 2. It's a new framework, it's responsive, meaning it should work everywhere, and
48:24
it's made to be set up in a few minutes. The only drawback for you: it requires some database. And it's quite hard to explain, especially to small users, why they need a database, why they need to bother with MySQL or PostgreSQL
48:44
for a two- or three-server setup. Nagios had that magic status.dat to transport status information out of the core, and the web interface always read it and displayed your information, but it's pretty inefficient at some point.
49:05
The main drawback was always history. If you have a small setup at home and you wanted to say, okay, what did my system do three days ago, Nagios would have to parse a text log file for that time, and depending on how big it is, it would get slow.
49:22
We recommend: please use the database. It's really easy, it can be set up within minutes, so give it a try. Icinga Web 2 actually is modular. We spent a lot of time on the base framework,
49:42
and that's why it's still a release candidate, and we're still working on it and fixing bugs. We wanted to provide you a framework to do your stuff in, and it's not restricted to monitoring information, so even the monitoring stuff here is a module. There are other modules to come that we're working on and
50:03
trying to release over the next months, like business process, Graphite integration, PNP integration, Logstash. And even our own documentation runs inside Icinga Web 2, so if you read the Icinga 2 documentation, it is actually running in Icinga Web 2 for you.
50:21
The only difference in the online version is that authentication is disabled. That's a big difference. So, without explaining any more, let's have a look at it. The main idea is to give you a quick overview of the situation.
50:49
So let's say you open Icinga Web 2, you have logged in, used your magic password against some LDAP server or whatever you might want to use,
51:01
and you see what's happening. What are the service problems? There is a host that has been critical for like three minutes 40 seconds, 41 seconds, 42 seconds. You have an overview of services that have recovered in the past, so I
51:20
think Wi-Fi has been stable since 14:07. That's great. That's all the information. If you want to know exactly what is behind it, okay, click on it and you have the information. Go down, and you see three notifications have been sent for this issue, the last one one minute 47 seconds ago.
51:44
Who is the responsible contact? Oops. What check command is behind it? Where has the check been executed? So our main approach is to give you an overview as fast as possible, without a tricky web interface, without
52:01
some old configuration, and just give you an image of what's happening and make it easy for you to do stuff. So if you want to reschedule a check or check it now, just click a button, and not with a fancy form and a date input or whatever.
52:22
We wanted to make it easy, to have just a few clicks to go through the web interface and have an overview. So look at the disk stuff: my home directory is a bit full.
52:40
Maybe have a visualization for the performance data that is collected. 67 notifications have been sent for that issue. And also give you a way to view the history of a check pretty easily: just click on history, and
53:03
I noticed there's a bug with the sorting in here, so don't worry. You will have a historical overview: states have changed, notifications have been sent, downtimes have been set, acknowledgments have been set, in one view, easily, just from the database.
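Why a database helps with history can be shown with a small sketch: an indexed table answers "what happened to this service since a given time" directly, instead of scanning a text log. The table layout below is invented for the example and is not the schema Icinga uses; SQLite stands in for MySQL or PostgreSQL.

```python
import sqlite3


def open_history() -> sqlite3.Connection:
    """Create an in-memory history table with an index for per-service lookups."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE history (ts TEXT, host TEXT, service TEXT, kind TEXT, detail TEXT)"
    )
    db.execute("CREATE INDEX history_lookup ON history (host, service, ts)")
    return db


def service_history(db: sqlite3.Connection, host: str, service: str, since: str) -> list:
    """Return (timestamp, kind, detail) rows for one service, newest first."""
    rows = db.execute(
        "SELECT ts, kind, detail FROM history "
        "WHERE host = ? AND service = ? AND ts >= ? ORDER BY ts DESC",
        (host, service, since),
    )
    return rows.fetchall()
```

Timestamps are stored as ISO 8601 strings here, so comparing and sorting them as text gives chronological order.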
53:22
So yes, that's still to come. Displaying the command line in the web interface is not that easy. We included that in the Classic UI with Icinga 1, which is basically a
53:41
cheap man's command-line parser. What we want to do in the future is to connect Icinga Web 2 to the new API directly, so you can ask Icinga 2: what has the last command line been? The reason for that is also security. We can filter out vars
54:02
that are named SNMP community or that have password in their name. So a user of the web interface won't see the password of your application, but if you just have the command line somewhere here, it won't be secure. That's what we're trying to do in the future. It is possible via internal commands currently, for debugging reasons.
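Filtering sensitive variables by name, as described here, might be sketched like this. The pattern list and the masking string are assumptions; the talk only mentions names containing "password" and the SNMP community.

```python
import fnmatch

# Assumed name patterns for variables that should not be shown to web users.
SENSITIVE_PATTERNS = ["*password*", "*community*", "*secret*"]


def mask_vars(custom_vars: dict, patterns: list = SENSITIVE_PATTERNS) -> dict:
    """Return a copy of the variables with sensitive values replaced by '***'."""
    masked = {}
    for name, value in custom_vars.items():
        hide = any(fnmatch.fnmatchcase(name.lower(), p) for p in patterns)
        masked[name] = "***" if hide else value
    return masked
```

This works on named variables; a plain command-line string offers no such handle, which is the security concern raised above.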
54:25
And you will be able to see it over the API later, because the last status result also shows the last executed command line. That's to come, but not yet done. What I want to give you on the way is: just experiment with it a bit, have a look at the views we integrated here.
54:50
We play around a bit with views that give you insight into when events happened a lot. Reporting stuff. Just play around with it a bit, and
55:04
it should give you a lot of overview. Where was that? Yes. What I like, pretty neat, is that service grid. It shows you on which host services are in which state, on a grid, so you can see what is happening
55:26
or what's problematic currently. And I said it's responsive. That's my smartphone. Just give it web access via VPN or HTTPS from somewhere, and you will have the same status on there, the same actions, the same URLs. One click to check,
55:45
acknowledge, it just should work. So, conclusion: where to start? If you want to try it, read our docs. There's an introduction guide on how to install it and where to start.
56:02
There is a big chapter about migrating from Icinga 1 or Nagios. That's the biggest drawback you might have, rewriting your configuration, but I promise you will like it. And again, we have Vagrant VMs: if you just want to start up a VM, you have a CentOS with
56:25
Icinga 2 running inside, with a bit of demo configuration. I had it here on my system; it's set up within a few minutes. There are packages for most of the common operating systems, or distributions, not to say. There's even a Windows agent installation as a setup installer.
56:44
We recommend it only for agent use, but it would run like an Icinga 2 daemon. Play around with Icinga 2 and Icinga Web 2, and we would be happy to have you among our users.
57:00
There are big names among them, and most of them are still using Icinga 1. But we are running projects with a lot of customers at the moment, and we know a lot of users, big users, that want to go to Icinga 2. And we're glad that with our software maybe we're doing the right thing at some point.
57:28
If you are from the US or going to the US: we have an Icinga Camp in Portland just after PuppetConf, and we plan to make a camp happen in Berlin next year.
57:42
The details are still open, but I think it's March 1st. So, happy to see you there, and thank you.