
Why favour Icinga over Nagios?

Formal metadata

Title
Why favour Icinga over Nagios?
Series title
Part
17
Number of parts
79
Author
License
CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Year of publication
Language

Content metadata

Subject area
Genre
Abstract
We try to explain some of the problems Nagios has had for years, what the differences to Icinga are, and how Icinga 2 can ease monitoring in small as well as really big environments. Markus Frosch
Transcript: English (automatically generated)
So welcome, everyone. I think it's time to start. Welcome, and thanks for showing up here. I want to talk about Icinga today, and maybe why you would want to favour it over Nagios, or consider choosing Icinga, migrating to it, whatever.
First of all, who am I? I'm a consultant working for a German company called NETWAYS. We're one of the driving forces behind Icinga, and I joined the Icinga team back in 2012. My main motivation there is to bring a bit of organization into Icinga: how everything works, how to plan what features we will do, and so on in terms of our project. On my private side, I'm also a packager for Debian. I'm the one who brought Icinga to Debian and Ubuntu, with help from a few others. And yeah, find me on Twitter as well.
So first of all, who is using Icinga 1 at the moment, if any? Okay, quite a few. Icinga 2? Yeah, okay.
It won't be too deep an intro into what you do with Icinga, but quite a brief overview. So, our target... I think it's called DevOps, right? Okay.
Okay. The target of our efforts is to bring you something we call an open source enterprise monitoring solution. That means it should be scalable. It should work in very big environments, and it should be easy for you to use it there. But it should be just as easy to use for a small home user. I myself have like two servers: one at home, which is a file server, and a server in a German data center just to play around with stuff, a few web apps, WordPress and so on. It should work for that and for a big enterprise company, and that's what we want to do with our focus.
There are a lot of people involved in Icinga. A few are doing packaging for Red Hat, [unclear], even FreeBSD. Not my kind, but if you want to use it... We are always looking for new people to get involved. Even if you just want to contribute: if you find something in the documentation that doesn't fit, that you don't understand, or that you could write better, send us a pull request. You're welcome.
Where we actually started was way back in 2009, when Nagios development was a bit frozen, and it still kind of is. We wanted to make it better, so the idea came up to fork it and make ourselves the developers of Icinga, and a few guys did. Our main focus back then was to fix bugs and make small adjustments, just to improve the overall behavior of Nagios.
We have come a long way since. I think in 2012 we started working on Icinga 2, and getting to the first version we consider stable took us almost two years, but I think it's quite interesting. And of course we want to provide you not only with nice code that checks things, but also with a web interface where you can see what your environment is doing and whether a server is working.
In our team we currently have a two-track development approach. We still have the old Icinga stuff, which is basically Icinga 1, the web interface of the so-called Classic UI, and the Icinga Web 1 interface, which was used mainly on bigger installations and gets quite a bit complicated. On the other side, we have Icinga 2 and Icinga Web 2.
Both of them are still supported, and I think Icinga 1 will be supported for a few years to come. But maybe you'd like Icinga 2 better. Maybe a few of you never had to touch something like Nagios.
So what do you want to achieve with Icinga monitoring? We want to monitor everything, basically. We want to monitor whether hardware is working, whether a web service is working, whether a SQL database does what it should. And we want to do that at a regular interval, like every minute: does it work? Does it work? Does it work?
The idea is to prefer active checks wherever possible. Active checks mean the Icinga core runs a command that reaches out to the target system, verifies that it's working, and tells Icinga: yeah, you're fine.
So we are gathering all the status and saving it for you, including collecting some performance data such as CPU load, disk usage, whatever you might have. We want to notify people on every channel you would like: maybe a mail, maybe an SMS, maybe a Jabber message of some kind.
We want to provide a way to set dependencies, so Icinga knows: okay, that's a server in that data center, and if the router there is down because it's not responding anymore, maybe the whole data center is down. And maybe I only want to get notified about that one router that's broken and not about each of the 3,000 hosts there. So that should be as easy as possible. In addition, we want to support add-ons.
So all the data we know, like stuff that changes, performance data of any kind, we want to pass to add-ons for them to store. We have quite a lot of integrations nowadays. We can forward to Logstash and Graylog. We can send metrics to Graphite, OpenTSDB, InfluxDB, or whatever supports a Graphite-like interface for sending metrics.
We want to extend that support in the future, so for every tool that makes sense to integrate, let's just do it. Kind of like the Logstash approach with 1,000 inputs and 3,000 outputs, but let's see where it goes.
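To make the metric forwarding mentioned above a bit more concrete, here is a minimal Python sketch that formats check performance data as Graphite plaintext lines (`path value timestamp`). The `icinga2` prefix, the path layout, and the function name are assumptions made for illustration; they are not necessarily the naming scheme Icinga 2 itself uses.

```python
import time


def graphite_lines(host, service, perfdata, timestamp=None, prefix="icinga2"):
    """Format performance data as Graphite plaintext protocol lines.

    Each line has the form "<metric.path> <value> <unix timestamp>".
    The prefix and path layout are hypothetical.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)

    def clean(part):
        # Dots separate path components in Graphite, so replace them
        # (and spaces) inside host, service, and metric names.
        return part.replace(".", "_").replace(" ", "_")

    lines = []
    for name, value in sorted(perfdata.items()):
        path = ".".join([prefix, clean(host), clean(service), clean(name)])
        lines.append(f"{path} {value} {ts}")
    return lines


for line in graphite_lines("web01.example.org", "load", {"load1": 0.42, "load5": 0.37}):
    print(line)
```

Lines in this format are what a Carbon listener accepts over a plain TCP connection, which is why any tool that speaks the same protocol can receive them.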
So where are we with Icinga 2? At the moment we are at version 2.3.8, which is the third major version and the eighth bug-fix release of it. This was just released in July, and we're already working on 2.3.9 at the moment: some smaller bugs, some Windows stuff.
The main feature of Icinga 2 is that it has been completely rewritten from scratch. After our efforts to make Nagios better, we decided it would be better to avoid trademark issues or trademark questions in the future, and of course to improve the quality of the overall code. So it's now C++ and some Boost. We could have written it in Ruby or Java, but, well, C++ is quite cool.
What we wanted to keep from Nagios is the ideas behind it, because they're pretty good and easy. In addition, we want to enable users to use Icinga 2 in their environment. So there's a Puppet module, there are Chef recipes, there are Ansible playbooks that set up Icinga 2. Some of them are managed by members of the Icinga team. A few of them, I think Ansible and Chef, are made by guys who are just interested in it but can still contribute the code to our Git repositories. And of course packages and Vagrant boxes are available, so everyone can just start setting things up without worrying about compiling a binary or an extensive library like Boost.
Hello, computer. Okay.
Before we say what is better in Icinga: what is good about Nagios? As I said, monitoring is easy. You just have to install it, configure a few hosts, configure a few services, and you have a basic monitoring setup. The stack is so simple: you just need to install Nagios, no big dependencies, a simple web interface, and you're done with your basic setup. All you need to do then is describe to Icinga what your environment looks like and what should be monitored. We wanted to keep that.
Those active checks I mentioned before are really powerful because they avoid dependencies. Many monitoring solutions rely on some component somewhere sending status to the central monitoring system. We want to avoid that wherever possible and give you a way to do it centrally, to ask the systems from the center.
Nagios has a pretty huge community, especially in the USA, still today. So if you search for Nagios problems or questions, you will find a lot of answers out there. And of course there are all the plugins that exist on the internet, for a lot of vendor stuff as well as the basic default standard plugins. They are easy to use: just a Bash script, a few lines, some check output and a return code. That's all we need.
So why go to Icinga?
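The plugin contract described above (a line of check output plus a return code) can be sketched in a few lines. This version is in Python rather than Bash, and the check itself, the thresholds, and the function names are hypothetical. The return codes follow the usual plugin convention of 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN.

```python
import os

# Conventional plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_load(load1, warn=4.0, crit=8.0):
    """Map a 1-minute load value to (return code, output line)."""
    if load1 >= crit:
        code = CRITICAL
    elif load1 >= warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" is for humans; text after it is performance data.
    output = f"LOAD {STATE_NAMES[code]} - load1 is {load1:.2f} | load1={load1:.2f};{warn};{crit}"
    return code, output


def main():
    """Run the check once, print the output line, return the exit code."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, output = evaluate_load(load1)
    print(output)
    return code

# A real plugin script would finish with: sys.exit(main())
```

The monitoring core only needs the exit code to decide the state and the printed line to show to the user, which is why such plugins can be written in any language.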
I thought a lot about including this slide, because I want to avoid bashing Nagios, so this is the only bashing-like part. But it gives you an idea of what the focus of Nagios is at the moment. That's the download area on nagios.org, so their open source website, and it tells you pretty much, because Nagios is an open core project. That means you have a Nagios core that's pretty nice and working, but for all the other stuff they do around it you have to pay, or at least they want you to pay. You may notice that small PayPal "Buy Now" button for a student VM. You can get our Vagrant boxes for free at any time.
So let's start talking about Icinga. Our main goal, also for NETWAYS as the company behind it, is to be 100% open source, 100% free software. It's really important for us to support a community, and I think that's quite well received, at least I hope so. We welcome contributors. We have active community support; a lot of people, even people not directly from the Icinga project, are talking to users who have questions. There are a lot of channels, all listed on the website.
Another problem with Nagios, and that's maybe something you wouldn't really see in your home environment or a small company, is scaling. I started like three years ago and didn't really have much of an idea of what Nagios looks like inside. If you start reading the code and understanding it, there were big mistakes made in the very beginning, and if you have a really big setup, you notice that quite early, because there's only a single loop doing the jobs. It's like a scheduler that is just running a while loop and executing checks, doing notifications, doing status updates. That costs a lot of CPU time, especially because it only runs on one CPU core at a time.
That has been there for a long time, and it works pretty well, but there is a limit you might reach at some point. What I personally noticed: if you have an AMD processor like the Opteron, which runs with about 24 cores or more, each core gets smaller. And since Nagios can utilize only one CPU core, it gets slower and slower the more CPU cores you have. So large installations are pretty difficult with that. Even something like 10,000 services, which you might have at a mid-sized company, can give you quite some trouble. And of course the external interfaces we're talking about here have pretty nasty problems that you wouldn't see in a small environment.
So our goal was to go multi-threaded from the start, to be able to do a lot of stuff in parallel and to avoid clogging of any kind. We also wanted to be able to distribute load across a cluster, just automatically. In our tests we ran like 1 million checks in one second.
Of course I had a benchmark slide in here, and it's a pretty, how should I say, nice statistic. But there is actually no limitation in Icinga 2 on how many checks it can run. The limitation is how many checks can be handled by the server. If you have some check that runs an extensive PHP library or Java or whatever, you of course wouldn't be able to run thousands of checks every second. But if it's just a simple shell script, like an SNMP notification or whatever, it can be done.
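The contrast between a single scheduling loop and parallel execution described above can be illustrated with a small Python sketch. It is only an illustration of the idea: Icinga 2 is written in C++, and the function names and the stand-in checks here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(checks):
    """Single-loop model: one check after another."""
    return {name: check() for name, check in checks.items()}


def run_parallel(checks, workers=8):
    """Multi-threaded model: checks are handed to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        return {name: future.result() for name, future in futures.items()}


def make_check(value):
    # Hypothetical stand-in for a plugin call; returns (return code, output).
    return lambda: (0, f"OK - value {value}")


checks = {f"service-{i}": make_check(i) for i in range(100)}
print(run_sequential(checks) == run_parallel(checks))
```

Both functions produce the same results; the difference is that the pool can have many checks waiting on the network or on a subprocess at the same time, while the single loop waits for each one in turn.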
Who of you has used Livestatus in the past or knows what it does? Yeah. Basically, modules in Nagios were pretty nasty. There wasn't really a library to link against. You would have to take a few header files, compile a module against them, and yeah, it works. You can handle events with it, you can access data with it, but it's not a very good solution for accessing data, because it can get really slow if you do a lot of queries, and it's very complicated to install.
So our goal was to go modular. These names are all modules inside Icinga 2 at the moment; there are a few smaller ones as well. The whole idea is to have modules that do their own stuff, can access internal information of the cluster, can talk to each other, and exchange status data of any kind, with each module just residing there and doing what it should do.
So we have a module that checks stuff; it runs checks the way you configured them. We have a module that takes care of notifications: if the check result changes, that module will send you a notification. We have compatibility outputs for all the interfaces. We have a Livestatus module that brings you the same interface the old Livestatus integration brought you, but right here inside the core. We can write our perfdata, we have cluster connections, and we can write to those various external tools. These modules are pretty easy. If you want to send events not to a GELF output but to a commercial tool of some kind, you can take that module, change it to send HTTP stuff somewhere, and you're done.
What we are currently working on is the new API. That should be done by November and will be included in 2.4. It brings a lot of additional interactivity with the core.
For all the modules, we want to give you a way to enable them easily and just configure them. We ship those modules, and only a few are enabled by default. If you want to use Livestatus, enable it, restart Icinga 2, and you're done. If you use IDO for status output to a database, the only thing you might have to change is to go to the config file and change the database password, or the host name where the database resides, or whatever.
So if we look at a shell for a moment: all you have here is a lot of configuration files. If you have configured Apache on a Debian-based system, Debian or Ubuntu, in the past, that's pretty similar to what we have here. We ship a bunch of default configuration files in the features-available directory, and the only thing enabling does is set a symlink here. So if I run "icinga2 feature enable livestatus", the only thing it does is set that link.
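The enable-by-symlink mechanism described above can be sketched in Python. The directory names `features-available` and `features-enabled` follow the layout mentioned in the talk; the helper name, the relative link target, and the file contents written in the demo are assumptions for illustration, and the demo runs in a temporary directory rather than a real configuration tree.

```python
import os
import tempfile


def enable_feature(conf_dir, name):
    """Enable a feature by linking its config file into features-enabled."""
    available = os.path.join(conf_dir, "features-available", f"{name}.conf")
    enabled_dir = os.path.join(conf_dir, "features-enabled")
    link = os.path.join(enabled_dir, f"{name}.conf")
    if not os.path.isfile(available):
        raise FileNotFoundError(f"unknown feature: {name}")
    os.makedirs(enabled_dir, exist_ok=True)
    if not os.path.islink(link):
        # Relative target, so the link survives moving the config directory.
        os.symlink(os.path.join("..", "features-available", f"{name}.conf"), link)
    return link


# Demo in a throwaway directory instead of a real /etc tree.
with tempfile.TemporaryDirectory() as conf_dir:
    os.makedirs(os.path.join(conf_dir, "features-available"))
    with open(os.path.join(conf_dir, "features-available", "livestatus.conf"), "w") as fh:
        fh.write('object LivestatusListener "livestatus" { }\n')  # assumed contents
    link = enable_feature(conf_dir, "livestatus")
    print(os.path.basename(link), "->", os.readlink(link))
```

Disabling a feature is then just removing the link; the shipped default file in `features-available` stays untouched.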
You can have a look at the config file. All it does is instantiate a module called Livestatus. If you like, you can add a parameter saying where the Unix socket for Livestatus should be, and it will be there after a restart. That's the Vagrant box, yes.
So there we have a Livestatus socket, and that's where we want to go: make it easy for users, provide all the stuff, support as much as makes sense, and just allow you to use it without compiling weird modules from someplace. Of course there were packages in the past, but actually we had quite a lot of problems with those modules in Icinga 1, because every time we tried to change internal lists in Icinga 1, just to add a new field that we wanted to provide, for Livestatus for example, we had problems with those modules because they were breaking.
Now we have a simple module that provides Livestatus and can do all the queries, and if we change something, we just have to change one or two lines in Livestatus, and that's all we need to support it for users.
So let's talk about clustering.
I'm not sure how many of you have ever tried to make Nagios highly available. I see nodding. I know that face.
Nagios was never intended to be highly available. You have that one core that writes status stuff, and it does that pretty well. But when you try to turn it into even something like a cold standby system, or just have a second system that can take over the work when the other system is down, it's pretty hard. You would have to build a lot of tools around it, and not just Pacemaker, but really tools to transport config files and to transport status.dat, which gets updated every 10 seconds or so. That's pretty hard. I did it a lot for customers in the past, but it is not really fun.
So we thought about how we can make it better, and not only highly available, but scaling horizontally and vertically, so you are able to have a DMZ checker, or to cover a company, a country, a continent, whatever. That's what we have. We can split the configuration so it knows: okay, this is checked centrally, this is checked somewhere else. That enables scaling. To avoid all the latency you would have to go through to execute checks, just run them on site and transport only the status to the central system.
And if you go horizontal, you can set up two or three hosts for your central system. They are highly available, and they distribute checks among themselves. So if you have two hosts, every host does 50%; if you have more, the work gets split across the number of nodes.
In Icinga 2 we call the roles master, satellite, and agent. Basically it is Icinga 2 everywhere, but the configuration differs. You have a centralized instance that does local checking, and you can have another one beside it. You can have a remote Icinga somewhere else that does network checks for a whole location. And you can install Icinga 2 on every single host, but the Icinga 2 on every single host would just act as an agent, connecting to the central cluster and allowing the cluster to tell it: please run that check and give me the result. Yes?
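The load sharing between central nodes mentioned above (two hosts each doing roughly half, more hosts splitting further) can be illustrated with a simple hash-based assignment in Python. This is a sketch of the general idea only, not the algorithm Icinga 2 actually uses; the node names, check names, and functions are hypothetical.

```python
import hashlib


def responsible_node(check_name, nodes):
    """Pick one node for a check using a stable hash of its name."""
    digest = hashlib.sha256(check_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    # Sort so every node computes the same answer regardless of list order.
    return sorted(nodes)[index]


def distribute(check_names, nodes):
    """Return {node: [checks]} so every check runs on exactly one node."""
    assignment = {node: [] for node in nodes}
    for name in check_names:
        assignment[responsible_node(name, nodes)].append(name)
    return assignment


checks = [f"host{i}!ping" for i in range(1000)]
for node, assigned in distribute(checks, ["master1", "master2"]).items():
    print(node, len(assigned))
```

Because every node can compute the same assignment independently, no check is run twice and none is forgotten, and adding a third node simply changes the split.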
Yes, there are multiple ways to do it. You can do it at the central instance, or you can put your configs everywhere. There's a tool inside the whole cluster stuff that is called the repository. It enables the central Icinga 2 instance to talk to all the systems, collect their configuration, and provide the central instance with the knowledge of where things are and which objects it should know about for its state. But what I always recommend is what's shown here. You can put your configuration on a central instance, and the internal API can replicate that configuration. So if you have this system with two masters, the configuration is done on this system, and when you change stuff here and reload Icinga 2 here, it will tell the other system about it. When a new connection comes in, they compare their configuration state on file, and if the other side hasn't got the change yet, it will just copy the files, reload itself, and have the new configuration. And that goes down to the agents.
We try to keep it as simple as possible. So it's all files, all just configuration on disk, and it's just replicated.
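The compare-then-copy replication described above can be sketched as follows. This is a simplified Python illustration under stated assumptions: it works on two local directories instead of a network connection, handles only a flat directory, ignores deletions, and all names are hypothetical. It is not how Icinga 2 implements its config sync.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def sync_config(source_dir, target_dir):
    """Copy files that differ from source to target; return changed names."""
    os.makedirs(target_dir, exist_ok=True)
    changed = []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        dst = os.path.join(target_dir, name)
        if os.path.isfile(src) and file_digest(src) != file_digest(dst):
            shutil.copy2(src, dst)
            changed.append(name)
    return changed  # a non-empty list would trigger a reload on the receiver


with tempfile.TemporaryDirectory() as master, tempfile.TemporaryDirectory() as agent:
    with open(os.path.join(master, "hosts.conf"), "w") as fh:
        fh.write("# host definitions would go here\n")
    print(sync_config(master, agent))  # first sync copies the file
    print(sync_config(master, agent))  # second sync finds nothing to do
```

Keeping the state as plain files on disk is what makes this kind of comparison cheap: the receiver only reloads when something actually changed.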
and that makes it pretty easy if you experiment with a little bit, so another problem we always seen in the in the in Nagios sphere of stuff a Security is pretty hot so
You had an IP in the back Which allowed you to run? execute plugins on a remote host of any kind and That is not secure It's Problem shell injections are possible. It's only security based on IP address whitelist
The encryption NIP is pretty much not existing it's more like a scrambling of communication with a compiled in private key And NSCA is basically the same way but any say stuff that sends a check result to central instance
So, of course you could do checks via SSH or whatever, but that makes other problems. So the Icinga 2 clustering forces you to use TLS, and TLS means for us not a scrambled connection with some
pretty private key, but certificates. I know it's really hard with OpenSSL to do certificate stuff, to create something like a CA, create certificates. It's not easy. That's why we provide you CLI commands to do it.
So if you have that central Icinga server, just create a CA there, create certificates. You only have to bring the CA certificate and the host certificate to the target system, and that's all you need. Or,
if you have something like Puppet that already has a CA infrastructure in place, just reuse their certificates; that works. No, Icinga 2, the daemon itself, is also the CLI tool, and
it does all the stuff for you internally. Basically, yes. So Icinga 2 has that pki subcommand, which can create a CA,
create a certificate, sign a certificate request. And there's something called the node wizard, basically a command-line tool to set up a new node. There's also a way to
connect to the central server, request a certificate from there, and you're done. That's what the last point is: ticket. A ticket in our world is basically a password that helps to authenticate you when you connect a new client.
So you enter that password, that password is configured in the central instance, and it says: okay, that is valid for me, here's your certificate.
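The ticket idea can be sketched in a few lines of Python. This is only an illustration of a password derived per client from a secret on the central instance; the salt value, the function names and the choice of PBKDF2 are assumptions made for this sketch, not a description of what Icinga 2 does internally.

```python
import hashlib
import hmac

# Hypothetical shared secret configured on the central instance (assumption).
TICKET_SALT = b"example-salt-change-me"


def make_ticket(common_name: str, salt: bytes = TICKET_SALT) -> str:
    """Derive a ticket for one client name from the central secret."""
    digest = hashlib.pbkdf2_hmac("sha1", common_name.encode(), salt, 50000)
    return digest.hex()


def ticket_is_valid(common_name: str, ticket: str, salt: bytes = TICKET_SALT) -> bool:
    """Central side: recompute the ticket and compare in constant time."""
    return hmac.compare_digest(make_ticket(common_name, salt), ticket)


ticket = make_ticket("web01.example.org")
print(ticket_is_valid("web01.example.org", ticket))  # ticket issued for this name
print(ticket_is_valid("other.example.org", ticket))  # same ticket, different name
```

Because the ticket depends on the client name, a ticket handed out for one node cannot be reused to request a certificate for another one.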
Hello, my PowerPoint doesn't like me anymore. Oh, sorry, LibreOffice. Sorry. Sorry, okay.
Never seen Impress crash before; that was a first. So I
want to show off a few configuration things we can do. If you don't get what the configuration does here, I will upload the slides afterwards, and
there are a lot of examples and documentation, but I wanted to show you how I usually do configuration and how it might help you. So let's get back to Nagios first. There it was called configuration tricks.
So it allowed you to assign an SSH check to a whitelist of host names. It allowed you to make host groups, maybe assign checks to that host group. But that was a lot of work, and there was not a lot of transparency about how it actually works, and I know a lot of people that never understood it. So we brought logic.
Let's just say we want to have a service SSH that runs the SSH check, and
we want to apply that service to every host that is a Linux system, because the host has the attribute OS set to Linux, and where there is some address configured, but not on hosts that are test systems, maybe.
That goes further and further. So you can do matching on host names, even on arrays, lists of host groups. You can do your own role definitions, you can filter on other
attributes that describe your environment, and only Icinga 2 has to care about: okay, I have a new host, let's see what host groups it might have. So we have a lot of config parsing in the back, of course, but it's easy for you to configure it. So let's just imagine a host.
Sorry about the domain, I couldn't resist. Let's say we have a host somewhere on the internet that's doing stuff for your company, and it has an IP address.
We want to describe it. So it's a production system, it's a web server, it's in a data center we call number one, for example, the application is the Bratwurst shop, amazing, and who is responsible for it: application support. And you can even
make it easy for yourself: you can create templates that describe your hosts. So have something like a web server default that just sets production and web server, do all the host-specific stuff in the host, and just include the template on the host.
Now let's add services. How about monitoring HTTP and HTTPS on every node that is a web server, or whatever fits your company? That's all you need to do to monitor. How about notifications?
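The attribute-based rules described so far can be modelled roughly like this in Python. It is a sketch of the idea only (describe hosts by attributes, let rules pick the hosts), not Icinga 2 configuration syntax; the host names, attribute names and rule list are made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical hosts described only by attributes, as in the talk.
hosts = [
    {"name": "web01.example.org", "address": "192.0.2.10",
     "vars": {"os": "Linux", "role": "webserver", "env": "production"}},
    {"name": "test01.example.org", "address": "192.0.2.20",
     "vars": {"os": "Linux", "role": "webserver", "env": "test"}},
    {"name": "win01.example.org", "address": "192.0.2.30",
     "vars": {"os": "Windows", "env": "production"}},
]

# Each rule: service name, check to run, and a predicate on the host.
rules = [
    ("ssh", "ssh", lambda h: h["vars"].get("os") == "Linux"
                             and bool(h.get("address"))
                             and h["vars"].get("env") != "test"),
    ("http", "http", lambda h: h["vars"].get("role") == "webserver"),
    ("ping", "ping4", lambda h: fnmatch(h["name"], "*.example.org")),
]


def resolve_services(hosts, rules):
    """Return (host name, service name) pairs for every rule that matches."""
    return [(h["name"], service)
            for h in hosts
            for service, _check, matches in rules
            if matches(h)]


for host_name, service in resolve_services(hosts, rules):
    print(host_name, service)
```

Adding a new host then means describing it; nobody has to edit a list of host names on each service.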
Let's say we want to be notified out of work hours if a production system goes wrong. So we say: we want to apply a notification, that's the name of the notification, to all hosts.
The host must match that definition, so only production systems. We want to be notified by mail, that's configured in the template here. We want to send the notification to all users that are in data center on call, and we want to be notified only out of work hours.
If you configured notifications before, you had to adjust a lot of contact groups on single Nagios objects. This is the only object you need in the whole configuration. Everything else is done
just by resolving the configuration for you. The same goes for services. Let's say we had that Bratwurst shop before, and now we want to notify the application support team when it goes wrong, again only production, only after work hours, and now we want to send the mail to app support on call.
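As a rough Python model of such a notification rule: one function decides who gets mail, based on the host's attributes, the time, and a user group. The user names, group names and the nine-to-five work period are assumptions for this sketch.

```python
from datetime import datetime, time

# Hypothetical users; the group only lists who is on call.
users = {
    "alice": {"email": "alice@example.org", "groups": ["datacenter-oncall"]},
    "bob": {"email": "bob@example.org", "groups": ["app-support-oncall"]},
}


def outside_work_hours(now: datetime) -> bool:
    """Assumed period: anything outside Monday to Friday, 09:00-17:00."""
    is_weekday = now.weekday() < 5
    return not (is_weekday and time(9, 0) <= now.time() < time(17, 0))


def recipients(host: dict, now: datetime, group: str) -> list:
    """E-mail addresses to notify for a problem on this host at this time."""
    if host["vars"].get("env") != "production" or not outside_work_hours(now):
        return []
    return sorted(u["email"] for u in users.values() if group in u["groups"])


host = {"name": "web01.example.org", "vars": {"env": "production"}}
print(recipients(host, datetime(2015, 6, 2, 22, 30), "datacenter-oncall"))
print(recipients(host, datetime(2015, 6, 2, 11, 0), "datacenter-oncall"))
```

The rule itself never names a contact; who is on call is decided by group membership alone.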
Behind those user groups are users. The user might have an email, a pager number, a Jabber address, whatever, Threema, secure ID, no idea, but the user is
only there to be defined as a user, to be described as a user, and the information is used here when defining notifications. Yes, if you have a shell script that can do it.
That's nothing we really care about, because we give you the means to notify. In the back, some kind of command is run. Whether an SMS is sent via the locally installed GSM modem, or a mail is sent to the local mail server and forwarded to three other mailboxes, we don't care.
You just need to write a shell script, give it the parameters, and you're done, and every bit of magic you want to do is in your script. Dependencies should be easy. They never were with Nagios.
Again, if you have a location, data center number one, somewhere: I want to define that every host that matches my rule down here gets a dependency, so the host targeted by that query has the router as a parent.
If that router goes down, it's not available for us. That means that all dependent hosts are no longer checked, because, well, we can't check them, and we don't want to get notified for every dependent host. So in theory you should only get a notification that that host is broken, but of course tune it like you want.
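The effect of such a dependency rule can be sketched in Python: hosts in one location get the router as parent, and hosts behind a dead router produce no notifications of their own. Host names and the `location` and `role` attributes are hypothetical.

```python
# Hypothetical inventory: every host in location "dc1" depends on one router.
hosts = {
    "router-dc1": {"vars": {"location": "dc1", "role": "router"}, "up": False},
    "web01": {"vars": {"location": "dc1"}, "up": False},
    "db01": {"vars": {"location": "dc1"}, "up": False},
    "mail01": {"vars": {"location": "dc2"}, "up": True},
}


def parent_of(name: str):
    """Rule: hosts in dc1, except the router itself, get router-dc1 as parent."""
    host = hosts[name]
    if host["vars"].get("location") == "dc1" and host["vars"].get("role") != "router":
        return "router-dc1"
    return None


def reachable(name: str) -> bool:
    """A host is reachable when it has no parent or the parent chain is up."""
    parent = parent_of(name)
    return parent is None or (hosts[parent]["up"] and reachable(parent))


def hosts_to_notify() -> list:
    """Only down hosts that are still reachable produce a notification."""
    return sorted(n for n, h in hosts.items() if not h["up"] and reachable(n))


print(hosts_to_notify())
```

With the router down, only the router itself is reported; the hosts behind it stay quiet until it comes back.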
So, what is not really easy to explain here: how we run commands. I've seen a lot of users not understanding what we're doing. We wanted to provide you an interface
to describe how a script, check, notification, whatever can be executed, what arguments it supports, and give you a simple way to set these arguments. So that's, for example, some pretty vendor check you might have in your environment.
It supports a lot of arguments, maybe like a host name, an SNMP community, a mode, what the check should check on the server, a warning and a critical limit. So you tell Icinga 2 where the script is, what arguments it has, and
maybe default values for these arguments, so you can override them later. The only thing you need to do when you define a service like a fancy vendor test: tell Icinga 2 which check command it is, and add vars attributes, and
every attribute that the check command knows about or tries to find gets used. So I can override the public community you've seen before, I can set the mode switch, the warning switch, whatever. And the magic behind it: all that stuff is shell escaped.
So it's not possible to inject any Bash quoting, whatever command you might think about, because we have an interface that tells it exactly which arguments there are, so we can build a proper argument chain.
Of course, if your script is vulnerable to an SQL injection of any sort, we can't fix that. Sorry, again?
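Why a declared argument list avoids shell injection can be shown with a small Python sketch. The switches, variable names and the stand-in plugin are assumptions for illustration; the point is only that each value ends up as exactly one argument, whatever characters it contains.

```python
import shlex
import subprocess
import sys

# Hypothetical check definition: which switches exist and which variable feeds each.
ARGUMENTS = {"-H": "address", "-C": "snmp_community", "-w": "warning", "-c": "critical"}
DEFAULTS = {"snmp_community": "public"}


def build_argv(executable: list, host_vars: dict) -> list:
    """Build an argument list; unset variables simply drop their switch."""
    values = {**DEFAULTS, **host_vars}
    argv = list(executable)
    for switch, var in ARGUMENTS.items():
        if var in values:
            argv += [switch, str(values[var])]
    return argv


# Stand-in for a real plugin: a tiny Python program that echoes its arguments.
fake_plugin = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
argv = build_argv(fake_plugin, {"address": "192.0.2.10", "warning": "80; rm -rf /tmp/x"})

# No shell is involved, so the "; rm -rf" part stays inside a single argument.
result = subprocess.run(argv, capture_output=True, text=True, check=True)
print(result.stdout.strip())
print(shlex.join(argv[3:]))  # the same arguments, quoted as a shell would need them
```

A hand-written command line string would have to get the quoting right for every value; a list of arguments never passes through a shell parser in the first place.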
Sorry, that's an error, it should be critical. Yes, we're talking about any executable on a Linux system, or basically any executable on Windows or whatever system you're on. It only relies on the so-called Nagios plugin API,
which is: run a script, it echoes some stuff, and it has an exit code, same as the old Nagios. Yes, yes, yes. Exactly the same plugin API, but a safe way to use the plugins without writing a fancy command line.
If you want examples for check commands, or don't want to write your own, there's a template library called the ITL, which we ship in our packages, that provides most of the
It's a bit small up here. Okay. Yes, so the ITL provides a lot of default check commands for all the default Nagios plugins, for popular stuff like the MySQL health checks from ConSol or whatever.
So there's a lot in there. There are also contributions for commonly used checks. So we have everything in there you would need for a basic setup, and you just have to define the services, set the vars, and it runs. Yeah, where to find the ITL library:
On your system it's installed by the package. It is not meant to be changed by you; that's why it's in /usr/share.
So if you want to improve a command or change it for yourself, either copy it and rename it, or import it as a template and change stuff on it. Inside are the command definitions for the plugins, for the Manubulon SNMP plugins, some Windows plugins,
NSClient++ checks, everything is in there, and Icinga 2 includes this ITL file in its main configuration file. You can also disable it if you don't want to use it.
So one more thing: Icinga 2 is extensible, and that's something we included in 2.3, which is like, we've gone a bit further with the configuration stuff. So
you can use functions in there, define your own functions. It's pretty heavy for starters, but just to show an example: instead of assigning a fixed number, like a load of two or so, to that var,
we give it a function, we define a function for that. That's what the double brackets do; you could also write function and brackets like in other languages. And what's in here is something like script code in the configuration language.
What it does: every time that check is run, the function gets evaluated, every time. So you can do stuff like: are we in the time period nine to five? Then use 40; if you're outside, use 60.
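The same idea in Python, as a sketch: a variable holds either a plain value or a function, and functions are called again on every check run. The variable names and the `resolve` helper are made up for this example; the 40 and 60 come from the talk.

```python
from datetime import datetime, time


def load_warning(now: datetime) -> int:
    """Threshold picked at check time: 40 from nine to five, 60 otherwise."""
    return 40 if time(9, 0) <= now.time() < time(17, 0) else 60


# A variable may hold a fixed value or a function that is called on every check run.
service_vars = {"load_critical": 80, "load_warning": load_warning}


def resolve(value, now: datetime):
    """Evaluate callables freshly for each check; pass plain values through."""
    return value(now) if callable(value) else value


print(resolve(service_vars["load_warning"], datetime(2015, 6, 2, 10, 0)))
print(resolve(service_vars["load_warning"], datetime(2015, 6, 2, 23, 0)))
print(resolve(service_vars["load_critical"], datetime(2015, 6, 2, 23, 0)))
```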
Okay Remote
Is easy on one server; combining multiple servers is hard. The idea is to set the parameters centrally. But if you have the knowledge how many CPU cores that node has, how many CPUs are in there,
you could set that as a host parameter, maybe from a Puppet fact or any other intelligence you have at your company, and then set the load here to say: okay, that limit multiplied by the CPU cores.
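The arithmetic is simple enough to show directly. In this Python sketch the per-core limits and the `cpu_cores` attribute name are assumptions; the central definition holds one limit per core, and each host contributes only its core count.

```python
def load_thresholds(host_vars: dict, per_core_warning: float = 2.0,
                    per_core_critical: float = 4.0) -> tuple:
    """Scale central per-core limits by the host's core count (default 1)."""
    cores = int(host_vars.get("cpu_cores", 1))
    return per_core_warning * cores, per_core_critical * cores


print(load_thresholds({"cpu_cores": 8}))  # an eight-core host
print(load_thresholds({}))                # core count unknown, treated as one core
```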
Okay, the question was about checking stuff in a cluster.
Yes, yes, yes. You can do that stuff; you can connect multiple pieces of status information inside
Icinga 2. I think there was an example in our blog a few months ago. But it still relies on running multiple checks on multiple endpoints and having an additional check that does the combination.
But you can do a lot of internal stuff in there. You don't even have to run a check command; you can calculate the result inside the configuration language by comparing stuff from multiple nodes and display a shared result. But that's too complicated for this presentation. There is a lot possible. Yes?
Yes, yes. Yeah, you could do that stuff internally, and I've seen you can do business processes inside Icinga 2, but it's a bit more complicated to explain now.
I can get the link for you if you give me your email later. So, the last point in my presentation is web interfaces. I said Nagios is open core, and I think they reworked it a bit for Nagios 4, but that's basically the web interface you all know from the past, and
it's still there. It runs pretty well. We tried to improve it a lot with Icinga 1; that's what we called the Classic UI. We added a bit of multi-selection, multi-command features, made it prettier,
but it still had a lot of limitations. And maybe you have experienced Icinga Web 1. I had pretty much fun making Debian packages for it, but it worked. Yeah, it's pretty complicated. It's a lot of setup. It's JavaScript-heavy. It's a configuration nightmare. We wanted to make it better.
And one of the biggest challenges we ever had is to make it easy for users to use them. So with the Classic UI, it was just there, it worked, but
it was pretty hard to get a user onto our modern web interface. So we thought about Icinga Web 2. It's a new framework, it's responsive, meaning it should work everywhere, and
it's made for setting up in a few minutes. The only drawback for you: it requires some database. And it's quite hard to explain, especially to small users, why they need a database, why they need to bother with MySQL or Postgres
for a two- or three-server setup. Nagios had that magic status.dat to transport status information out of the core, and the web interface always read it and displayed your information, but it's pretty inefficient at some point.
The main drawback was always history. So if you have a small setup at home and you wanted to say, okay, what did my system do three days ago, Nagios would have to parse a text log file for that time, and depending on how big it is, it would get slow.
So we recommend: please use the database. It's really easy, it can be set up within minutes, so give it a try. Icinga Web 2 actually is modular. We spent a lot of time on the base framework,
and that's why it's still a release candidate, and we're still working on it and fixing bugs. We wanted to provide you a framework to do your stuff in. And it's not restricted to monitoring information, so the monitoring stuff here is itself a module. There are other modules to come that we're working on and
trying to release over the next months, like business process, Graphite integration, PNP integration, Logstash. And even our own documentation runs inside Icinga Web 2, so if you read the Icinga 2 documentation, it is actually running in Icinga Web 2 for you.
The only difference with the online version is that authentication is disabled. That's a big difference. So, without explaining any more, let's have a look at it. The main idea is to give you a quick overview of the situation.
So let's say you open Icinga Web 2, you have logged in, used your magic password against some LDAP server, whatever you might want to use,
and you see what's happening. What are the service problems? There is a host that has been critical for like three minutes 40 seconds, 41 seconds, 42 seconds. You have an overview of services that have recovered in the past, so I
think Wi-Fi has been stable since 14:07. That's great. That's all the information. If you want to know exactly what is behind it, okay, click on it and you have the information. Go down, and you see three notifications have been sent for this issue, the last one one minute 47 seconds ago.
Who is the responsible contact? Oops. What check command is behind it? Where has the check been executed? So our main approach is to give you an overview as fast as possible, without a tricky web interface, without
some old configuration, and just give you an image of what's happening and make it easy for you to do stuff. So if you want to reschedule a check or check it now, just click a button, and not deal with a fancy form and a date input or whatever.
So we wanted to make it easy, to have just a few clicks to go to your web interface and have an overview. So look at the disk stuff: my home directory is a bit full.
Maybe have a visualization for the performance data that is collected. 67 notifications have been sent for that issue. And it also gives you a way to view the history of a check pretty easily: just click on history.
I noticed there's a bug with the sorting in here, so don't worry. You will have a historical overview of when states have changed, notifications have been sent, downtimes have been set, acknowledgements have been set, in one view, easily, just from the database.
So, yes? That's still to come. Displaying the command line in the web interface is not that easy. We included that in the Classic UI with Icinga 1, which is basically a
cheap command-line parser. What we want to do in the future is to connect Icinga Web 2 to the new API directly, so you can ask Icinga 2 what the last command line has been. The reason for that is also security: we can filter out vars
that are named SNMP community, that have password in their name, so a user of the web interface won't see the password of your application. But if you just have the command line somewhere here, it won't be secure. That's what we're trying to do in the future. It is possible via internal commands currently, for debugging reasons.
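Filtering variables by name before showing them can be sketched like this in Python. The patterns and the `mask_vars` helper are assumptions for illustration; the talk only says that variables named like an SNMP community or containing "password" should not be shown to web users.

```python
from fnmatch import fnmatch

# Assumed name patterns for sensitive variables; adjust to your environment.
PROTECTED_PATTERNS = ["*pw*", "*pass*", "*community*"]


def mask_vars(custom_vars: dict, patterns=PROTECTED_PATTERNS) -> dict:
    """Return a copy in which values of sensitive-looking variables are hidden."""
    return {
        name: "***" if any(fnmatch(name.lower(), p) for p in patterns) else value
        for name, value in custom_vars.items()
    }


shown = mask_vars({"snmp_community": "s3cret", "db_password": "hunter2", "warning": 80})
print(shown)
```

This works on named variables. A fully assembled command line is just one string, so the same name-based filter has nothing to match against, which is the security concern raised above.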
And you will be able to see it over the API later, because the last status result also shows the last executed command line. That's to come, but not yet done. What I want to give you on the way is: just experiment with it a bit, have a look at the views we integrated here.
We played around a bit with views that give you insight into when events happened a lot, reporting stuff. Just play around with it a bit, and
it should give you a lot of overview. Where was that? Yes. What I like, pretty neat, is that service grid. It shows you on which host which services are in what state, on a grid, so you can see what is happening,
or what's problematic currently. And I said it's responsive. That's my smartphone. Just give it web access over VPN or HTTPS from somewhere, and you will have the same status on there, the same actions, the same URLs. One-click check,
acknowledge, it just should work. So, in conclusion: where to start? If you want to try it, read our docs. There's an introduction guide on how to install it and where to start.
There is a big chapter about migrating from Icinga 1 or Nagios. That's the biggest drawback you might have, rewriting your configuration, but I promise you will like it. And again, there are Vagrant VMs if you just want to start up a VM and have a CentOS with
Icinga 2 running inside, with a bit of demo configuration. What I had here on my system was just set up within a few minutes. There are packages for most of the common operating systems, or distributions, so to say. There's even a Windows agent installation as a setup installer.
It's recommended only for agent use, but it would run like an Icinga 2 daemon. Play around with Icinga 2 and Icinga Web 2, and we would be happy to have you among our users.
There are big names among them, and most of them are still using Icinga 1. But we are running projects with a lot of customers at the moment, and we know a lot of users that want to go to Icinga 2, big users, and we're glad about that. Maybe we're doing the right thing at some point.
If you are from the US or going to the US, we have an Icinga Camp in Portland, just after PuppetConf, and we plan to make a camp happen in Berlin next year.
Details are still open, but I think it's March 1st. So, happy to see you there, and thank you.