Why monitoring sucks, and how to improve it
Formal Metadata
Title: Why monitoring sucks, and how to improve it
Number of Parts: 163
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/50059 (DOI)
Transcript: English(auto-generated)
00:00
Okay. I think we're starting. This is one weird spot to speak at. It's actually like, you know, everybody is just walking from there to there, stopping by, sitting down, disappearing again. Everybody is distributed, like, you know, to the edges, just to jump away very fast if the talk is boring.
00:21
And I have almost nobody in the middle here. So that's weird. Okay, let's do the best. How about Humor here in Norway? Humor. Good. How many of you guys are Ops guys? Ops. Ops. Nobody. No. Oh my God.
00:42
Okay, yeah, this is a developer conference. Whatever the difference is. So yeah, just briefly about me, you probably don't know me. I hope that this is okay that I have, like, margins here, because I didn't expect this widescreen here.
01:04
Oh yeah, anyway, Pavlo Baron, I'm from Germany. Yeah, from Germany. CTO of Instana, we are the next big thing in the area of application performance monitoring and monitoring in general. This is not a sales pitch, I promise.
01:21
It's just an accident. No, definitely it's not a sales pitch, I'm never selling anything. So yeah, okay, if you're not in Ops, how many of you are actually doing monitoring of systems actively?
01:42
Amazing. So all the others are just developing software? Okay. Yeah, that's a cool job, man. Well, I know you. Yeah, well, if we speak of...
02:01
Actually, how many of you would agree that monitoring itself sucks? Probably nobody? Well, one guy? Amazing. Two guys, three guys. Awesome. Let's do the best to convince you that it really does. Because actually it sucks big time.
02:20
And this is what I will focus on in this talk. It's sort of popular these days to rant about things. I'm not only ranting, I'm introducing a couple of concepts that are probably pretty new that have always been forgotten and are more or less obvious these days, how to improve the monitoring which is very essential for IT systems these days.
02:45
So here we go. The number one rant is, yeah, there is not one single tool which can do all that that we need from the monitoring itself.
03:01
And even worse, it's like it's a zoo of incomplete tools. So what actually... I mean, I've seen somebody... It's a pretty large insurance shop. They have 50 of them, 50 different solutions added on top of each other.
03:22
Because, yeah, well, yeah, Graphite doesn't support this, but we need Nagios here. So let's add this. Oh, yeah, Librato probably has a plugin for that and so on and so forth. So you just stumble from one into the other and add on top and on top and on top. Do you guys remember Apollo 13?
03:41
It's pretty cool what they actually have tinkered together there, this CO2 filter thingy. But this is actually how our monitoring solutions seem to work right now. It's just like, you know, it's a patchwork. It's even worse than that. It's more like Jenga. So when you just realize, oh, yeah, well, this technology is not supported,
04:03
but we need it because the CTO said we need this technology. Let's add a brick on top of it. So at some point it probably will crash, but we can go up like 36 or 37 levels.
04:22
Speaking of consistency of tools, the problem in the monitoring area is that it's itself, it's too diverse. So one tool cannot promise everything. But what happens is that, actually, tools do. Do you hear me? Can you hear me good? Okay. I never listen to myself.
04:45
Yeah, it's too diverse. But what is really needed, and this is one of the improvements that need to happen sooner or later, or actually sooner rather than later, is that when a tool which is responsible for monitoring
05:00
doesn't understand something exactly, like, okay, when we reach this threshold, this has this meaning, it needs to, well, sort of mimic a human, like try to reason about this in some analytical way. And this is something I will have on a couple of slides as well.
05:22
Like it goes in the direction of math. You probably expect this. It's probably too boring for math right after lunch. Did you enjoy lunch? Are you getting fed up at this conference? This is amazing, right? You can just go down for food.
05:40
Yeah, we'll have to get a couple of rounds in when I get back home. Yeah, the next thing, model. What is the model of IT? What is that? Yeah, we have servers, right? We have applications.
06:04
The problem is that this is actually how typical monitoring solutions are considering the model of IT systems. It's like there is a computer, well, there is an application, there is a printer, there is a switch, there is something like that. And there is this evil internet cloud.
06:21
Yeah, that's historic, actually. The problem that we have, during the past couple of years, we've added multiple layers, and we go on adding layers and layers and layers and layers on top of everything. So we are in the world of Lego bricks, where we're just like, you know, oh, here we go, hypervisor, then we go with a VM,
06:43
then we go with a container, then in the container. You can actually do container inception, did you know that? Did you ever try this? You run container and container and container. This is fucking amazing, I'm sorry. You can do this, but nobody should. I'm not allowed to swear, right? You probably will cut it out after the talk.
07:04
Yeah, so, speaking of adaptive models, do you guys know this meme? Okay, cool. Yeah, we have very complex taxonomy and geography in the world of IT systems.
07:22
Very complex. And it's all moving parts. Actually, nothing is stable there, because everything can interact with everything. Everything can rely on everything. It's not just like every application needs a database these days, and so on and so forth. It's much more complex than that. So instead of having a written-in-stone relational model,
07:42
which is then sort of projected to a relational database, which is behind almost every single solution, which is available right now, sort of relational database, well, they go with MySQL, whatever, we should embrace graphs. It's too dynamic for stone-hard stuff.
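To make the graph idea concrete, here is a minimal, stdlib-only sketch of infrastructure modeled as a dynamic dependency graph rather than a fixed relational schema. All component and service names are invented for illustration; this is not any particular tool's model.

```python
# A sketch: infrastructure as a dynamic graph. Components come and go,
# and the model just adapts instead of breaking a fixed schema.
class InfraGraph:
    def __init__(self):
        self.edges = {}  # node -> set of nodes it depends on

    def add_dependency(self, src, dst):
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())

    def remove_node(self, node):
        # Containers and VMs disappear all the time: drop the node
        # and every edge pointing at it.
        self.edges.pop(node, None)
        for deps in self.edges.values():
            deps.discard(node)

    def dependents_of(self, node):
        return {src for src, deps in self.edges.items() if node in deps}


g = InfraGraph()
g.add_dependency("checkout-service", "mysql-1")
g.add_dependency("checkout-service", "redis-1")
g.add_dependency("billing-service", "mysql-1")
print(g.dependents_of("mysql-1"))  # both services rely on the database
g.remove_node("redis-1")           # a container dies; the graph adapts
```

The point is only that relationships are first-class and mutable here, which is exactly what a written-in-stone relational model makes painful.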
08:03
It's a dynamic graph, it's ever-changing. So, kinds of relationships will always change, come and go. That's where graphs kick in. Which brings us to the outdated technology. Those of you who are doing monitoring, what tools are you using right now?
08:26
Awesome. Cool. Nagios? No. Graphite? What's that? Okay, yeah, well, ELK, cool, yeah. It's not outdated, of course.
08:43
Yeah, probably going to an ops conference would reveal a little bit more of the tools that are being used. Of course, when you're a developer, you probably don't know what tools are being used in ops. And you probably just don't care. Yes.
09:00
The next thing. Next thing. Cool. Awesome. Yeah, the next thing. Okay, outdated technology is a little bit interesting thing. It's like when you come with a knife to a shootout. I didn't find a picture for that because the Internet is full of pretty weird pictures.
09:23
But you can compare it with something like this or something like that. I mean, it's a mobile phone still, it works. It probably works yet. But we probably shouldn't use it anymore. So, the messages here are generally.
09:43
Technology itself, and you know this better than me, technology itself is crucial for any solution. Actually, for any solution. Actually, for real, any solution. But also for monitoring solutions as well. And very many, even New Relic and then guys like that,
10:01
they have implemented their whole stack like 10 years ago. And I would claim that it's a little bit outdated because otherwise features could be added much faster. And the model of the modern IT could be, well, implemented a little bit more flexible.
10:20
But the thing is that when you look at your monitoring solutions, you should look at them like at any other technology you use in your stack. And it should go in your future, not in your past. Like, let's use something which is 20 years old. But, well, it might reveal some problems if we have some problems in production.
10:43
So let's speak about math. Do we have mathematicians here? No. Really? Oh my God. Cool. Naive. Well, naive math.
11:06
Do you guys still do oil check manually? Awesome. Well, it's Norway, right? It doesn't mean a thing. I'm not judging, I'm just saying.
11:21
I know people who probably would not know how to do that manually. But what I want to speak about is thresholds. Thresholds are completely useless. Because even with oil, I mean, this is a moving liquid. So it can just shoot over, shoot down, and so on and so forth. And then you just have a totally wrong picture when you look at it and it's somewhere here.
11:44
You know, there. This thing doesn't work here. Well, fair enough. But still, it's like on top of this thing there. What do you call it in English? What is the name of it? Measurement something.
12:01
Stick. Yeah, stick. Okay. So it's, in this case, it's on his left hand somewhere, the oil. Well, wrong picture in this case. Too manual. Thresholds are pretty useless because when you look at memory and stuff like that, well, okay, yeah, you're shooting from time to time over a threshold, 80% of memory usage.
12:21
So what? What meaning does it have? Do you want to be alerted with that just because you went over 80% of memory usage in this machine? Ideally, you have 100% usage and never shoot out. Well, depends on the case, of course.
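A tiny sketch of why a bare threshold is noisy, with invented numbers: the naive check fires on every momentary spike, while even a trivial "sustained breach" rule only reacts to a real climb.

```python
# Per-minute memory usage in percent; numbers are made up.
mem_usage = [62, 81, 60, 79, 83, 58, 85, 86, 87, 88, 89, 90]

# Naive: count every sample above 80%.
naive_alerts = sum(1 for v in mem_usage if v > 80)

def sustained_breach(series, threshold=80, hold=3):
    """Alert only when the threshold is exceeded `hold` samples in a row."""
    run = 0
    for v in series:
        run = run + 1 if v > threshold else 0
        if run >= hold:
            return True
    return False

print(naive_alerts)                 # fires on momentary spikes too
print(sustained_breach(mem_usage))  # True: only the sustained climb at the end
```

Even this small change already separates "shot over the line for a second" from "something is actually going on", which is the distinction a hard threshold cannot make.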
12:41
The other thing which is very popular in the area of monitoring is baselining. Baselining is a very ambivalent thing because when you want to do it right, you need a lot of math to measure a real stable baseline. Because when you look at the picture like that, well, this area is completely flat.
13:05
Nothing moves anymore. Everything is stable. So when you measure this as a baseline from the past two hours, it doesn't make sense at all. It's not a baseline because it's real bad state, but nothing changes. So going only after change measurement is also a pretty wrong thing.
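The baselining pitfall can be shown in a few lines (numbers invented): a baseline learned from a saturated-but-stable window declares the bad state normal, because nothing in it changes.

```python
# CPU usage in percent, stuck near 100% for the whole learning window.
saturated_window = [99.1, 99.0, 99.2, 99.1, 99.0, 99.2]

# A naive baseline: the mean of the recent past.
baseline = sum(saturated_window) / len(saturated_window)

# A new sample at 99.1% deviates almost not at all from that baseline...
deviation = abs(99.1 - baseline)
print(round(baseline, 1))  # ~99.1 -- the "normal" the tool just learned
print(deviation < 1.0)     # True: a saturated CPU now looks perfectly fine
```

This is exactly the "real bad state, but nothing changes" case from the talk: change-based measurement alone blesses it.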
13:29
So when we speak of math in the area of monitoring, it's actually the... And this is something that really goes into this direction slowly. When you look at new start-ups popping up, like SignalFx and so on,
13:47
so monitoring is becoming a mathematical domain. It's a mathematical problem. And in this case, it's not only just like simple thresholds. So going with the simple statistical things like, well, it's two standard deviations from the mean, then it's a problem.
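That naive two-sigma rule, sketched with invented data, is literally this: flag anything more than two standard deviations from the mean. It catches the obvious spike, but it knows nothing about why, which is the oversimplification being criticized here.

```python
import statistics

# Request latencies in ms; one obvious spike. All numbers invented.
latencies = [100, 102, 98, 101, 99, 103, 97, 100, 250]

mean = statistics.mean(latencies)
sd = statistics.stdev(latencies)

# The "two standard deviations from the mean" rule.
outliers = [v for v in latencies if abs(v - mean) > 2 * sd]
print(outliers)  # catches the spike, but has no idea what it means
```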
14:05
It's more complex than that. But the quite opposite of it is blind math. Do you remember Pulp Fiction? Does anybody really know what it is in this case?
14:23
Did they ever say what it is? They never did, right? I still have no idea what it is. But we see it. We see something. It's shining gold. Wow, cool. This is also valid for tools. When they look at your metrics, like something is there.
14:43
I need to tell you that something is happening. The other thing is, everybody is speaking of correlation. So we need to correlate like A with B and so on. When you do this, this auto-correlation... no, autocorrelation is the wrong word for it. When you automatically try to correlate things that you have no meaning of,
15:02
in this context, like this pair, what could happen is that you clearly see that the color of the pants people are wearing in Norway correlates very well with the amount of rain in Australia.
15:23
So how much meaning does this have? Your monitoring solution will leave it up to you to decide, does it make sense or not? So what I'm trying to say with that is, whenever you speak of correlations, you need to speak of semantics as well. And this is what is actually partially missing in the area of monitoring,
15:45
which is very important, is that semantic knowledge is a very central concept for monitoring. You need, as whatever tool provider or solutions provider, whatever, you need to understand what is actually happening when you look at things, how things behave together.
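The pants-and-rain joke can be reproduced numerically: two completely unrelated series that merely share an upward trend correlate almost perfectly. All numbers are invented.

```python
# Two unrelated, invented series that both happen to trend upward.
pants_darkness = [10, 12, 11, 14, 15, 17, 16, 19, 21, 22]   # Norway
rain_mm        = [30, 29, 33, 35, 36, 40, 39, 43, 45, 46]   # Australia

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(pants_darkness, rain_mm)
print(round(r, 2))  # very close to 1.0 -- and completely meaningless
```

Without semantic knowledge of what the two metrics are, a high coefficient like this is exactly the trap the talk warns about.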
16:01
Does it make sense that an I/O wait goes together with a memory consumption metric or whatsoever? All these things. And whenever you speak of mathematical models, probably some of you are mathematicians, but you don't say that, when you speak of models, models are something that make sense.
16:21
So you model after a real world. You just try to solve a problem in terms of mathematics. And this is very crucial, not only for monitoring, but, you know, I've been playing around with this whole big data stuff. You literally don't find anything when you don't know beforehand what you're looking for, seriously,
16:41
because at some point it just becomes ridiculous what you find there. Which brings me to the next topic, and I would call it eyeball intelligence. So you have eyeballs and you are intelligent when you sit in front of a bunch of graphs.
17:00
Who's looking at something like this all day long? Probably everybody. Everybody likes it. Come on, guys. We all love charts on black backgrounds.
17:21
So yeah, we don't feel like Lieutenant Data from Star Trek, right? Yeah, except that, actually, he's a robot. When he looks at things, he has a different pace of how he can process stuff. And when we look at hundreds of machines of systems, we don't feel like Data,
17:43
we feel like, hmm, no idea what is happening. And the next thing is that, actually, well, USS Enterprise is on autopilot most of the time.
18:00
They only jump in when it's serious, when there is an issue and they need to do something. I'm not a pilot, but I've been talking about this with a couple of pilots, and it's actually pretty comparable to piloting an aircraft. Most of the time you're on autopilot, when things go wrong, you need to solve this manually yourself.
18:23
So you just need, and this is the direction where the monitoring tools and solutions, whatever you built inside your setup, will go into. You will have more intelligent robots that will support you or partially replace humans, actually,
18:44
when it's about real boring things, because it's about a lot of machines, complex systems, where a human is not able to understand, to grasp everything, not even looking at graphs and charts. Even when your tool will show you, there is an outlier.
19:02
It's still up to you to decide, is this outlier something that makes sense or not. It's still the same. Also, what is a problem right now is a sort of weak resolution. And it's, so there are tools that are resolving like, well, they're sending data from the field,
19:24
like once in a minute or once in 10 minutes or 15 minutes, a sample of 15 minutes. So it's comparable to this. Do you remember this device? Does anybody do it? Or am I the only guy who's old enough for that?
19:42
But you still don't have it, right? You don't have it anymore. I hope so. Yeah, there's one-way camera. You can take pictures with that, but you will throw it away, and pictures are pretty bad quality. The next thing is the Big Bang. When you try to do a resolution like once in a minute, when you sample to a minute,
20:05
Big Bang was something that has happened within a couple of seconds. You will completely overlook something like that. You will not see it. It's probably strong enough that you will see like, there is a peak. Awesome, I see a peak. But you don't have any details anymore. What was that? What is it all about? And so on.
20:24
So we have a high level of pixelation, and pixelation itself, in terms of monitoring, is, well, pixelation itself is good for retro games when you implement a Mario game, a simulated one, or if you have something to hide.
20:43
But what I want to tell is, we need to go for resolution, which is below one second. And on demand, we need to send even more data. This data needs to be pre-processed, prepared, and sent over, once we require it.
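A sketch of how coarse resolution swallows a short event (invented numbers): at full per-second resolution a two-second burst is obvious, but averaged into a one-minute sample it barely registers.

```python
# Per-second request rate, invented: flat at 100 req/s except a short burst.
per_second = [100] * 60
per_second[30] = 5000   # two seconds of chaos
per_second[31] = 5000

# What a tool sampling once per minute would store: one average.
minute_avg = sum(per_second) / len(per_second)

print(max(per_second))    # 5000 -- the event is unmissable at full resolution
print(round(minute_avg))  # ~263 -- at 1-minute resolution, just a small bump
```

This is the "Big Bang in a couple of seconds" case: you might see that there was a peak, but every detail of what happened inside that minute is gone.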
21:04
And speaking of all this real-time, no real-time, near real-time, near near-time, near near real-time, and so on and so forth. Yeah, we're there. Do you guys have a game like this in Norway?
21:20
Some folks in Germany do have. The idea is to, you have two bins on both ends, and you need to, well, to get your water from one bin to the other through this pipe, like, manually. I mean, no pressure, nothing. Like, just spill it in. So this is how solutions that are not being, have been built for real-time processing of data,
21:46
or near real-time processing of data, how they actually look like when they speak of real-time. So I have a database, and, well, somebody will request the database, and then I will spill this information into sort of a stream, maybe.
22:02
But it's not end-to-end stream through. The other example is the snail mail. Do you have it in Norway? You don't have it? No, I'm kidding. But nobody should be using it anymore, seriously. What is it actually for? Yeah, postcards, maybe.
22:23
So real-time thing is, when you have, when you see a, like, previous second or a second before the previous second, it's already useless. It's in the past. You don't win anything. It's just like forensics.
22:41
You just look at the past, and you try to understand what has happened. So everything in this monitoring world, and I'm pretty sure it will be the case more and more, will go into the direction of real-time data processing, like, well, near real-time data processing,
23:00
where the whole data is being streamed from your field back to the backends, and over to the user interface, or whatever, alerting analytics on whatever systems are running behind the scenes. Because this way you don't lose time, where you don't need to lose time. That's what I mean.
23:27
That's pretty cool. Forecasting yesterday's weather. I'm in Oslo for now three days, and I wanted to spend, like, half a weekend here on Saturday. And every time I look at the weather forecast on my iPhone, it's like, I mean, it changes completely.
23:46
So this is one thing that weather forecasts in areas that have quite an amount of water close to them are totally useless. This is a meteorological problem, but this is also a monitoring problem, because, can you guys read it?
24:19
I repeat it, sometimes I want to go back in time and punch myself in the face.
24:25
The meaning of that is pretty simple. Okay, yesterday something evil has happened on my system. Something has crashed totally. Yeah, cool. So what? I mean, it's today now.
24:46
Forecasts need to go into the future, of course. That's what forecast actually means. So when you have past events, you will crunch this data, whatever, again, whatever tools you're using, you can write it straight to your HDFS and then Hadoop around on it, or Spark around.
25:02
You've learned about Sparky as well. You can do whatever you like. You can query it with your ELK stack and so on and so forth. You can do everything. But it's, again, it's in the direction of postmortems and learning a little how your system behaves actually. But what is necessary is like, you know, today is only yesterday's tomorrow.
25:22
We need much more forecasting to prevent issues, and the forecasting should be real, accurate. This is quite a problem, but weird enough, nobody really takes care about this in this whole world of monitoring. Everything is speaking of math with big data. That's where the money is.
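A minimal example of forecasting forward instead of explaining the past: fit a linear trend by least squares and extrapolate when a disk hits 100%. The data is invented and real forecasting needs far richer models; this only shows the shape of the idea.

```python
# Disk usage in percent, one invented sample per hour, growing steadily.
usage = [40.0, 42.0, 44.0, 46.0, 48.0, 50.0]

# Ordinary least-squares fit of usage = slope * hour + intercept.
n = len(usage)
xs = range(n)
mx = sum(xs) / n
my = sum(usage) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, usage))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Extrapolate: when does the fitted line reach 100%?
hours_until_full = (100 - intercept) / slope
print(slope)              # 2.0 -- percent per hour
print(hours_until_full)   # the alert you actually want: "full in N hours"
```

"Disk will be full in 30 hours" is actionable before the crash; "disk was full yesterday" is forensics.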
25:42
But on the other hand, we need to do much more in order to prevent systems from crashing, from misbehaving. Okay, another thing is ghost in the machine is probably nothing that has a meaning at the moment.
26:01
So those of you who are on Twitter or whatever social network you use, do you remember this discussion with this weird dress? I mean, I didn't even follow it. It's weird. So the discussion was about what color actually this dress has. Is it gold or is it blue whatsoever?
26:24
Yeah, well, our eyes and our brain is a very complex area. But speaking of Schrodinger's cat, it's like the general simplified idea is, we have a cat in this box and we don't know if it's dead or alive.
26:46
And this is where I want to mention that we speak more and more about immutable infrastructures. We speak of machines that can be added to a data store and then disappear.
27:05
Nobody would really care about this. Look at Riak and Cassandra and data stores like this. Nobody would care about this because the majority still works, you still have your quorum satisfied and so on and so forth. So that means that we have much more flexibility about things that can die
27:21
and come back and disappear again where the current monitoring solutions or the current monitoring world have no real good support yet for that, for this situation. It's like everybody is drawing a map of things that are there. And when you start killing things, returning things, it's suddenly new things that pop up.
27:46
So it's just a replacement of the other one that was previously running there to satisfy this weird Erlang guy here. It's not only about Erlang.
28:03
I'm not evangelizing anything here. I just suggest that you look at the concepts there. This is the most important thing about this conference, what I've learned here. You just get introduced to concepts. Whatever you do with the concepts, it's your job. But one of the concepts from the Erlang world and now the Akka world and so on,
28:21
the idea is just simple, you just cut everything into very small pieces and those small pieces are allowed to crash independently, come back again through a configured strategy. They come back, they crash again, they come back. So whenever I mention let it crash these days, I also mention bring it back
28:40
because ops people don't like the word crash, and this is something that I stumbled upon with ops people that I've been teaching how to operate Riak, for example. It's like every second there is a log entry: crash, crash, crash, crash. So they actually configured Nagios to react to crash as a word in the log.
29:02
So this thing just was going crazy all the time. Okay, the other thing is that identifying something by the IP address or a name or whatever, name is probably not the best example but IP address, it can change but this will be still the same thing.
29:21
So we need better methods of identification or even similarity checks that are again a little bit more mathematics there, like saying this thing that is appearing now here, is this a replacement for the other thing that has disappeared a couple of minutes ago? Because otherwise you just get confronted with two of them
29:41
and you in your brain need to sort of satisfy this relationship, like okay, A is equal to B, yes, but the tool is stupid, it doesn't know it. The other thing is alerting. Who's on duty from time to time?
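The similarity check just mentioned could be as simple as comparing attribute sets instead of identities like IPs. This is only a sketch; every host name and attribute below is invented.

```python
def jaccard(a, b):
    """Similarity of two attribute sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

# The host that vanished, and the one that just appeared under a new IP.
old_host = {"role:web", "az:eu-1", "service:checkout", "image:v42"}
new_host = {"role:web", "az:eu-1", "service:checkout", "image:v43"}

similarity = jaccard(old_host, new_host)
print(similarity)        # 0.6: mostly the same attributes
print(similarity > 0.5)  # treat the newcomer as a replacement, not a stranger
```

With a rule like this the tool can satisfy the "A is equal to B" relationship itself instead of showing you two unrelated boxes.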
30:03
Like with a pager, with a classic pager, yeah, cool. Do you like it? Depends, right? I think that what we have with alerting in most of the systems is pretty binary.
30:21
That means that a thing is either dead or it's alive. You can turn it yellow, sort of, but it's still, yellow doesn't mean that I'm forecasting it will get red, it's just probably a threshold which is 60% of something of some resource that is yellow.
30:40
Yeah, okay, I can sleep on. 80% of something, I should better be getting up and cooking some breakfast because I probably will have to go to the office. Or I can do it remotely if I'm lucky enough.
31:05
That's when you get a false positive at 2 a.m. and need to get up to repair something. So, a lot more needs to be done. When we speak about this math and these models that are much softer than what we know from hard thresholds,
31:27
there is much more to do, to be done, in the modern tooling in order to prevent false positives and false negatives. And this is actually something that you would, all these false positives and negatives, when you have an idea how to actually, how to capture these false events,
31:46
you need to learn to train your algorithms with that, you have it as labels. Does anybody know how machine learning works? Sort of have an idea of it? You essentially have an algorithm and you train this algorithm with some stuff
32:01
and when you have a classifier, which is amazing for, of course, classification is amazing for some of the monitoring problems, as well as regression is amazing for the other part, but when we're in the classification, the best thing you can have is, I have labeled records or something like that, that say this is bad, this is good. This is amazing.
32:21
Because a machine can make a decision based on that and say, okay, we have like 94% probability that this is bad. And this is how you can decide. And this is where the monitoring needs to go to, into this softer area, because all that stuff, all that things get more and more and more and more complex and grow and grow and grow. It doesn't need to be Twitter or Facebook.
32:42
But everybody who's working with containers will start like working with minimal things and these minimal things are like much more than previously. It's not anymore these three VMs. It's like, it's now hundreds of containers running in parallel. The complexity grows and grows. And what actually should be done is that user supports this,
33:01
because this is the cheapest way how the classifier can learn is that the user will do something. Just say, okay, user tells you this is bad. I'm currently having a problem. I'm currently not having a problem. Which is then the next topic, this feedback loop. Currently, when you need to change some logic against all these false positives and so on,
33:26
for your case, in a, well, modern tool, yeah, you get instructions like IKEA cabinet.
33:41
So you actually, yeah, you can swear around, you can tell the machine, yeah, it cannot be the case. It cannot be good right now. But what you need to do right now, currently, is you actually code or configure. You have like a whole screen of configuration where you need,
34:03
okay, when this and that, but not this, then do that, and so on. This is not like, this is not the way how feedback works. The best feedback that you can get from somebody who knows, who understands how things work in terms of monitoring performance and so on.
34:20
It's just like one click. Okay, I can judge immediately. This is not a problem. Bang. So, and also this is, as I said, this is the basis for classification. This is amazing how much you can do when you allow your users to give fast feedback.
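As a sketch of how one-click feedback becomes training data: each click is a label, and even a trivial nearest-centroid classifier can use them. All features, labels, and numbers are invented; a real system would use far richer models.

```python
# Snapshots of (cpu, error-rate-ish) features, labeled by user clicks:
# "this is a problem" / "this is fine". All values invented.
labeled = [
    ((0.95, 0.90), "bad"),
    ((0.90, 0.85), "bad"),
    ((0.30, 0.02), "good"),
    ((0.35, 0.05), "good"),
]

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(sample):
    """Assign the label whose centroid is closest to the sample."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    groups = {}
    for feats, label in labeled:
        groups.setdefault(label, []).append(feats)
    cents = {label: centroid(pts) for label, pts in groups.items()}
    return min(cents, key=lambda label: dist(sample, cents[label]))

print(classify((0.92, 0.80)))  # looks like what users complained about
print(classify((0.28, 0.01)))  # looks like what users marked as fine
```

The cheap part is exactly what the talk describes: the user only clicks "problem" or "no problem", and the labels accumulate into something a machine can decide with.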
34:45
Speaking of intelligence: more and more tools are moving in this direction. Is anybody old enough, or weird enough, to know this guy?
35:04
No? Okay, John McLaughlin. Don't worry about him, he's a weird musician. The point of this picture is that he's trying to fit a square peg into a round hole.
35:23
So, whenever a monitoring solution is also expected to be the basis for business decisions, I would claim that's totally wrong, because it puts a lot of pressure and expectation onto people who are not really into business.
35:45
I mean, we are there to support the systems they run. Whatever the business is, we support it, but they shouldn't offload their work onto us, onto technicians. Business intelligence and monitoring do, mathematically speaking, go in the same direction.
36:04
Yes, fine. But predicting the behavior of gas in a bucket would also, mathematically, go in the same direction, and that doesn't mean they are the same thing. What the business actually requires from us is stable operations, nothing more.
36:25
Not that you can tell them what a business transaction is by looking at your logs and at the behavior of the systems. That is the wrong way to look at these things. So yes, I claim we shouldn't mix the two, and yet this is what I observe; everybody observes it, actually.
36:43
Of course, this is where the money is: tools are moving in the direction of business intelligence, claiming you can learn about your business by looking at system data. Wrong, you can't. You can if you add custom fields, but then it has nothing to do with classic monitoring anymore.
37:05
And this is probably the hardest part. When we nowadays want to implement a quite sophisticated monitoring solution, which
37:25
also keeps a continuous eye on performance and so on, we need all-rounders. We need people who understand the corresponding platforms very deeply: Unix itself, or the Microsoft world if you're on Windows,
37:42
all the hardware, all the virtualization, everything around it, playing every single instrument. The thing is, that's expensive. These people cost money, and there aren't many of them around, seriously.
38:00
I'm not claiming that nobody here is an expert in this area. I'm just saying that what I've seen, and this is actually what my previous company makes money with, is that you buy a real expert for two days, because you simply don't have time to look at these things yourself, and they solve the problems in your running system.
38:24
So yeah, it's expensive. Not Swedish expensive, Norwegian expensive, like the booze here. It costs a lot of kroner. Such experts are also rare, and that's what I mean by not even 1%.
38:45
And I think what needs to happen to improve this is that the solutions provided to us, the tools that help us monitor, shouldn't be as plain as they currently are.
39:04
Just drawing charts and maybe showing me one outlier makes it a tool for a real expert, who is really expensive, who earns their money in those two days of consulting on your production problems.
39:20
You should be able to do that on your own. So you need mechanisms in the tools that support you and perhaps make those very expensive experts obsolete entirely. I'm not sure that's possible, but in my opinion that's the direction it should go. So yes, this is what I mean when I say that monitoring sucks, and that's what I mean when I say it can actually be improved,
39:46
but it's a lot of work and a long way to get there, because monitoring currently is, or at least two years ago still was, like a forgotten child, an abandoned child. Nobody wanted to take care of it.
40:01
Now we have conferences around monitoring. People care more and more. Some of the mathematical assumptions are naive, but at least it's more than just looking at a simple 80% threshold for memory; tools are starting to look at actual distributions. It's going in the right direction, but it's a long way.
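The difference between a fixed 80% threshold and looking at distributions can be shown in a few lines. This is a sketch under simple assumptions: the host history, the helper names, and the three-sigma rule are all illustrative, and a real tool would use something more robust than mean and standard deviation.

```python
import statistics

def static_threshold_alert(mem_pct, threshold=80.0):
    """Classic rule: alert whenever memory crosses a fixed 80% line."""
    return mem_pct > threshold

def distribution_alert(mem_pct, history, z=3.0):
    """Distribution-based rule: alert only when the value sits far out
    in the tail of what this particular host normally does."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(mem_pct - mean) > z * stdev

# A cache host that normally sits around 85% memory usage:
history = [84.0, 85.5, 86.0, 84.5, 85.0, 86.5, 85.2, 84.8]
print(static_threshold_alert(85.0))       # True: a permanent false alarm
print(distribution_alert(85.0, history))  # False: this is normal here
print(distribution_alert(40.0, history))  # True: unusually low is also odd
```

The static rule fires forever on a host that is supposed to sit at 85%; the distribution-based rule learns what normal looks like for that host and flags deviations in either direction.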
40:21
We will get there. Thank you very much. We have plenty of time for Q&A. I'm not sure if Q&A is a concept here, but we have plenty of time.
40:49
Oh, yeah, I'm not here to sell anything. Can we make a deal? I can show you later a sort of visualization,
41:01
because, I mean, I wouldn't push the product we're building right now, because it's the wrong place to do so, definitely. But you can experiment with everything that you know from gaming. You know, they've sorted out how to do maps, how to do navigation in complex worlds,
41:20
and this is the direction it will go, definitely. It's no longer a static graph that just sits there and never moves. Any other questions? Oh my God, was it that bad? Yeah, just ping me, I'm at the conference until the end.
41:44
I'm still here tomorrow, though I don't expect anybody will talk to me then. I can give you a quick idea of what we're working on, but I've excluded it from the talk completely to keep it clean. I hope you got the ideas, got the vision, and every vision comes with targets on the way toward it.
42:03
It's small steps, and everybody will go in this direction, definitely, I'm pretty sure. So expect a lot of movement in the world of monitoring, and don't build on the really old grandpa tools anymore. There is much more out there that does its job well, seriously. Thank you very much.