Autoscaling Elasticsearch for Logs on Kubernetes
Berlin Buzzwords 2022, talk 43 of 56
License: CC Attribution 3.0 Unported
DOI: 10.5446/67161
Transcript (English, auto-generated)
00:07
Thanks for joining. So yeah, we're going to talk about Elasticsearch. We're going to talk about Elasticsearch in the context of logs. Actually, Elasticsearch or OpenSearch, for most of our scope, it doesn't really matter.
00:22
And yeah, we're going to talk about them in the context of logs and other time series data. And we're going to focus on how to autoscale them using an operator on Kubernetes. So the agenda would be first to talk a bit about the use case,
00:41
why you would want to do that to yourself, and then about how we're going to do this, what we're trying to automate here, and then the available operators, the operators that are already there that can do stuff like that. And then we're going to end with a demo. So let us introduce ourselves first.
01:02
I'm Radu. I'm a search guy at Sematext. I work with Elasticsearch, OpenSearch, and Solr, spending most of my time doing consulting, production support, and training for those. And I also help out with Sematext Cloud,
01:21
which is our general observability product. Hi, I'm Ciprian. I'm a DevOps slash software engineer working for Polypoly. I also do consulting for Kubernetes and infrastructure automation.
01:41
I am also a software maintainer for Kubernetes projects like kOps, which is Kubernetes Operations, etcd-manager, and cloud-provider-aws, and also a contributor to many of the Kubernetes ecosystem
02:01
projects. Thank you. All right, so let's start with the use case. The idea is if we have a small Elasticsearch cluster, we want to automate as much as possible. The dream, the buzzword is zero maintenance, but yeah, it's a buzzword.
02:22
But yeah, we want to basically just give people a cluster that they can send logs to and not worry that much about it. Larger clusters tend to be more like snowflakes. We need to be more aware of what the load patterns are, what the mapping looks like, and stuff like that when
02:42
managing them. So we don't really have those in mind when we're talking about a Kubernetes operator. But in some situations, for example, let's say you have a multi-tenant application, you might be able to say, OK, let's have one cluster per tenant, and then we're back to the previous situation.
03:01
Like, we have many small clusters that can hopefully kind of manage themselves. Moving on to the how, so what we're trying to automate. Just to make sure we're all on the same page, I want to talk a bit about time-based indices and why that's a good idea, and then I'm going to argue that for most use cases,
03:23
rotating indices by size is going to be a better idea, and then I'm going to talk about how that would work in the context of scaling up and down an Elasticsearch cluster. So time-based indices, if you use Logstash or any other log shipper, you've probably seen this a lot,
03:42
like one index per day, one index per month, one index per hour, something like that. The advantages are pretty much across the board. We're trading a bit of cluster state, so we're going to have more indices, more shards. But compared to having just one index, we're going to have much faster indexing, because we're going to do much less segment
04:03
merging while indexing. That's our main bottleneck. We're going to have much faster searches, because let's say I'm looking at the past two hours. I'm only hitting the latest index. Obviously, that's going to be faster. But even if I'm looking at the whole data set, all the older indices will be done, no longer written to, so they're much easier
04:24
to cache. So even if I'm looking at more indices, if I'm looking at the whole data, those searches will be faster too. One thing to mention here is that if you're using Kibana, even if you're looking at the last two hours, Kibana will look at your whole index pattern.
04:40
So if your index pattern is logstash-*, it's still going to hit all the indices. But Elasticsearch has this cool shard pre-filtering feature where it looks at your date range, and it looks at the date range of each shard, and it's able to relatively cheaply just discard shards. You'll see that in the response: there's a skipped-shards count, and you can get a number.
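For instance, a quick way to see that pre-filtering in action is to run a narrow date-range search against a wide pattern and look at the skipped count in the _shards section of the response. This is just a hedged sketch; the host, index pattern, and timestamp field are illustrative:

```python
# Sketch: narrow date-range search against a wide index pattern, then read the
# shard pre-filtering result from the _shards section of the response.
import requests

resp = requests.post(
    "http://localhost:9200/logstash-*/_search",
    json={
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-2h", "lte": "now"}}},
    },
).json()

shards = resp["_shards"]
print(f"total={shards['total']} skipped={shards['skipped']} successful={shards['successful']}")
```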
05:04
And then when it comes to deleting data, it's obviously much cheaper to delete entire indices than to delete data from within an index, which will cause more segment merges. In practice, we might end up having multiple time series indices sliced by how we're querying.
05:23
So let's say I search for syslog separately quite often, then it makes sense to split the syslog data off into its own time series and have it like that. Purely time-based indices suffer from what we call
05:41
the Black Friday problem. So if the indexing throughput is uneven, then we might end up with huge indices on one hand, which are going to be slower to index, slower to search, and very small indices on the other hand, which are bloating the cluster state unnecessarily.
06:02
So to fix that, we can rotate indices by size. This will give us much more consistent write and read throughput. And our rule of thumb here is to rotate at 10 gigabytes per shard. I've put a link there in the slides.
06:20
The slides will be public, of course, with some performance testing on how we determined that. But of course, there's some, sorry, there's some complexity that we're trading off with this design. So first of all, we have to create our first index and make a write alias point to it.
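As a rough sketch of that bootstrap step (index and alias names are just examples, assuming a local cluster):

```python
# Sketch: bootstrap the first index and point a write alias at it.
import requests

requests.put(
    "http://localhost:9200/logs-000001",
    json={"aliases": {"logs": {"is_write_index": True}}},
)
```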
06:40
And then our log shipper will write to that write alias. And then this alias will have to be managed. So we have to do this rotation at, let's say, every 10 gigabytes per shard. Usually this can be easily automated with index lifecycle management in Elasticsearch. In OpenSearch, it's called index state management.
07:01
They do pretty much the same thing. One notable difference is that with index state management, you have to say rotate at X gigabytes per index. So if we change the number of shards, we need to also adjust this state management policy.
07:20
With index lifecycle management, you can just say primary shard size X, and then no matter how many primary shards you have, you don't have to worry about that. It will just work. Other than that, if you still hit all the indices, like Kibana does with logs-*, it's gonna work just the same. It's going to look at the time ranges,
07:42
and it's going to skip shards that are not involved. You can still cache this sort of information on the application, but obviously it's more awkward than having dates in the index name, right? And it's the same when we want to delete data. It's going to be a bit more complex.
08:01
Usually those management plugins will deal with that. Another problem is when we backfill. So if I'm importing data from last week, it's going to go into the latest index, because that's where my write alias is, and it's gonna mess up those time ranges a little bit.
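To make the rotation setup above concrete, here is a hedged sketch of what the Elasticsearch flavor might look like: an ILM policy that rolls over at roughly 10 GB per primary shard (max_primary_shard_size needs Elasticsearch 7.13 or newer), plus an index template that wires the policy and the write alias to the logs indices. Policy name, retention, and index pattern are illustrative:

```python
# Sketch: ILM policy rolling over at ~10 GB per primary shard (or 7 days),
# plus an index template that attaches the policy and rollover alias.
import requests

ES = "http://localhost:9200"

requests.put(f"{ES}/_ilm/policy/logs", json={
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "10gb", "max_age": "7d"}
                }
            },
            # Illustrative retention: delete indices older than 30 days.
            "delete": {"min_age": "30d", "actions": {"delete": {}}},
        }
    }
})

requests.put(f"{ES}/_index_template/logs", json={
    "index_patterns": ["logs-*"],
    "template": {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
            "index.lifecycle.name": "logs",
            "index.lifecycle.rollover_alias": "logs",
        }
    },
})
```

The OpenSearch/ISM equivalent looks similar but, as mentioned, is expressed as a size per index rather than per primary shard.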
08:21
Okay, so onto the actual auto-scaling bit. So let's start with the smallest cluster. Let's say we have two nodes. We have one index, one shard per index, and we have one replica per primary shard. We want to start with the lowest number of shards possible, where we still have a balance across our cluster, because we don't want
08:41
to bloat our cluster state. We don't want to have too many shards. When it comes to performance, that's not really a problem, because we can still fully use the CPU while indexing if we want to, even though we have one shard per node. When searching, we have one thread per search per shard,
09:02
so that's somewhat limiting, but it's usually okay. If we're looking at, let's say, two hours, that's usually gonna be a fast search anyway. If we're looking at a longer time range, then we're going to naturally parallelize, because we're gonna hit more indices, so we're gonna spawn more threads with our search, so usually that's okay.
09:20
However, when we add the third node, we won't be able to use it for indexing. Even if we oversharded, if we had more shards than nodes, we would probably not have a perfect balance in our cluster, and I'll show you that in a bit. So our suggestion is, even if we maybe did not get to our 10 gigabytes per shard, we can force a rollover.
09:43
Okay, so we can use the rollover API to create a new index. Before we do that, we need to update our index template to change from one shard per index to three shards in order to have a perfect balance, and then we do this force rotation. We will also want to use total shards per node to enforce a balance, because otherwise,
10:02
we might get into a race kind of situation. So when we add a new node, it's obviously not gonna have any shards initially, and Elasticsearch is going to start to balance them out. But if I'm creating the new index right away, maybe the cluster isn't perfectly balanced at that point, and so Elasticsearch will have a tendency
10:21
of creating a lot of those new shards into my new node, which will make that new node a bottleneck. So total shards per node will help with that. Okay, let's say we add a fourth node. Now we do have a perfect balance on the face of it, but for indexing, it's not.
10:40
Like some nodes will have two shards of the latest index, some nodes will have just one, so we're gonna have bottlenecks in our cluster. So again, we can change the index template to have two shards per node. This is gonna be enough for four nodes, and force a rollover.
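A sketch of that scale-up step, assuming a three-node cluster as in the earlier example: bump the shard count in the index template, cap shards per node so the next index stays balanced, and then force a rollover so the new settings take effect. The numbers and names are illustrative:

```python
# Sketch: scale-up step — more primary shards in the template, a per-node cap
# so the next index stays balanced, then an unconditional (forced) rollover.
import requests

ES = "http://localhost:9200"

requests.put(f"{ES}/_index_template/logs", json={
    "index_patterns": ["logs-*"],
    "template": {
        "settings": {
            "number_of_shards": 3,      # e.g. one primary per node on 3 nodes
            "number_of_replicas": 1,
            # 3 primaries + 1 replica each = 6 shard copies, so 2 per node
            "index.routing.allocation.total_shards_per_node": 2,
            "index.lifecycle.name": "logs",
            "index.lifecycle.rollover_alias": "logs",
        }
    },
})

# A rollover request with no conditions rolls over immediately.
requests.post(f"{ES}/logs/_rollover")
```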
11:01
Now let's say we want to go back down. Our load is, you know, we have less load, and we want to go back down. So we need to do two things. Actually, we need to do one thing, to drain the node. But like, how do we do that? We exclude it from allocation, and then Elasticsearch will move the shards off it. But if we use total shards per node,
11:21
that might get in the way. So we need to either remove or relax the total shards per node constraint on the existing indices so that their shards can be moved to the other nodes, and then we can safely shut down the node.
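A hedged sketch of that draining step (the node name is made up, and in practice you might raise the per-node cap instead of removing it entirely):

```python
# Sketch: drain a node before removing it — relax the per-node shard cap on
# existing indices, then exclude the node from allocation so shards move away.
import requests

ES = "http://localhost:9200"

# Setting the value to null removes the constraint on the existing indices.
requests.put(f"{ES}/logs-*/_settings", json={
    "index.routing.allocation.total_shards_per_node": None
})

# Exclude the node we want to shut down (node name is made up here).
requests.put(f"{ES}/_cluster/settings", json={
    "transient": {"cluster.routing.allocation.exclude._name": "es-data-2"}
})
```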
11:41
And then once we do that, we do the same thing: we change the index template, back to three shards per index this time, and we create a new index, and we move on from there. So you can see where this is going, right? Some other, let's call them best practices. One of them, with regards to scaling,
12:01
we usually, for most use cases, we want to scale based on disk space. So here's my thinking. Even though usually for logs, your predominant workload is going to be indexing. You're going to use a lot of CPU indexing. You're going to do not so many searches maybe. But let's give an example.
12:21
So if we want to ingest 10 megabytes per second, this laptop, Ciprian's laptop, can do it rather easily. But if we want to keep that data for a month, then we have about three terabytes. And even if this had the disk space, it would be rather complicated to load
12:40
like a Kibana dashboard with three terabytes of data. So we would probably need more laptops. And then those would be able to ingest more. So in practice, we would know upfront, like on this laptop, on this node, we can fit let's say one or two terabytes of data. And then when we get there, we just add more nodes.
13:02
So we can scale on anything we want, and we'll talk about that: existing operators can scale on CPU, on the number of shards, and stuff like that. But in practice, what I find is that for most use cases, you just look at disk space and scale based on that.
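As an illustration, a very simplified sketch of the kind of disk-based decision an operator might make, polling the _cat/allocation API; the threshold and what "add a node" means are placeholders:

```python
# Sketch: the disk-based signal — sum up disk usage across data nodes from
# _cat/allocation and compare against a scale-up threshold.
import requests

ES = "http://localhost:9200"
SCALE_UP_AT = 0.5  # 50%, as in the demo later on

rows = requests.get(f"{ES}/_cat/allocation", params={"format": "json", "bytes": "b"}).json()
used = sum(int(r["disk.used"]) for r in rows if r.get("disk.used"))
total = sum(int(r["disk.total"]) for r in rows if r.get("disk.total"))

if total and used / total > SCALE_UP_AT:
    print("disk usage above threshold: add a data node, then adjust the template and roll over")
```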
13:22
Speaking of disks, searches are going to do lots of random IO, right? The query part and also the aggregations part is going to do a lot of random IO. So if you're using a cloud provider, it probably makes sense to go with ephemeral storage, which is usually backed by local SSDs, because that's going to be much faster
13:41
than even if you're still using SSDs with persistent storage, because that goes over the network and introduces extra latency. So it's not that important that it has a lot of IO ops and a lot of throughput, but if the latency is not very good, then our searches are going to be slower and then we can put less data on a node.
14:00
We're going to have more nodes, it's going to cost more. Obviously, the downside is if we lose the node, we lose the storage. So we may need to have either just backups or extra replicas to make up for that. Last but not least, I want to talk a bit about the tier setup. First of all, we don't support this
14:21
in anything we've done. But in practice, I've seen that this is rarely useful. Because I've seen a lot of situations where the cold setup was like, okay, we can just put some spinning disks on there. It doesn't work because the cluster becomes unstable.
14:41
It cannot kind of monitor itself and stuff like that. And then I've seen situations where, yeah, you had good enough hardware on the cold storage, and then it begs the question, should it be a cold storage or should it be used for indexing because the CPUs are idling? So in practice, I've rarely seen a situation where hot cold makes sense. So we're kind of back to just one tier.
15:05
Your turn. My turn. Okay, so how we wanted to go about this. First of all, operators are not a new thing for Kubernetes.
15:24
So we thought, okay, let's look for something that already exists. Even if it's not perfect, then yeah, maybe it can be improved. The most popular one, let's say, or most seen is the Elastic Cloud on Kubernetes operator.
15:43
But because of licensing and limitations, it's not necessarily a good option. At least for our use case, where we wanted something simple and without too many strings attached, that was not a good fit.
16:02
Next, I think one of the oldest and still-maintained operators is the Zalando one. It has a good license. It has autoscaling, drains pods, adjusts replicas. It's quite nice.
16:21
Works in production, at least I think for them. And we started working on this for the use case that Radu presented before. To get it started, so what we'll see later in the demo,
16:43
it wasn't such a big deal. We did a lot of hardcoding and tuning. We commented out some code that was not working for us, and in the end it worked. But it wasn't exactly the architecture
17:04
that we were comfortable with. And then we noticed that Opster started an OpenSearch Kubernetes operator too. It's not as full-featured as the Zalando one,
17:21
but at least for our use case, it was good enough. I mean, yes, it doesn't have autoscaling yet, but that can be added, and it also has a bit of a cleaner architecture. So we started again trying to do the same thing
17:48
with Opster. It's still a work in progress, but it's going quite well from my point of view at least. Okay, next demo.
18:03
In the interest of saving time, I prepared already a Kubernetes cluster with the Zalando operator installed and two nodes. I also have, let's see.
18:36
Okay, so this is the small Elasticsearch cluster,
18:41
a master node and the two data nodes. So let's look at what we have on the node.
19:13
Okay, we have a cluster that, let's see. So the cluster is green.
19:21
You can see the two data nodes here. There is an index called logs-01, which already has some data in it. There is a component template and a template for that index.
19:41
So you can see here, we have logs- and scaling. This was done by the operator when it started. And there is a write alias that goes to this index so that we can rotate with ISM.
20:09
Okay, so this is the scaling template, which is good for the two data nodes and the ISM policy.
20:25
Okay, as Radu said, we want to rotate at 10 gigabytes or, if not, after seven days.
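For reference, a sketch of what such an ISM policy might look like in OpenSearch, rolling over at 10 GB or after seven days, whichever comes first; the policy name and index pattern are illustrative, not necessarily what the operator generates:

```python
# Sketch: an OpenSearch ISM policy that rolls the write index over at 10 GB
# or after 7 days, attached to logs-* indices.
import requests

OS = "http://localhost:9200"

requests.put(f"{OS}/_plugins/_ism/policies/logs", json={
    "policy": {
        "description": "rotate logs by size or age",
        "default_state": "hot",
        "states": [{
            "name": "hot",
            "actions": [{"rollover": {"min_size": "10gb", "min_index_age": "7d"}}],
            "transitions": [],
        }],
        "ism_template": [{"index_patterns": ["logs-*"], "priority": 100}],
    }
})
```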
20:53
Okay, so at the moment, there is already 13% disk usage on these two nodes. The goal is to add more data
21:00
and then the cluster should scale up once the disk threshold has been reached. Let's start it and talk a bit.
21:36
Okay, we should see immediately that disk usage
21:41
starts to go up, and we should start the operator. We start it from here; it's still in development, so not exactly production-ready.
22:01
We can see that I think it already reached the 15, oh, the threshold was at 50% that we set for it to go. So it already got to that. It noticed that it reached it and now it wants to add a new replica,
22:22
so a new node should follow soon. As soon as that node comes up, we should see the operator adjusting everything so that it suits the new setup better.
22:43
Let's see how things look here. Okay, so there is a data pod coming up.
23:07
Hopefully it should be up pretty soon. Usually, in testing, it was much faster. Okay, let's look a bit at the config file until this comes up.
23:24
So the operator has its own CRD, which is called ElasticsearchDataSet. You can set the volume claim template; for this one, we want 10 gigabytes.
23:43
You can set the disk usage percent for scaling up and down, min and max replicas, so you don't get overbilled if something unexpected happens, and also some cooldowns for scaling up and down,
24:05
so that you don't scale up or down abruptly without giving the cluster enough time to rebalance. Okay, like you can see here.
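Purely as an illustration, something like the following could apply such a custom resource with the Kubernetes Python client. The API group/version and every field under scaling are assumptions modeled on what's described above; the modified operator's actual schema may differ:

```python
# Illustrative only: apply a (hypothetical) ElasticsearchDataSet resource with
# the Kubernetes Python client. Group/version and the scaling fields below are
# assumptions based on the description above, not the operator's real schema.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

eds = {
    "apiVersion": "zalando.org/v1",  # assumed
    "kind": "ElasticsearchDataSet",
    "metadata": {"name": "logs", "namespace": "es"},
    "spec": {
        "replicas": 2,
        "volumeClaimTemplates": [
            {"spec": {"resources": {"requests": {"storage": "10Gi"}}}}
        ],
        "scaling": {  # hypothetical disk-based knobs
            "enabled": True,
            "minReplicas": 2,
            "maxReplicas": 6,
            "diskUsagePercentScaleUp": 50,
            "diskUsagePercentScaleDown": 30,
            "scaleUpCooldownSeconds": 300,
            "scaleDownCooldownSeconds": 600,
        },
    },
}

api.create_namespaced_custom_object(
    group="zalando.org", version="v1", namespace="es",
    plural="elasticsearchdatasets", body=eds,
)
```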
24:23
So let's look at what the operator did. Let's see. The scaling template was updated. Now it has three shards, and the ISM policy was updated too.
24:45
Also the write alias was changed. Let's see here, aliases.
25:03
So you can see that already we have the second and the third index. I think that's because there is another node coming up.
25:22
We have the indices, which have primary shards based on the number of nodes. And let's see how the ISM policy looks.
25:47
The ISM policy is at 20 gigs. I think because we had four shards. So I think we're quite good. Now we want to see what happens
26:01
when we remove some data, I guess. I'm putting too many logs already. So let's delete some things from here and see if the operator is removing any nodes.
26:26
We don't want to remove the last index because this one is connected to the write alias. So we will start with the biggest one, which is the first one.
26:43
Let's see. Okay, let's see if the operator notices things. It should notice that there is not any need
27:04
for the extra nodes and just scale down the cluster to the two nodes, which is the minimum size of the dataset that we configured. Here.
27:53
I think we need to delete a little more, right?
28:07
Yeah, they're still not balanced. So maybe deleting some more indices would help.
28:34
Okay, so let's see.
28:45
Okay, so now things are going back to how they were in the beginning, scaling down, updating the scaling template and the ISM policy, and it should go on like this until only those two nodes remain.
29:05
Yes, that concludes the demo. Thank you for watching and for being with us. So if you have any questions or feedback, please do.
29:28
Thank you. So I noticed in the demo that there was an index, dot-opendistro. So does it mean that the Zalando operator,
29:42
basically, if you want to use it, you're stuck with, I guess, the maximum would be Elasticsearch 7.10, right? You'll stick with the open-source flavor. No, we actually tried very hard to use OpenSearch.
30:02
Yeah, I mean- But by default it works with- Open Distro basically is deprecated. You should upgrade to OpenSearch, right? Because Open Distro is no longer maintained actively. It's like, you should upgrade, right?
30:20
Or basically, what kind of, which version of Elasticsearch are we looking at right now? If I'm not wrong, it's OpenSearch 1.3 that we have there. I think the index naming says open distro, but it's actually OpenSearch. So the Zalando operator was made to work with Elasticsearch,
30:42
but it also works for OpenSearch, right? So this one will work for both. Opster will work only with OpenSearch, I think, and Elastic's obviously will work with Elasticsearch. But this one for the moment works with both, depending on how they'll diverge, yeah.
31:03
Any other questions? Go on. Well, if not, thank you very much. We'll be around.