Highly Available Search at Shopify
Formal Metadata

Title: Highly Available Search at Shopify
Number of Parts: 60
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/66633 (DOI)
Berlin Buzzwords 2023 (32 / 60)
Transcript: English (auto-generated)
00:08
Hello everybody, I'm Josef, I'm part of the search platform team at Shopify. My colleague Leila couldn't make it, but she worked with me on this presentation, so that's how you can get in touch with her. I'm around on social media and on LinkedIn.
00:23
And we'll talk about highly available search at Shopify. So just a quick agenda. We'll set the context on a few things, we'll talk about Shopify, what our team does, some of the systems we use, and then we'll talk about the problem space, and then how our team goes about solving it.
00:45
So yeah, let's talk a little bit about Shopify. What is Shopify? For those of you who aren't aware, Shopify is a cloud-based commerce platform. You can start and manage a business, you can create and customize a store, manage inventory, do payments, all that kind of stuff.
01:03
We have about 3 million merchants today who sell products with us. They've sold over $700 billion in GMV, that's gross merchandise value. We have all sizes of businesses, my local coffee roaster is on Shopify,
01:22
also some big businesses that you may have heard of, Gymshark, FC Barcelona, they're also Shopify merchants, so we range in size and scale. One important distinction from other commerce platforms is that on Shopify, each merchant is distinct in their store, it's not a central marketplace.
01:46
And then let's set context on some of the tech. This is a search and streaming conference, so hopefully you all are familiar with Elasticsearch. But for those who aren't, Elasticsearch is a distributed search and analytics engine built on top of Lucene.
The full-text search capabilities of Elastic are really good for our use case in commerce. It's very scalable, it's very fault tolerant, and it handles a lot of the hard parts of running distributed systems at scale, which is why we like it.
02:20
Then there's Kafka. Kafka was first written, I think, at LinkedIn, it's open source, it was donated to the Apache Foundation, it's a streaming service, it's Shopify's main messaging service, so we build our applications that need to communicate to each other on top of Kafka, so if data needs to get out of MySQL
02:42
and go somewhere else, it goes through Kafka. This really helps our use case in search because it also acts as our back pressure, it acts as our message bus, and all that stuff. A little bit about our team, so I'm part of the search platform team. Search is a very important part of commerce,
03:01
hopefully over the last day and a half, you've all kind of heard about that a lot. It allows buyers and merchants to do what they need to do: a buyer can come, browse, find what they need, and a merchant can manage their store, sometimes using search. Our team manages all of the search clusters at Shopify.
03:22
We run Elasticsearch as a service at Shopify, so dev teams can come to us, well, they don't really come to us, but they can ship a YAML file and magically they get Elasticsearch, up and running without having to go and create clusters or know the details of how a cluster should be configured, and all that stuff.
03:42
So we give them a fairly opinionated cluster and they can be up and running in no time. And for a sense of scale of our infrastructure, we have about two petabytes of data across the platform, stored on about 114 distinct search clusters. Our largest ones are about 120 nodes each
04:02
and the smaller ones are three nodes. That's kind of the minimum we recommend for teams to run as. In terms of throughput, our background indexing rate is around 90,000 documents per second. At peak times, we go up to about 400,000 documents per second.
04:21
So this is changes to documents, so a product changes, an order changes, or even a blog post changes. It'll go through the pipeline and that will update our Elasticsearch clusters. So that's roughly the scale of the things that we work on. And so hopefully with that scale and the fact that search is such an important
04:43
and critical part of any store, and by extension Shopify, we have to make sure it's always available. And, well, always, that's a fuzzy word. We have to make sure it's highly available. So whatever your SLO may be, whether it's three nines, four nines,
05:03
whatever, you have to adhere to that. So for this talk, we decided to break up the failure domains into three categories. You can have system failures of any kind that can impact your availability. There's large sales events.
05:21
We call them, I don't know, high-volume commerce events. We'll get into those. Those can affect your availability. And then rapid data growth. As you onboard more merchants or as merchants add more data, data is ever-growing, it's never going to shrink. So these are the three topics or areas that we tackle
05:42
to make sure that we offer a highly available service. So system failures. Any kind of system failure, really. So whether it's the machine failure, disk failure or even a cloud provider failure, these are things that we have to plan for and mitigate.
06:01
And we handle these at different levels of abstraction. At the lowest level, as I mentioned, we run Elasticsearch. So Elasticsearch has a lot of redundancies built into it. And the first redundancy layer that we have is Elastic's own management of the distributed system.
06:21
This is a very, very simple example. A three-node cluster with three shards and two replicas. And it's distributed across three nodes. Everything's running, everything's great. What happens if a node fails? So as you can see, with that distribution, we haven't lost any data, but the shard three primary is gone,
06:41
which means you can't really write to it. For those who haven't played with Elastic, the primary is where a shard receives writes in Elasticsearch. So shard three is kind of in degraded mode. Luckily, Elastic has tooling built in there. Because we have a replica, we can rebuild the primary
07:01
and relocate the replica for shard one, and you're up and running, fully operational, while you tend to whatever happened to node one. This is our kind of first layer of defense against any failures. And luckily, this is a freebie. Thanks to Elastic, we didn't have to design this.
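To make the promotion step concrete, here is a toy sketch of that failover behavior, simplified to one replica per shard. The names and the data structure are made up for illustration; this is not how Elasticsearch is actually implemented.

```python
# Toy model of Elasticsearch's shard failover (simplified to one replica
# per shard): when the node holding a primary dies, a surviving in-sync
# replica is promoted. Illustration only -- not Elasticsearch internals.

def fail_node(allocation, dead_node):
    """allocation: {shard: {"primary": node, "replicas": [nodes]}}"""
    new_alloc = {}
    for shard, copies in allocation.items():
        primary = copies["primary"]
        replicas = [n for n in copies["replicas"] if n != dead_node]
        if primary == dead_node:
            if not replicas:
                raise RuntimeError(f"shard {shard} lost all of its copies")
            primary = replicas.pop(0)  # promote a surviving replica
        new_alloc[shard] = {"primary": primary, "replicas": replicas}
    return new_alloc

cluster = {
    "shard1": {"primary": "node1", "replicas": ["node2"]},
    "shard2": {"primary": "node2", "replicas": ["node3"]},
    "shard3": {"primary": "node3", "replicas": ["node1"]},
}
after = fail_node(cluster, "node3")
# shard3's replica on node1 is promoted to primary; shard2 is left without
# a replica until Elasticsearch rebuilds one on a surviving node
```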
07:21
But on to things we did design. And I guess a little refresher on Kubernetes. Maybe most of you know it, but just a quick refresher. Kubernetes is a container orchestration platform. You can automate deployments, scaling and management of any containerized application or workload.
07:42
In our case, Elasticsearch is a stateful workload, which we run on Kubernetes. One key thing about Kubernetes is it allows you to programmatically define controllers that can combine the YAML and your opinionated logic on how to run certain things.
08:02
So you can add another layer of abstraction on top of Kubernetes. For example, we have a Kubernetes object called an Elasticsearch. Developers will define it and we'll provision it for them based on some parameters that we've defined. Kubernetes is super complex. I'm not going to get into it.
08:21
But this is kind of the extent that we'll talk about it in this talk. So yeah, with that out of the way, we kind of design against VM node machine failures by running our workloads in Kubernetes, in GCP. For those familiar with GCP offerings, it's called GKE.
08:43
Google Kubernetes Engine, I think. So that's how we run our Elastic clusters. Like I mentioned, we run a custom controller. What that does is that we can define some constraints in Kubernetes. They're called taints, tolerations, where you can ensure that only one node per cluster ever lives on a VM.
09:06
So you won't have two nodes of a cluster on one VM, which contains the damage when a node or VM goes down. In addition, we use Elastic's rack awareness to distribute our nodes across availability zones.
09:21
So we ensure that the replica and primary for a given shard don't live on the same VM, or in the same availability zone. So now we also protect against an AZ failure. If an AZ fails, you still have a replica or the primary for each shard and you can rebuild. So you can defend against that failure mode as well.
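The two constraints above roughly translate into a pod anti-affinity rule plus Elasticsearch's shard allocation awareness. The snippet below is a hedged sketch: the labels and zone name are made up, and Shopify's controller generates its own configuration; the Elasticsearch setting names themselves are the documented ones.

```yaml
# Pod spec fragment: never co-locate two Elasticsearch pods of the same
# cluster on one VM (labels are illustrative)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            es-cluster: search-prod
        topologyKey: kubernetes.io/hostname
---
# elasticsearch.yml fragment: treat the availability zone as the "rack",
# so a shard's primary and replicas land in different zones
node.attr.zone: europe-west1-b   # set per node from the VM's actual AZ
cluster.routing.allocation.awareness.attributes: zone
```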
09:45
Additionally, it also effectively allows us to take down, let's say, a third of our cluster at any given time, if you're on a three availability zone deployment. For quick restarts, instead of restarting node by node, you can shut down a third of your cluster
10:01
and still have a fully available service. Customers, developers can reach the cluster with no impact to them. This has helped us greatly in upgrading and rolling out changes to Elasticsearch.
10:22
Yeah, so when I wrote this slide, I was thinking, oh, hypothetically, a region can fail on any cloud provider. I don't know if you all followed it, but last week, AWS had a huge failure, us-east-1, which is their biggest region. So if you're going to run your workload in the cloud, that's probably something that you have to protect against.
10:41
And so while Elastic itself improves our availability, we still need to be ready for a full regional evacuation or a regional failure. And so to protect against that, we're using Kafka here. We're leveraging Kafka to ingest data into clusters within the same jurisdiction
11:03
but in different regions. And I'll get into the jurisdiction in a second. And so what we do is the consumers for Elasticsearch live in the same cluster, kind of hand-wavy, but in the same Kubernetes cluster, let's say. They consume from two different regions. So for example, you can have, I don't know,
11:24
two regions, one in Germany, one in Netherlands, and you can consume from both. And that way, you ensure that you have two Elasticsearch clusters with identical data. Should a region fail, you can failover with no impact. Gives you an active-active load balancing setup.
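Conceptually, that consumer arrangement looks like the sketch below, greatly simplified: every cluster in a jurisdiction consumes every region's change log, so all of them converge on the same data. All names are illustrative; the real consumers are separate services, not this code.

```python
# Greatly simplified model of the active-active setup: each Elasticsearch
# cluster in a jurisdiction runs consumers against *both* regions' Kafka,
# so each cluster ends up with the full, identical data set.

def build_full_index(regional_changelogs):
    """Apply every region's change log; each entry is (doc_id, document)."""
    index = {}
    for changelog in regional_changelogs:   # consume from every region
        for doc_id, doc in changelog:
            index[doc_id] = doc             # upsert, last write wins
    return index

germany = [("shop-1:prod-1", {"title": "mug"}), ("shop-1:prod-2", {"title": "cap"})]
netherlands = [("shop-9:prod-7", {"title": "tent"})]

# Both clusters consume both change logs, so they converge on the same data
# and either one can serve all traffic if the other region fails:
cluster_a = build_full_index([germany, netherlands])
cluster_b = build_full_index([netherlands, germany])
```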
11:42
Small caveat here is that our resilience model for MySQL is different. So you might say, well, you have the same MySQL on both sides, so you have the same data. They're not the same MySQLs. They have different shop data on them. That resiliency model is slightly different. But for Elasticsearch purposes, we consume the same data, so we end up with active-active.
12:04
And so also to do that, we've had to build mechanisms for load balancing. We have endpoint control where we can gradually shift traffic or we can fully shift traffic in case of a disaster or in case of maintenance. So what that gives us is kind of a distributed cluster architecture
12:22
across the globe. I mentioned jurisdictions. This is for data residency purposes, compliance purposes. So we have some clusters in Europe, we have some in North America, we have some in Asia-Pacific region. And a great side effect of this gradual traffic shift
12:43
is also we can test things out on one cluster because it's active-active, we can send 10% of traffic to one cluster, monitor the load, monitor error rates, and then slowly shift traffic over to 100% without worrying about impacting customers or the application.
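That gradual shift can be pictured as a deterministic weighted picker, as in the sketch below. This is purely illustrative: the real endpoint control lives in Shopify's load-balancing layer, not in application code, and the names are made up.

```python
# Sketch of the gradual traffic shift between two active-active clusters:
# route a configurable percentage of requests to the cluster under test.

def pick_cluster(request_id: int, canary_percent: int) -> str:
    """Route canary_percent of requests to the canary cluster."""
    return "canary" if request_id % 100 < canary_percent else "primary"

# Send 10% of traffic to one cluster, watch load and error rates, then
# dial canary_percent up toward 100 once it looks healthy:
routed = [pick_cluster(i, canary_percent=10) for i in range(1000)]
```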
13:06
On to compute pressure. So we talked about failures, great. What happens often with Elastic, and I think somebody was talking about it yesterday in the CalDB talk, that's right, where Elastic can have issues in compute pressure.
13:20
Your infrastructure can be under-provisioned or any high-load event can cause problems for the availability of your service. The biggest one that we see at Shopify, and I guess it's common across commerce, is high-volume commerce events. We deal with two of them. One is flash sales, which is really short-lived sales
13:42
that happen on the order of minutes to hours, which means a ton of people go to your physical store or virtual store and you're effectively doing a DDoS on yourself, a distributed denial of service on your own service. This has become a common pattern in commerce, so we have to protect against that, build against that.
14:02
And the other one that's big in North America, but I think it's slowly catching on everywhere anyways, is Black Friday Cyber Monday, which is a weekend at the end of November. We refer to it BFCM for short, you might see that in a couple of the slides. That's a whole weekend of non-stop sales. People go crazy, you've seen the videos
14:21
of people rushing Walmart in the US. Same thing happens online, so we have to protect against that. And for a sense of scale, we get about 44,000 requests per second, that's 2.6 million requests a minute on Black Friday weekend, and then 57,000 writes per second.
14:41
So we go up, that's about double, I think, our regular daily amount. And you can't really double your whole infrastructure, so what do you do to protect against that? Elasticsearch itself is not easily auto-scaled up, it's not elastic in that sense.
15:02
So what we do is we work very closely with our data science team, we project some numbers, we work on trends and we provision the cluster ahead of time to make sure that we don't get in trouble when it comes to the Elasticsearch cluster itself. But of course, when you have queries, they go through a load balancer, they go through a proxy.
15:22
There we still leverage Kubernetes, we use the horizontal pod auto-scaling feature in Kubernetes, where the proxies will scale up as demand hits us. And so far we've been able to handle it very well. But of course, we have other problems.
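For a stateless proxy tier, that autoscaling is the standard Kubernetes HorizontalPodAutoscaler. Below is a minimal sketch; the names, replica counts, and CPU target are assumptions for illustration, not Shopify's actual values.

```yaml
# Illustrative HorizontalPodAutoscaler for the query proxies
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: search-proxy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: search-proxy
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```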
15:43
Writes can also cause compute pressure in Elasticsearch. So developers will change indices, they will change mappings of fields and you will have to re-index a given index. And that's a huge influx of writes because you have to go through your entire corpus
16:01
and resend it over to Elasticsearch. So at peak, we see about 400,000 writes per second when we do a big re-index. I forgot to put the units on the slide, but that's roughly seven times the BFCM write rate, and BFCM is supposed to be the high-volume event.
16:20
So what do we do to handle this? We talked about Kafka: we have Kafka topics, and we have real-time consumers and re-index consumers. The re-index consumers are typically able to handle the load, but if they ever fall behind, for whatever reason, shard reallocation or some Elasticsearch internals, we're able to throttle.
16:44
We have some load shedding, we can throttle the re-index consumers. This ensures that real-time writes are prioritized, so updates to products or orders make it into Elasticsearch immediately, while in the background we slowly work through the backlog of re-indexed documents.
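That load-shedding rule can be sketched as follows. The thresholds and the linear back-off are illustrative assumptions; the real consumers are separate processes scaled through Kubernetes, not this function.

```python
# Sketch of load shedding: throttle the re-index consumers as soon as
# real-time indexing starts to lag, so live product/order updates always win.

MAX_REALTIME_DELAY_S = 5.0  # roughly the alerting threshold from the talk

def reindex_rate(realtime_delay_s: float, full_rate: int) -> int:
    """How many re-index docs/sec the consumers may currently process."""
    if realtime_delay_s >= MAX_REALTIME_DELAY_S:
        return 0  # worst case: throttle the re-index consumers to zero
    # otherwise back off linearly as the real-time delay grows
    fraction = 1.0 - realtime_delay_s / MAX_REALTIME_DELAY_S
    return int(full_rate * fraction)

# healthy pipeline: full speed; 2.5s of lag: half speed; 6s of lag: stop
```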
17:02
This is another way that we protect our merchants and our servers against high load. And speaking of re-index, storage growth. So as storage grows, we have to protect against that.
17:21
Elasticsearch clusters have a funny way of dealing with a full disk, not funny, it protects the cluster, but once you cross the 85% low watermark, Elastic stops allocating new shards to that node. Once you hit the 90% high watermark, more drastic things happen: shards start getting relocated away, and at the flood stage writes get blocked,
17:41
your cluster will go red, problems. This can happen at any time with a re-index or if you have an event like Black Friday. Luckily, we have our custom controller that helps with that. Of course, you could pre-provision storage.
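For reference, the disk thresholds just mentioned are Elasticsearch's disk-based shard-allocation watermarks. The fragment below shows the documented default settings as a generic sketch, not Shopify's configuration: allocation stops at the low watermark, shards are relocated at the high watermark, and writes are blocked at the flood stage.

```yaml
# elasticsearch.yml -- disk watermark defaults
cluster.routing.allocation.disk.watermark.low: 85%           # stop allocating new shards to this node
cluster.routing.allocation.disk.watermark.high: 90%          # start relocating shards away
cluster.routing.allocation.disk.watermark.flood_stage: 95%   # indices go read-only; writes blocked
```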
18:03
Pre-provision, put a whole, I don't know, 100 terabytes of storage, you put data in there, but then when your big event is over, then you have to scale it down manually or you have to live with paying that extra bill for the extra storage. That's not really what you want to do.
18:20
That's a lot of toil for your engineers, that's a lot of work. So, what we have, we've made our storage scalable to adapt to the changes of requirements. Our custom controller is able to use some heuristics that we've defined and it'll query the API of the Elasticsearch cluster it's managing
18:42
and when it reaches a certain heuristic, it's able to scale up using the Kubernetes Volume Expansion API, which became available not too long ago, I think as of 1.23 or 1.24, something like that. And so, as a result of this, the system is resilient to growth,
19:02
to data storage growth, so you can throw a lot of data into it. Once the controller sees that things are growing, it'll do a volume expansion. The important thing here is that the volume expansion is fully online, so you don't take your cluster down. You call into a GCP API, you grow your underlying storage
19:21
and Elastic remains up. At the same time, we have volume, I don't know, shrinking? Whatever, reduction? That one is less online, but also saves in storage because one day you were experimenting with something and you put, I don't know, 10 terabytes of data, you realize, oh, you don't need it, you duplicated some data,
19:41
you delete it, and then the controller comes around, looks and says, oh, I can save X amount of terabyte and you shrink it back down. This was a super high-level overview of the controller. I know lots and lots of people are interested in it. But my colleague Leila did a great talk at KubeCon Amsterdam this year, so if you want to learn more, I encourage you to watch that video.
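At a very high level, the expansion heuristic works like the toy sketch below. The threshold and growth factor are made-up illustrative values; the real controller queries the Elasticsearch APIs and uses the Kubernetes volume expansion API.

```python
# Toy version of the storage controller's heuristic: watch disk usage and
# grow the persistent volume online before Elasticsearch's watermarks hit.

SCALE_UP_AT = 0.75   # act well before the 85% watermark
GROW_FACTOR = 1.5    # how aggressively to expand

def desired_volume_gb(used_gb: float, provisioned_gb: float) -> float:
    """Volume size the controller should request for this data node."""
    if used_gb / provisioned_gb >= SCALE_UP_AT:
        return provisioned_gb * GROW_FACTOR  # online expansion; cluster stays up
    return provisioned_gb                    # within budget, nothing to do

# 80% full -> grow a 1000 GB volume to 1500 GB; 50% full -> leave it alone
```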
20:02
It is on YouTube. It's a great talk, it goes into all the details, and there's even a demo, I think. So, yeah, that's how we protect against storage growth. So just to sum up,
20:23
we talked about kind of three failure domains, I guess. The first one being system failures. So to address that, at the Elasticsearch level, we have the built-in redundancies of Elasticsearch. Kubernetes allows us to distribute the load across disparate nodes,
20:43
across availability zones to protect against that, and then at an even more macro level to protect against regional failures, we have an active-active setup using Kafka and a load balancer. On the compute pressure side, we deal with BFCM and flash sales,
21:01
which put a lot of compute pressure on our clusters. To deal with that, we handle queries with load balancing at the proxy level, and we have load shedding when there are too many writes happening, by scaling down our re-index consumers. And then finally, storage growth, which was kind of a tricky thing.
21:24
It occasionally paged the team to go and fiddle with the cluster. We implemented it in the controller. The controller allows us to scale up, scale down on demand with no human involvement, so this is tackled that way. And a little bit about what's next, because why not?
21:42
So we've all heard about vector search lots and lots. I hope you went to some of those talks, they were super interesting. That's the next challenge for us. Specifically, how can we leverage Elasticsearch and the Lucene KNN features underneath to do vector search and whether the performance is up to our standards
22:01
and it meets our specific use case. And then speaking of scale, our data is growing and we are probably going to hit some of the limits in Elasticsearch. For example, I think there's a 1024 limit on shards in Elasticsearch. We're not too far off from there, so that's the next thing we're building
22:22
is how do you deal with those kinds of limitations as you scale out even further. That's it. Thank you very much. Thank you, Josef, for the presentation.
22:40
Do we have any questions? I was curious about the AZ-Aware shard assignment. How did you get that? I thought ES didn't have that built in, or did it?
23:02
I think there's a Rack-Aware feature within Elasticsearch, so for us, a rack is an availability zone. But that's done in the controller, so the controller distributes it. So when you provision the Elasticsearch, the Elastic node comes up
23:20
and then for each node, the controller sets what rack it's on? The cluster comes up and we... You're testing me on how much I remember from the controller code. I think we label it in a way that then gets passed on to Elasticsearch to know that node X is in rack Y, for example.
23:41
And your ILM policies also have to be aware because your ILM policies are creating new indexes, which means your controller should be, before data is written to them, should... So basically there is an ordering problem too then, which is your ILM should go first, then your controller goes and updates the config
24:02
and then ES attached... We don't use ILM in the way that you typically think, because our data is long-lived, we don't age out any data, because it's all search data, it's not logs per se. Oh, I see. OK. So we manage our indices differently, we version the indices manually
24:21
and the consumers take care of the versioning. I see. OK. But you're right, if we were running ILM we would have to... At the beginning you said, like, you... So your team is providing Elasticsearch clusters to other teams
24:40
and you're like fairly opinionated about what you provide. Can you give a bit more details about those opinions? Sure. So what that means is we have an internal tool that... It's a CLI that developers will run to create some test infrastructure, well, I guess, initial bootstrap, initial infrastructure. So when they create an Elasticsearch YAML,
25:02
we provision that with three nodes. I forget how many replicas, I think two replicas minimum per shard. They can modify that, but that's kind of the baseline starting point. And we provision some basic, basic storage for them. I think we start at a gig per node, something like that.
25:20
And I forget the CPU. But that's kind of what gets passed on to Kubernetes. And then as the teams get larger, they can tune that as they wish. OK, so the opinion is mostly on the resources side of things, but in terms of what they can do, in terms of going crazy with queries, etcetera, that's up to them, basically. Is that fair, or...?
25:41
Yes, to a degree. We are closer to some applications, like the Shopify Core application, than others. Other applications it's fully self-serve. For Shopify Core and others, we have our own internal gem that kind of wraps the Ruby gem, we're a Rails shop, that wraps the Elasticsearch API.
26:00
So we are opinionated in how they would create their indices, that kind of thing as well. OK. Yeah, I have a question about that, but maybe it's fairer to leave other people to ask more questions. Do you use ECK, the Elastic Operator for Kubernetes?
26:21
We do not. Our operator was developed before ECK, and I think early, early on when Kubernetes operators were brand new. If a team wanted to do that today, I would suggest using ECK for sure. And our operator is hooked closely into the rest of the operator ecosystem at Shopify.
26:40
So it's kind of, again, opinionated in that way. OK, thank you. And if I may ask a second question. If I understood correctly, you talked about throttling the re-indexes. How do you do that to leave resources to online queries? So how do you manage to throttle re-index?
27:03
So we have a kind of observability on real-time delay. So we measure real-time delay. I think our metric is a few seconds, maybe five-second delay. Above that, alerts go off, red lights start spinning, loud sirens and all that. At the worst case, we can throttle it down to zero,
27:21
the re-index consumers. OK, I see. So you don't use the internal re-index API from Elastic, you re-index from external sources? That's right. We consume from Kafka, and we have our own very, very lightweight consumers written in Ruby, not Rails, and it really deals with our domain.
27:43
So it's not a general purpose. For example, it's not Logstash. It's super, super lightweight, and we can easily scale it up and down with Kubernetes. Thank you. Any other questions?
28:05
So you were talking about writes. I was curious to hear a little bit more about other metrics, performance metrics, such as how it affects latency. Anything else that you've experienced besides the writes?
28:21
So for us, the two main ones are query latency, which we measure with P99, P90, P50, and then it's real-time delays. Those are the two main metrics that we care about, other than actual server uptime. So how fast the storefront and the merchants can query, and then how fresh the writes are, basically.
28:44
I can't think of any other metrics. So can I ask a little bit about scale, number of documents, and what kind of latencies you reach? At the best case, we're under a second from MySQL to Elasticsearch.
29:00
There's some fuzz factor in there by the time stuff gets out of MySQL, but from the moment documents reach Kafka to the moment they're written into Elasticsearch, it's under a second, and you still have to allow for the second or half a second, I think, that it takes for Elastic itself to index them into shards and make them searchable.
29:21
And in terms of the scale of the documents, I can go back to the slide with the numbers. I think it's about 40-some thousand documents per second that we write. So that's our... Oh, never mind, 90,000. That's our background indexing rate on a day-to-day basis that go into our clusters.
29:41
Okay, thanks. Any more questions? I think we still have plenty of time for questions. I think there's one question in the back.
30:05
Hi, thanks for your talk. One question I had was: you talked about clusters consuming data from two regions, right? And I think you also mentioned the MySQLs being different. How are they different?
30:22
If you could elaborate a bit on that. I'm not on the MySQL team, so I'll do my best approximation. But basically, our MySQL is sharded. So the MySQL in region A will have, say, shops 1 to 1,000, and region B will have shops 1,001 to 2,000, for example. And each one will then write to its own Kafka.
30:42
And we consume from both, to capture both change logs, or change sets. Okay. Does that answer your question? Yeah. Okay. Thanks. Any more?
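The sharding scheme just described, each region owning a shop-ID range and writing its own change log, can be sketched like this. The ranges come from the speaker's example; the function names and event shape are our own illustrative assumptions, and the two Kafka topics are simulated here as plain arrays.

```ruby
# Illustrative sketch (our names, not Shopify's): MySQL is sharded by
# shop ID, each region writes its changes to its own Kafka, and the
# search indexer consumes from BOTH regions to get the full change set.
REGION_A_SHOPS = (1..1_000)      # example range from the talk
REGION_B_SHOPS = (1_001..2_000)  # example range from the talk

# Which region's MySQL (and therefore Kafka) owns a given shop.
def region_for(shop_id)
  REGION_A_SHOPS.cover?(shop_id) ? :region_a : :region_b
end

# In production these would be two Kafka subscriptions, one per region;
# here we merge two in-memory change logs by timestamp to show the
# combined change set the indexer sees.
def merged_changelog(events_a, events_b)
  (events_a + events_b).sort_by { |e| e[:ts] }
end
```

The key point is that neither region's log is complete on its own, so the indexer must subscribe to both to index every shop.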
31:01
Yeah, so you have a pretty large number of Elasticsearch clusters and nodes, and your developers are pretty flexible in creating new ones. Do you support or guide them through different SLA levels they might want to have? Because if I'm just starting, I might not need all this resiliency at all.
31:24
And it costs money, so maybe there are some trade-offs the developers can decide on. So we have different tiers of applications internally. The highest tiers have the best, most responsive SLAs and SLOs, and the lowest tier is a throwaway project a developer hacked together.
31:44
So they'll be fine with the smallest cluster, three nodes. And don't forget, these are virtual clusters because they live in Kubernetes. So we actually co-locate a lot of clusters in one large Kubernetes cluster. There, we kind of benefit from the economy of scale.
32:01
For our very, very large ones, they're single-tenant for those reasons. But for the most part, they're self-serve anyway. If you've developed a hack-day project and it takes off and you're hitting it with a lot of queries, the most we do is give you some guidance: maybe increase your nodes, maybe increase your CPU.
32:23
There's no alerting that goes off for us at all. And teams are actually responsible for their own services at Shopify. So if you run, I don't know, some amazing application that's backed by search, we provision dashboards for your team as well.
32:41
So they have, magically, observability dashboards for their application. It's on them to create alerts based on that and then act on them. Everything we talked about, autoscaling and all that, is there. The only applications we get alerted for are the very, very few core applications at Shopify.
33:00
The rest is all the app teams. In this slide, you showed active-active between different regions, and you also have replication inside each region.
33:24
So you have a total of four copies of the data. Yeah. Is that needed? Can you avoid it? Well, the replication within Elastic is needed for the cluster itself to function, because you need a primary and a replica.
33:40
If you lose your primary, you need the replica to rebuild it. So that's baseline Elastic itself. At the cross-region level, you need it to protect against a regional failure. Again, we only do this for one or two applications. Back to the gentleman's question there: we don't offer this for other tiers of applications that are a hobby or a small thing.
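The "four copies" arithmetic behind the question can be made concrete. The Elasticsearch setting names (`number_of_shards`, `number_of_replicas`) are real index settings, but the values and the helper functions here are illustrative assumptions, not Shopify's configuration.

```ruby
# Illustrative Elasticsearch index settings: the in-cluster redundancy
# the answer refers to is the primary/replica shard pair, configured
# via number_of_replicas. Values are examples, not Shopify's.
INDEX_SETTINGS = {
  settings: {
    number_of_shards: 3,   # primary shards
    number_of_replicas: 1  # one replica per primary, within the cluster
  }
}.freeze

# Total shard copies held inside one cluster.
def total_shard_copies(settings)
  s = settings[:settings]
  s[:number_of_shards] * (1 + s[:number_of_replicas])
end

# With active-active clusters in multiple regions, the data copies
# multiply again: (primary + replicas) per cluster, times regions.
def total_data_copies(replicas:, regions:)
  (1 + replicas) * regions
end
```

With one replica and two active-active regions this gives the four copies the questioner counted: two in-cluster copies per region, times two regions.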
34:02
You're right, it's a trade-off: do you spend money to get the availability, or do you risk it and accept lower availability? And what about the availability of MySQL and Kafka? Do they have built-in replicas and so on? Yes, so each... Actually, it's funny.
34:20
Each of these components has its own team. There's a streaming platform team and a database platform team, and they're responsible for making sure their service is available, so we interface with them the way an external party would with a cloud provider. So they're always available. In Kafka's case, it's Shopify's main message bus.
34:41
They have no choice; they have to be available. The same goes for MySQL. Thanks. Do we have any more questions? I'll just quickly check online. Yeah, no questions online. Thank you very much for the presentation,
35:01
and thank you for listening as well.