Using Solr unconventionally to serve 26bn+ documents
Formal Metadata

Title: Using Solr unconventionally to serve 26bn+ documents
Title of Series: Berlin Buzzwords 2022 (talk 41 of 56)
Number of Parts: 56
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/67163 (DOI)
Transcript: English (auto-generated)
00:08
So yeah, I'm Richard, I'm a senior data infrastructure engineer at Brandwatch, and yeah, this is what I'm talking to you about. So starting off with the agenda, I'm going to quickly go over how we use Solr at Brandwatch currently,
00:21
the handshake between a company decision and a new project, then our planning and design around this project, then what we ended up making, then all of the testing we've done throughout this project, and then ending with our findings and hopefully some Q&A. A bit about what Brandwatch is: we're a world leader in social listening and consumer research.
00:45
We track billions of conversations every day, and this helps brands and agencies understand what people are saying about them online. I feel like I have to do a PSA after today's keynote that we're not like Cambridge Analytica. We're completely moral. I just want to put that out there.
01:02
But how we use Solr at Brandwatch: so I thought I'd do a quick show of hands of who uses Solr in production at the moment. OK, I know those lot do, they work for me. But keep your hand up if you have, say, 10 million or more documents. OK, and then hundreds of millions, billions. OK, cool.
01:23
We've got a wide range of people. But to quickly recap the SolrCloud structure, for those who don't use it: it's quite a basic diagram, but you effectively have your ZooKeeper, which houses the core state of your cluster, which it collects from the Solr nodes and collections.
01:42
And then on the right-hand side here, every collection is broken up into many, many shards, which is configurable by you. And on the left-hand side, you have your Solr nodes, which are basically just instances of Solr running. These will house what are called replicas, which are essentially just copies of the shards' data in that collection.
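To make that hierarchy concrete, here is a minimal sketch of it as plain Rust types; the names are illustrative rather than Solr's actual state.json schema.

```rust
// A minimal model of the SolrCloud layout described above; type and field
// names are illustrative, not Solr's actual state.json schema.

/// One physical copy of a shard's index, hosted on some Solr node.
struct Replica {
    node_name: String, // the Solr node (a running Solr instance) hosting this copy
    core_name: String, // the core backing it on that node
}

/// One slice of a collection's documents, held by one or more replicas.
struct Shard {
    name: String,
    replicas: Vec<Replica>,
}

/// A collection is broken up into shards; the shard count is configurable.
struct Collection {
    name: String,
    shards: Vec<Shard>,
}

/// ZooKeeper houses this overall state, collected from the Solr nodes.
struct ClusterState {
    live_nodes: Vec<String>, // Solr nodes currently registered with ZooKeeper
    collections: Vec<Collection>,
}
```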
02:04
But at Brandwatch, we have two main datasets: what we call mentions and what we call quick search. For mentions, the documents come from many different data sources online and are tailored to the client's query, so we get a constant stream of updates and deletes with them. And our mentions clusters have anywhere between 4 and 13 billion documents.
02:24
And we typically run this on a cluster of 96 instances spread over 16 machines, and we currently have 27 in production. Then for quick search, the data comes from the Twitter Firehose. It's a lot more static, so the only thing that changes the data is the compliance work that we do, because we're not Cambridge Analytica.
02:43
And that will delete stuff. The document count ranges from 2 billion to 26 billion, and this, again, is either 96 or 192 instances, and we've got four of those. But as for Solr, we use the same version across the board, which is 7.7.3.
03:03
Especially at this scale it can be quite hard to upgrade, because you've got to re-index, etc. We use Puppet to manage everything, because we house everything in data centers. We have our own plugins that are specific to the Brandwatch use case, and we use the Solr Prometheus exporter for these lovely dashboards that I made.
03:21
And we have our own tooling for general management. The reason why we do this is because it works for us, and this has been done for many, many years, and we've been able to keep up with the growth. So when I started five years ago, we only had six clusters, and now we've got over 30. And along the way, we've had to develop a lot of in-house tooling, such as tools to migrate data from cluster to cluster,
03:46
as well as do heavy duty admin tasks. But it hasn't been all rosy. So scaling hasn't always been our best friend. And whilst we have tried to keep on top of it, there's still quite a lot of manual intervention involved.
04:04
And it's normally me that has to do this. Some new feature will come out in Solr and we'll get really excited, so I investigate it, but it's not until we put it to scale that we find it doesn't always work out for us as planned. So this is kind of where the handshake of two things comes together.
04:23
But a backstory to this: in 2018, Brandwatch merged with Crimson Hexagon, which was kind of our biggest competitor. And up until that point, Brandwatch had always been in data centers, whereas Crimson Hexagon was always in AWS. And over the last couple of years, with the pandemic, the silicon shortage, and the canal blockage,
04:45
it put a massive delay on hardware turnaround for us. So what would be several weeks turned into many months. So this just wasn't scalable for us. So we made the company decision to prioritize AWS instead of data center, as this would give us quicker scaling options.
05:03
And this came about at the same time that we were given a new project. For our quick search, which had always been a 30-day window at either a 10% or 100% sample set, we were asked to make a 30-month window at a 10% sample set. And with this, there were some initial questions.
05:25
So we quickly got together and started thinking, well, are we bound to Solr? Can we use something else? But for us, we are because, like I mentioned, we have our own in-house plug-ins. And so the tech debt to not only make something for another service,
05:42
but also change our whole architecture would be too much. Can we do a direct copy? So effectively a lift and shift: just copy what we have already and put it in AWS. But this would be significantly more expensive, so that was not an option. But was there anything in AWS that could be used to our advantage
06:03
so that we didn't have to reinvent the wheel, or that could save us time? Was there anything we could do with the shard layout or the index sizes that we use that could help improve things when we got on with this project? But prior to this, we had done tests of different permutations.
06:21
We were building out quick search and found the layout and the number of shards that we have to be kind of the most optimal for us. And what kind of infrastructure management could we use? In the past, we had tried using Puppet in AWS, but it didn't really give us positive results, so we didn't really want to do that.
06:42
But in 2019 (that's me, severely hunched at the front) I attended this talk, as did some of my other colleagues. And I can also see Houston, so we're like fangirling. But after we saw this talk and the success that they had, we definitely knew that we wanted to use it as inspiration for our project.
07:05
And so you might be asking: well, you're already putting Solr on AWS, which is a new domain for you, so why complicate things by adding Kubernetes on top? But for us, we saw how much Kubernetes could help with the manual intervention that we would no longer need to do.
07:21
And so this was a nice benefit for us. So that moves on to our planning and design. Of course, we had some goals: we needed to serve the product requirement; we wanted a Solr cluster running on Kubernetes in AWS; and we wanted performance close to, or even better than, what we were expecting from our clusters in the data center.
07:42
We wanted scaling to be easy. We still wanted to support backup and restore for any disaster recovery, touch wood. And we didn't want it to cost the earth, because we still needed to get paid. But during this, we had a light-bulb idea. So in the past, or up until this point, sorry,
08:02
our Solr clusters would all receive the same writes, and reads would be round-robined across them. But we quickly identified that there were duplicate writes going on, so we decided to split this up: you would have a dedicated write cluster, so to speak, and then we'd ship its data to S3 in a central area.
08:20
And this allows us to spin up multiple other Solr clusters which are just responsible for reads. We thought this would be quite a nice idea, because then we can spin up clusters as and when we need them, and we just need to pull the data from S3; we don't need to re-sync everything. So we took this idea, and then we started creating really bad drawings.
08:42
And then we would, as a team, go through them and be like: what doesn't work, what does work? Then I upgraded and got an iPad, so I made nicer drawings. We went through and saw what worked and what didn't work, and we just kept building on this idea. And we eventually got somewhere.
09:02
So I'm going to go through it bit by bit. We started with the write cluster, because that was going to be the easiest part. We had been building Solr clusters for many years and we had some spare hardware in the data center, so we just min-maxed what we had and crammed it into just a couple of instances. And we just leave it alone, because it doesn't take reads,
09:21
so we don't have to worry about the unpredictability of client queries coming in and killing it. So that was quite easy. But then getting data into S3 wasn't so easy. So originally, I built a plug-in at the point of commit to push this data to S3. And this worked locally and on staging.
09:41
But then as we scaled up, we encountered lots of blocked threads and just a cascading set of problems, so we kind of scrapped this. Instead we leaned on the fact that on the quick search product we only commit every couple of minutes, whereas mentions commits a lot more frequently. So what we do instead is take a snapshot periodically and then upload that data to S3, and that's worked for us so far.
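As a rough sketch of that flow (not our actual tooling: the core name, snapshot location and bucket are illustrative, and it assumes the reqwest, aws-config, aws-sdk-s3 and anyhow crates), you can ask Solr's replication handler for a named backup and then push the resulting files to S3:

```rust
// Illustrative sketch only: take a named snapshot via Solr's replication
// handler, then upload the snapshot files to a central S3 prefix.
use aws_sdk_s3::{primitives::ByteStream, Client};

async fn snapshot_and_upload(core: &str, bucket: &str) -> anyhow::Result<()> {
    // 1. Ask Solr to write a backup of the core's current index to local disk.
    //    (A real implementation would poll command=details and wait for the
    //    backup to finish before uploading.)
    let url = format!(
        "http://localhost:8983/solr/{core}/replication?command=backup&name=periodic&location=/var/solr/snapshots"
    );
    reqwest::get(url).await?.error_for_status()?;

    // 2. Push every file in the snapshot directory up to the central S3 area.
    let s3 = Client::new(&aws_config::load_from_env().await);
    for entry in std::fs::read_dir("/var/solr/snapshots/snapshot.periodic")? {
        let path = entry?.path();
        let key = format!(
            "{core}/snapshot.periodic/{}",
            path.file_name().unwrap().to_string_lossy()
        );
        s3.put_object()
            .bucket(bucket)
            .key(key)
            .body(ByteStream::from_path(&path).await?)
            .send()
            .await?;
    }
    Ok(())
}
```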
10:04
So this is kind of what we have so far, which is the write cluster, with data going into S3. Now we needed to make the read cluster, and this is where the Solr Operator was our heavenly light.
10:25
So we deployed this on our Kubernetes cluster and got it to spin up a Solr cluster for us, and it kind of just worked. It was great. And then we just spun up the other apps that we typically have in a cluster stack: the Prometheus exporter, to give us better visibility,
10:41
and then the other apps that we use, such as APIs and stuff, just so we can query the cluster and make sure everything's reachable. So this is all good. We've got a cluster. It's technically a read cluster. There's no data in it. So how did we get data from S3 into the cluster?
11:00
And this is where I introduce the first new component, which is called Argo. With the version of Solr that we were running, we couldn't ship data from S3 into Solr out of the box, so we built our own application called Argo, which is written in Rust and attached as a sidecar container.
11:20
Essentially, it periodically checks a specific path on S3, and once it sees changes, it will download them locally to the Solr pod, update the index.properties file with the new core, reload it, and we've won. But we've built Argo to be safe, so if there are any problems along the way,
11:40
we just bail out and carry on using the original core. But to go into a deeper dive: you can see here we've got our collection and shard in S3, as well as on disk in Solr, and Argo just constantly polls S3 to check if there's anything new.
12:00
And in the event of a new version coming into S3, Argo will get the set of all the segment files that have changed from the previous version, by taking the set of segments on local disk and doing a comparison. It then creates a new index, downloads these new segments,
12:22
and hard-links the existing ones. Once the download and sync is complete, we update the index.properties file to point to the new index. And if that was successful and the new replica is being used, we just get rid of the old one, because we don't need it anymore.
12:41
And by doing this, we've also kind of got, for free, what we call cloud repair. In the event of a Solr pod dropping and us losing all its data, Argo is aware of what should be there, so it can recreate the directories and re-download the data for us, and we don't have to do anything. And it's been working pretty well.
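This isn't Argo's real code, but a heavily simplified sketch of the sync step just described, with the S3 download and the core reload stubbed out:

```rust
// Heavily simplified sketch of Argo's sync step (not the real code): build a new
// index directory from the latest S3 version, hard-linking segments we already
// have and downloading the rest, then repoint index.properties and reload.
use std::collections::HashSet;
use std::path::Path;

fn sync_core(remote_segments: &HashSet<String>, current_index: &Path) -> std::io::Result<()> {
    // Segment files already on disk from the index we're currently serving.
    let local_segments: HashSet<String> = std::fs::read_dir(current_index)?
        .filter_map(|e| Some(e.ok()?.file_name().to_string_lossy().into_owned()))
        .collect();

    // Build the new index next to the old one; never touch the live directory.
    let new_index = current_index.with_file_name("argo.20220609120000"); // illustrative name
    std::fs::create_dir_all(&new_index)?;

    for segment in remote_segments {
        if local_segments.contains(segment) {
            // Segment files are immutable, so unchanged segments can be hard-linked.
            std::fs::hard_link(current_index.join(segment), new_index.join(segment))?;
        } else {
            // New segment: fetch it from S3 (byte-ranged download, sketched below).
            download_from_s3(segment, &new_index)?;
        }
    }

    // Point Solr's index.properties at the new directory and reload the core.
    let data_dir = current_index.parent().expect("index dir lives under the data dir");
    std::fs::write(
        data_dir.join("index.properties"),
        format!("index={}\n", new_index.file_name().unwrap().to_string_lossy()),
    )?;
    reload_core()?; // e.g. a call to the core admin RELOAD endpoint

    // If anything above fails we bail out and keep serving the old index; only
    // once the new one is live and in use do we delete the old directory.
    Ok(())
}

// Stubs standing in for the real S3 client and Solr core admin call.
fn download_from_s3(_segment: &str, _dest: &Path) -> std::io::Result<()> { Ok(()) }
fn reload_core() -> std::io::Result<()> { Ok(()) }
```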
13:02
Originally, we were just downloading files as-is, and it wasn't that great. So we changed this to use byte ranges, and we had kind of forgotten about the immutability of the segments, so we then used that to our advantage as well. It's been really efficient and fast for us. I don't know if you can see it on the slides, but it barely scrapes any CPU
13:21
and only just peaks over 50% of the memory it's been allocated, which isn't even that much. And it downloads quite quickly and keeps our indexes up to date nicely. We also discovered recently that we should stop Argo prefixing the indexes the same way Solr prefixes them, so we now give them a different name.
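For the ranged downloads, a minimal sketch using the AWS Rust SDK's get_object with a Range header might look like the following; the chunk size is made up, it assumes the aws-sdk-s3, tokio and anyhow crates, and the real downloader does more, such as fetching ranges in parallel:

```rust
// Illustrative sketch of a byte-ranged S3 download, as Argo does for segment
// files; bucket, key and chunk size are made up.
use aws_sdk_s3::Client;
use tokio::io::AsyncWriteExt;

async fn ranged_download(
    s3: &Client,
    bucket: &str,
    key: &str,
    size: i64,
    dest: &mut tokio::fs::File,
) -> anyhow::Result<()> {
    const CHUNK: i64 = 8 * 1024 * 1024; // 8 MiB per range request
    let mut start = 0;
    while start < size {
        let end = (start + CHUNK - 1).min(size - 1);
        let part = s3
            .get_object()
            .bucket(bucket)
            .key(key)
            .range(format!("bytes={start}-{end}")) // standard HTTP Range syntax
            .send()
            .await?;
        let bytes = part.body.collect().await?.into_bytes();
        dest.write_all(&bytes).await?;
        start = end + 1;
    }
    Ok(())
}
```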
13:42
And so this is Argo. And with that, we now have... oh, we also introduced the cleanup app, which just keeps the cleaning up of S3 separate from Argo. So at this point, we had a fully working read cluster. We had it one-way replicated (a single replica): one, to save costs whilst we were exploring and developing,
14:01
but also because we kind of thought, well, S3 behind the scenes is replicated, so we're not too scared about data going missing. And whilst this served our purpose, we knew that there was a lot more to do. And so this is where I introduce another new component, which we call the Solr Health Daemon.
14:22
I don't know about those of you who do run Solr clusters, but for us, we had things that would collect data from the core admin API and the cluster state and then stitch it together to give us a detailed view. But if a Solr node goes down, you can get core timeouts, and then things become slower
14:41
and it just becomes a pain. Whilst the Prometheus dashboards are really great, for the more complicated problems the metrics might not always give you the right answer. And so this is where the Solr Health Daemon comes in. It's also in Rust, because we have some Rust fanatics in our team. But it can give you a detailed breakdown of each node,
15:04
and also straight away tell you what the status of the cluster is. The way this works is that there are essentially two layers: you've got the judgement layer, which is essentially the cluster state, and then you have your collection layer, which is broken up into multiple monitors.
15:20
And the key efficiency here is that each monitor is isolated to one particular area that it needs to monitor. We start by having watches applied to the most essential nodes in ZooKeeper, which are live_nodes and collections. These monitors take this, send it to the cluster state,
15:40
but they also create, with a map, many, many core admin monitors in the context of the live nodes. Each one of these monitors is pointed at a specific Solr pod and collects from the core admin API; the state.json monitors are very similar, but state.json sits under a collection. And then all of these monitors collect this data
16:03
and constantly send it to the cluster state, which caches it. But in the event of something happening, say a Solr pod dropping offline, the watcher will notify the monitor because something has changed, and at that point the monitor sends an event to the cluster state,
16:20
which does a recalculation of the cluster and its health, emits this on its API, and we get a ping on our phones saying that the cluster's unhealthy.
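Again, this is not the real daemon, just a rough sketch of its shape: isolated monitors push events into the shared cluster state over a channel, and the cluster state recalculates health from its cached view. The event variants and types here are illustrative.

```rust
// Rough sketch of the daemon's shape (not the real code): isolated monitors push
// events into the shared cluster state, which recalculates health from its cache.
use tokio::sync::mpsc;

enum Event {
    LiveNodesChanged(Vec<String>),                   // ZooKeeper watch on live_nodes fired
    CollectionStateChanged(String),                  // a collection's state.json changed
    CoreAdminStatus { node: String, healthy: bool }, // polled from one node's core admin API
}

struct ClusterHealth {
    healthy: bool,
}

impl ClusterHealth {
    fn apply(&mut self, event: &Event) {
        // Recalculate overall health from the cached view; hugely simplified here.
        // The real daemon emits the result on its API (and pages us) when it changes.
        if let Event::CoreAdminStatus { healthy: false, .. } = event {
            self.healthy = false;
        }
    }
}

async fn judgement_layer(mut events: mpsc::Receiver<Event>) {
    // Each monitor task owns one watched znode or polled endpoint, holds a
    // sending half of this channel, and fires an event whenever its source changes.
    let mut state = ClusterHealth { healthy: true };
    while let Some(event) = events.recv().await {
        state.apply(&event);
    }
}
```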
16:42
But we've also gone one step further and introduced what we call a circuit breaker. We put a znode into ZooKeeper, which our API services constantly check for. If the circuit breaker is present, it blocks any traffic coming into the cluster, so we can just focus on fixing the cluster if it doesn't fix itself, and we know that we're not going to be swamped with unwanted requests coming through. At a high level it's pretty simple: you have your client that runs queries,
17:01
and that goes to our gateway, which would typically be round-robining. But this API instance has detected that there's a circuit breaker, so it sets its ready status to "can't be used". The API gateway is aware of this and won't forward on any traffic.
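The API side of that is simple. Here is a sketch of the readiness check, assuming a Kubernetes-style readiness endpoint, with an illustrative znode path and a stand-in trait instead of a concrete ZooKeeper client:

```rust
// Sketch of the readiness check described above (paths and helper are illustrative).
// If the circuit-breaker znode exists, the API instance reports itself not ready,
// so the gateway stops routing client queries to the cluster.

const CIRCUIT_BREAKER_PATH: &str = "/brandwatch/circuit-breaker"; // illustrative znode path

/// Minimal trait standing in for whichever ZooKeeper client the API service uses.
trait ZkLike {
    fn exists(&self, path: &str) -> Result<bool, std::io::Error>;
}

/// Returns the HTTP status the readiness probe should see.
fn readiness_status(zk: &impl ZkLike) -> u16 {
    match zk.exists(CIRCUIT_BREAKER_PATH) {
        Ok(true) => 503,  // breaker present: shed traffic, let the cluster recover
        Ok(false) => 200, // healthy: the gateway keeps round-robining requests to us
        Err(_) => 503,    // can't reach ZooKeeper: fail safe and stop taking traffic
    }
}
```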
17:25
And so this has made it really easy for us to see the state of the cluster, because we found that, especially when working with Kubernetes, it's not always easy to debug things. Multiple times we've had to spin up a BusyBox just to check things, so this has been really nice for us. You can see there that straight away it's just telling you
17:41
what is wrong with certain nodes, et cetera. And it's just been a really nice feature for us. If a cluster is sick and becomes healthy again, the daemon removes the circuit breaker so the cluster can serve requests again. And so that's our Solr Health Daemon. And then finally we have what we call the Hera manager.
18:02
For context, this project was called Hera, which is why that's called that. This is just a lightweight application to allow people to do schema and config deployments, because it's not just our team that does them. And in the event that the cluster is unbalanced, we can run balance operations, which will sort that out.
18:22
And this is our end result. This is what the read cluster looks like to us, and it serves us pretty well. And you might be ready to heckle me and have some questions. A key one would be: why aren't you using Java that much, or the Solr library?
18:44
One of the reasons is that when we were originally sitting down, planning things and doing our research, we couldn't really see much material out there of people doing similar work and writing about it, so that kind of limited what we could learn from. On the flip side, we had the freedom to be creative.
19:04
The plugin that we originally had to upload data just caused too many issues, and we were kind of on a deadline, so we decided to just scrap it and stick with what we know. Our team aren't actually Java developers either; we look at other fun languages such as Rust. And we have also found that it does prevent many race conditions
19:23
when dealing with asynchronous and threaded code, which we were doing a lot of in this case. But we have our cluster, and now we also need to test things. So we obviously didn't have anything in AWS before in our team. And so we had to see what would work.
19:42
Originally, there wasn't any like-for-like machine, so we compiled a spreadsheet, got all the instance types, and tried many different configurations. Then we did different types of tests: single user, incremental, and purely throttling the cluster. And we used Locust.
20:01
And then we started collecting results, and they were really crap, and so we thought this would be a dead end. But we applied some of the improvements that I've already mentioned along the way, which gave us better results. And we found the Graviton chips, which made a significant difference as well. This allowed us to filter out which instances would work and which wouldn't.
20:21
But we also knew that this couldn't be a definitive answer, because effectively we were doing a simulation, so we needed to do something else. So we applied what we call shadow loading: when we found an instance type that we thought would work, we'd spin up a cluster at scale, and then we would shadow-load any traffic going to a real cluster
20:43
and also point it at this one, so we could see what it looked like in a real-life scenario. And this, for us, was kind of like the final test.
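Conceptually, shadow loading just means duplicating live queries to the candidate cluster and ignoring its responses. A minimal sketch (not our gateway code; endpoints are made up, assuming the reqwest, tokio and anyhow crates):

```rust
// Minimal sketch of shadow loading (not the actual gateway code): every query
// answered by the production cluster is also fired at the candidate AWS cluster,
// whose response is ignored, so it sees genuine traffic without serving clients.
async fn handle_query(client: &reqwest::Client, params: &str) -> anyhow::Result<String> {
    // Both endpoints are made up for the sketch.
    let prod_url = format!("http://prod-solr:8983/solr/quicksearch/select?{params}");
    let shadow_url = format!("http://shadow-solr:8983/solr/quicksearch/select?{params}");

    // Fire-and-forget copy to the shadow cluster; failures there must never
    // affect the client, so we only log them.
    let shadow = client.get(shadow_url).send();
    tokio::spawn(async move {
        if let Err(e) = shadow.await {
            eprintln!("shadow query failed: {e}");
        }
    });

    // The real response still comes from the production cluster.
    Ok(client.get(prod_url).send().await?.text().await?)
}
```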
21:03
And with that, we then managed to find something that would actually work for us. On the right there is an example of one of the products we sell, and it runs on just 64 AWS instances instead of the 192 that we typically use, and we're quite close on performance. For some reason this looks way slower than it actually is in real life, I do promise you that; I think it's due to whatever website I used last night. But what did we learn along the way?
21:21
The Solr Operator does make running a cluster on Kubernetes easy, and we can't recommend it enough. The opportunity to shadow-load real-life traffic is really what differentiates between something that's going to be feasible and something that's not. We made Argo work with Solr replication, so we've now scaled up.
21:42
And this gives us much better performance, which we kind of expected anyway, but at the time we didn't really have the budget. The scaling from AWS does allow us to be more flexible, so we don't have to wait several months to get more boxes. We can just do a one line PR. And downloading in byte ranges is also a good improvement.
22:02
But also, don't be afraid to think outside the box. I think it's fair to say we did that in this case, and it's paid off for us. And finally, what's planned next for us? We want to upgrade Solr, obviously, hopefully to version 9, because there's the incremental backup and restore
22:21
that could probably help us a lot, but also the S3 support that came in 8.10. And then just some other features that we're quite excited about: OpenTracing makes diagnostics a bit easier, and we might investigate TopoLVM, which allows you to run multiple Solr instances on one given AWS instance.
22:41
And we really need to move the write cluster into AWS as well. And we're looking at open-sourcing some of our applications, like the Solr Health Daemon, if there is interest. And that's kind of it, really. We're hiring, and thank you for coming. And if you've got time, there will be some questions. Thank you.
23:06
Yes. Thank you, Richard. So, once again, we have the chance, or you have the chance, to ask some questions. Just lift something up, like a hand.
23:22
Yes, I see. And then I come by so that the people online can hear everything. Yeah? Okay. Thank you very much for your awesome talk. On one slide, you presented the kind of architecture
23:40
that you put up on AWS on the read side. You kind of rushed over it, saying, okay, and then we put our read cluster on Kubernetes, and then the more in-depth slide. But the point is, here's Solr and there's ZooKeeper,
24:01
and it just works, and you rush to the next slide. Could you elaborate a bit more on how you put Solr and ZooKeeper in play on AWS? Because that is what we at our company found the most interesting problem, to say it in a nice way: to get Solr and ZooKeeper to play together in a meaningful and scalable way.
24:26
Yeah. Are you using the Solr Operator? No. I recommend using the Solr Operator. Okay. It just does it all for you. Yeah, that made our life a lot easier. Okay, thank you.
24:44
So I have two questions. One, so I also evangelize the Solr Operator, unsurprisingly. And I generally kind of talk about how it's easier to scale to more nodes and have smaller nodes that aren't handling a ton of cores,
25:03
but it seems like you've actually scaled from more Solr processes on bare metal to fewer Solr processes in Kubernetes. That was very interesting. I just want to know how you found that, and if you're actually storing more data on each node now that you're on Kubernetes.
25:25
I believe we are storing more data on each node. And we do still have the data center on bare metal outperforming it; we think that's because the more nodes there are, the more shared pooling of resources there is.
25:41
But we've managed to scale down enough that we can get almost similar results. Okay, that's very interesting. Thank you for that. I have one other question. So since you're kind of pulling S3 data all the time, are you using persistent data on Kubernetes, or are you just letting the ephemeral data go away and sync it back whenever it comes back up?
26:04
I think, are we? We are, aren't we? Yeah, yeah, we are. Yeah, I thought we were. Okay. Yeah, we're using... I forgot what it's called now. But yes, we are. Okay, cool. Thank you so much.
26:23
Yeah, so congratulations on the talk and the massive scale. I'm interested in knowing how S3 comes into play when it comes to replication and how that compares with the basic replication capabilities that Solr has. Or whether you guys were able to consider other possibilities apart from Solr's replication,
26:46
and what the roadmap was that led you to S3 to replicate. Yeah, so there was the possibility of cross data center replication, I think that's what it's called. But then seeing that that was going to get deprecated,
27:00
we thought there was no point trying to use it. But also, the setup we've used makes things more scalable. Say, in the context of wanting to upgrade Solr, we can just spin up another cluster with the data from S3 and re-index that whilst we're still serving, and then we can do the switch.
27:20
So it just becomes like decoupling for us. Okay, thanks for the talk. One question that I have is what sort of replica types you were using, and whether you evaluated Solr's PULL replicas instead of this way of moving data across.
27:43
Yeah, so this was over a year ago, but I originally tried spinning up a collection with just PULL replicas, and it didn't work, because obviously it needs to have a leader. So we just use TLOG replicas. In our mentions platform we use NRT because it needs to be quite real-time.
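For reference, the replica mix is just a set of parameters at collection-creation time; an illustrative example of creating a TLOG-replica collection via the Collections API (the collection name and counts are made up):

```rust
// Illustrative only: creating a TLOG-replica collection through the Collections
// API; the collection name and counts are made up, error handling is trimmed.
async fn create_tlog_collection(client: &reqwest::Client) -> anyhow::Result<()> {
    client
        .get("http://localhost:8983/solr/admin/collections")
        .query(&[
            ("action", "CREATE"),
            ("name", "quicksearch"),
            ("numShards", "96"),
            ("tlogReplicas", "2"),  // TLOG copies can still be elected leader
            ("pullReplicas", "0"),  // a PULL-only collection has no leader candidate
        ])
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```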
28:01
But yeah, TLOG has helped us be quite a bit more efficient as well. Anyone else? Okay. Hello, hi. I'm just curious to know how frequently the data is updated,
28:21
so like queries per minute or something like that. And the second question is: so you separated out reads and writes; how frequently is the S3 data synced with the read instances of Solr? The sync one I think I can tell you... no, I can't. So, great.
28:43
We commit every couple of minutes, but we upload each index maybe every 10 to 15 minutes. And then the download only takes up to a minute, but that would be whenever the uploads finish,
29:02
so around every 10, 15 minutes as well. So thank you, Richard. It was great. So this is for him. Thank you.