Cross Data Center Replication in Solr - A new approach
Formal Metadata
Title: Cross Data Center Replication in Solr - A new approach
Series: Berlin Buzzwords 2023
Number of parts: 60
License: CC Attribution 3.0 Unported: You may use the work or its content for any legal purpose, modify it, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Identifier: 10.5446/66662 (DOI)
Transcript: English (automatically generated)
00:08
Well, thank you, everyone. It's so good to be back in Berlin after four years. And I'd like to start with a short story. I arrived in Berlin on Friday afternoon. And right when I arrived, the plan
00:21
was to head to Hamburg for a short weekend trip. So I boarded a train at Berlin Central Station. And 10 minutes into the journey, my backpack got stolen. My backpack with my MacBook, my iPad, everything with my presentation on it. So while that got stolen, the good part is everything was backed up. Everything was copied, replicated onto another machine
00:42
on my phone. So I could recover that and use my colleague's laptop, a MacBook, and continue with my presentation. I certainly did lose a bunch of my notes, but we'll see how it goes from now. That ties in, because that was just my machine that got stolen. But what about if you were running a Solr cluster
01:02
and hosting a bunch of data for a lot of people, and you lose all of that data for whatever reason? For times like those, it's important to be ready. And cross data center replication allows you to really be ready for times like just what I encountered a few days ago.
01:21
And a shout out to, well, Mark Miller, who is not here. He helped me co-author this talk. And so, yes, it is a co-authored talk; I'm the only one presenting. My name is Anshum Gupta. I lead the engineering team for Solr at Apple. And yeah, let's take it from here.
01:41
So the agenda for this talk: I'm going to be talking about why cross-DC. I kind of highlighted it a little bit as I started my talk, but I'm going to talk a little bit more about it. I'm going to talk about the history of this, not the module but the functionality itself. If you've been someone who's been associated with Solr
02:02
or have tracked the project for a few years, you know that this is not something new functionality-wise. So I'm going to talk a little bit about that, follow it up with the architecture, the setup, what it requires, and how you can get started with it. And then I'm going to deep dive into a few of the features and components of the solution, the new cross-DC
02:24
module in Solr. And I'll wrap it up with future opportunities for anyone who might want to participate in the development of this feature. And we'll wrap it up with questions and answers. So, Solr is a critical infrastructure component
02:41
for most people who are doing anything with data in today's day and age. It used to be something that was used purely for full-text search; that's not the case anymore. We've seen Solr used for things like spatial search, and we've seen Solr used for things like vector space search now. And with that kind of a footprint, it only makes it more important for you
03:03
to be ready, for you to be in a position of being more resilient. So to have a system that's more resilient is ever more important. When anyone mentions a cross data center replication
03:21
kind of thing, the one thing that comes to mind is disaster recovery. One thing I'd like to point out is that it's not just about disaster recovery when it comes to cross data center replication. What you also get with a module like this is the power to scale and the power to reduce latency. To highlight that a little bit more, let's assume that you have a data center somewhere in the US
03:44
and you want to expand your offering of your service, whatever you have, to another part of the world. If that's the situation that you're in, what you really want to do is you want to replicate all of your data so that your queries could now come to another data center sitting somewhere else in the world, maybe Germany,
04:02
maybe Singapore, Japan. And what this module will allow you to do is not just be prepared and be resilient, but also allow you to scale and reduce query latencies for the end users.
04:22
So moving on to a little bit of history of where we're coming from and where we're going now. The first version of cross data center replication, which was called CDCR, and for the sake of separating the new version from the old one,
04:40
I've been calling the new one cross-DC, and so you'll see cross-DC written in more places than CDCR. CDCR was introduced as a solution to solve this problem back in Solr 6. And the idea there was to go ahead and build something that would make Solr more resilient, allow you to copy data over onto another data center,
05:02
generally approaching a solution for two data centers. Again, the reason for that was that the solution tried to be self-sustaining. Solr is built for certain use cases, right? It's certainly not a messaging system. It's not a queuing system. But the solution there tried to not use any third-party system,
05:23
not rely on anything else other than Solr, and to treat Solr as the only solution that would accomplish everything that you needed. So once that solution was built, a few people tried to use it and ended up in a position where it was really hard to use. As soon as your cases got complicated, your data grew, your throughput requirements changed,
05:43
that system was almost unusable. With that, a lot of thought was put into: is there a way to fix it? Can we change it to make it work? After a lot of thought, there was a realization that there was a fundamental problem in that approach. And that problem was that the community was trying to solve all problems.
06:02
It was like we were roaming around with a hammer, trying to find a nail, and Solr was our hammer, right? Trying to solve every problem that there was using Solr. So because it became unmaintainable at that point, the community decided to get rid of this module.
06:20
So Solr 8 is when this module was deprecated. A lot of people wanted something like this, specifically because Solr 9, the release of Solr 9, meant it was not just deprecated, it was completely removed from Solr. And as I've been talking about the growing footprint of Solr,
06:42
it only makes it more obvious and more necessary for you to have some kind of supporting mechanism. And so the need for cross data center replication was even greater with Solr 9. So we worked on a cross-DC module, something that wraps in all of the knowledge that we've built over the years,
07:01
the community has built over the years, and what we ended up with is almost two years of effort and a module that we use in production right now. It's not released yet, so yeah, it's something that we just use internally.
07:20
So moving on to the architecture. If you look at the architecture, it has a few block diagrams. And if you've attended one of my previous talks about the idea of a new cross-DC solution, it's very similar to what we started off with. What these boxes highlight is that there's an isolation, an isolation between Solr itself in the yellow boxes
07:43
and all of the white box in the middle. The thing in the middle is what Solr was trying to accomplish as part of its previous release. With this new version, what we're trying to do is to isolate that, allow Solr to be completely isolated,
08:01
and abstract out the queuing part from this entire solution. So when a request comes in now, the client sends in a request, it goes into Solr, and Solr just has the responsibility of sending this request somewhere. That somewhere is basically a messaging queue. And while we started this entire process of designing this and building this
08:21
with the idea of this being a very extensible system that would fit into every queuing system that there is, one thing that we realized very soon is that most people end up using Kafka. And we were spending a lot of time trying to make it extensible, trying to solve a problem that probably was not required to be solved back then.
08:41
So we ended up pivoting and really relying on Kafka as our queuing system of choice. The current implementation that we have, that we've built, is basically something that relies on a lot of the guarantees that Kafka offers as a system. So when a request comes in, it goes into Kafka
09:01
and then it's up to you to set up a mirroring system that copies this data over onto another queue somewhere else in another data center. And again, those are the details I wouldn't get into right now, but you could potentially use the same queue. You don't really need to use a mirror as well, but you have a cross-DC consumer on the other side.
09:24
That's a module that we also built that is offered by the Solr community. And the responsibility of this module is to just read data off of the queue and write it into Solr. And I'm going to talk more about what this really means.
09:41
The good part about isolating the queuing system from Solr here is that you're not restricted by how many data centers you want to replicate to. It's the mirroring logic in the middle that decides how many places this data goes to. And it's actually the cross-DC consumers that you run that decide who receives these packets.
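To make the shape of the consumer side concrete, here is a minimal sketch of the consume-and-index loop it boils down to. The topic name, payload format, and collection are placeholders, and the shipped consumer is a configuration-driven app rather than hand-written code like this.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CrossDcConsumerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-dc2:9092");   // Kafka in the receiving data center
        props.put("group.id", "crossdc-consumer");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             CloudSolrClient solr = new CloudSolrClient.Builder(
                     List.of("zk-dc2:2181"), Optional.empty()).build()) {

            consumer.subscribe(List.of("solr-mirror-updates"));   // hypothetical topic name
            while (true) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofSeconds(1))) {
                    // The real module ships serialized update requests; here we pretend
                    // each record value is "id,body" just to show the write path.
                    String[] parts = record.value().split(",", 2);
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", parts[0]);
                    doc.addField("body_t", parts[1]);
                    solr.add("my_collection", doc);               // hypothetical collection name
                }
                solr.commit("my_collection");                      // real setups rely on autoCommit
            }
        }
    }
}
```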
10:03
So this is super powerful, because it doesn't really restrict you: while that diagram up there highlights only two data centers, this system works for an n-way replication. Let's look at the requirements now.
10:22
So you need a producer plugin. I don't know how many of you here really know too much about Solr, but Solr relies on something like the update request processor for processing all of the incoming updates. This entire system that we've built currently only forwards and replicates update requests.
10:41
So the update request processor that was actually built specifically for the mirroring logic takes in and forwards the request. So you need the producer plugin, which is provided by the community. You need a messaging queue, something that you need to set up yourself. You need a cross-DC consumer app, something again provided by the community,
11:02
and you don't need to really code anything for it. It's a configurable piece that you can just drop, run, and get going. And you need external versioning for multi-way replication. The need for that is basically because when you're trying to write to Solr from multiple places into multiple different data centers,
11:21
one thing that Solr would not be able to accomplish is to guarantee that your versions across your data centers are consistent. And for that specific reason, it's kind of required that you maintain your own external versions, because you have the best idea about which document should win the race of making it into Solr.
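As an illustration of what "maintain your own external versions" can look like, the sketch below stamps each document with an application-owned version number; the field name and values are made up. One way to make the newest write win on each cluster is Solr's document-based version constraints (DocBasedVersionConstraintsProcessorFactory with versionField pointing at such a field), though the exact mechanism used here may differ.

```java
import org.apache.solr.common.SolrInputDocument;

public class ExternalVersionSketch {
    // Every data center stamps the same application-owned version (for example, a
    // change number from the system of record), so no matter which path a copy takes,
    // the newest write can be identified deterministically on each cluster.
    static SolrInputDocument versionedDoc(String id, String name, long appVersion) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("name_s", name);
        doc.addField("app_version_l", appVersion);   // hypothetical field name, owned by the application
        return doc;
    }
}
```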
11:46
So setting up your cross-DC module. Well, for the indexing part, we have a producer jar that we're going to be shipping. None of this is released, as I said, but we built it internally and we've tested it out reasonably.
12:01
The producer jar is supposed to be copied into the shared library. If you use anything in Solr that is not shipped by default, you probably install it into your shared library. There are multiple ways to do this. One way to do this is to install things into a lib directory at a per-core level. I would not recommend that, and the reason for that is purely because
12:21
if you put it at a per-core level, you might end up in a situation where you're copying the same jar over and over for every core that you have, making the maintenance of all of these jars almost impossible. When you want to upgrade your jar for cross-DC, you're in a position where you're responsible for copying these jars over to every core that you have.
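A minimal sketch of the node-level alternative, assuming an illustrative jar name: point sharedLib in solr.xml at one directory per node and drop the producer jar there once, instead of adding a per-core lib directive to every solrconfig.xml.

```xml
<!-- solr.xml (sketch): one node-level library directory instead of per-core copies.
     The jar name below is illustrative, not the released artifact name. -->
<solr>
  <str name="sharedLib">${solr.solr.home}/lib</str>
  <!-- Copy crossdc-producer-<version>.jar into ${solr.solr.home}/lib on every node;
       all cores on that node can then load the mirroring update request processor. -->
</solr>
```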
12:50
The second thing that you really need to set up is the consumer. The consumer is a JVM-based app. You can just set it up, run it. It requires information about the Solr cluster
13:02
that you're trying to write to, and the Kafka details that you want to read off. I'll get to that in a bit because this configuration can be provided as a config file that you can provide to the consumer itself, and you run a bunch of consumers when you run your cross-DC solution.
13:21
The problem here is most people don't want to maintain different configuration files sitting on disk for each of the consumers that they're running, because these are going to be running on different nodes. So there are two ways to set your configuration for this module. The first one being,
13:40
you can either update your Solr config and copy it over to your Solr instances, or, the other one being, you can store all of the information for both the producer and consumer in the crossdc.properties file. Now the crossdc.properties file allows you to change a lot of stuff while your Solr instances watch for these changes,
14:01
allowing you to make a lot of these changes without the need to restart a bunch of stuff. You can certainly specify the topic names. You can enable and disable your cross-DC replication for a specific collection through this properties file. We support that because we saw a bunch of use cases
14:22
where people were trying to use a shared config file across a bunch of collections. Now these collections, even though they stay together and they're part of the same Solr cluster, don't necessarily belong together in my opinion. And my recommendation here would be for you to maintain separate config sets.
14:41
Do not try to share your config set across different collections and toggle the cross-DC properties to enable cross-DC for one collection and disable the same thing for another. I know people would be very tempted to do so, because then it leaves you with only one config set to take care of,
15:00
but I highly recommend not doing something like this. Solr has a lot of such interesting toggle points that allow you to do stuff like this, but it's not the ideal case. Anyone who's maintaining Solr would understand this.
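To make the second option concrete, a rough sketch of what such a properties file might contain follows. The key names here are illustrative stand-ins rather than the module's exact ones, so check the sandbox documentation for the real keys.

```properties
# Illustrative crossdc.properties sketch; key names are examples, not the module's exact ones.
# Producer side: where the mirroring update request processor sends mirrored updates.
bootstrapServers=kafka-dc1:9092
topicName=solr-mirror-updates

# Consumer side: which queue to read from and which Solr cluster to write into.
consumerBootstrapServers=kafka-dc2:9092
zkConnectString=zk-dc2:2181

# Per-collection toggling is possible, but the talk recommends separate config sets
# over flipping one shared file back and forth.
enabled=true
```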
15:21
how are things required to be set up, the idea behind the ability to set up an update request processor that you have a jar, you have to install that in the shared library, you have a consumer, let's dive into some specifics of this module. So as I said, the current solution only supports
15:41
mirroring of update requests. The primary data center that receives these requests is responsible for taking these requests and wrapping them into mirrored Solr requests. A mirrored Solr request is basically a Solr update request
16:02
with a bunch of metadata added to it, information like a timestamp for when this request was first received and put on the queue. Having this information allows you to look at your latency. A lot of people who run this system would want something that tells them how long it has been or what the lag is like right now.
16:23
While you can get some of that information from Kafka directly, the kind of information you get from Kafka is just how many packets you have not read yet. You will not really be able to see how much latency you actually have. You only have visibility into the volume of data that's still pending to be processed.
16:42
So the mirrored update request adds that metadata that allows you to see that, get visibility into latency based on time. When we were building this, we tried a bunch of approaches, we thought about it a lot. We literally discussed this and spent months just figuring out what was the right approach to this.
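The exact mirrored-request format lives in the sandbox module; purely to illustrate the idea of time-based lag, the sketch below carries an enqueue timestamp with each Kafka record and turns it into milliseconds of lag on the consumer side. The header name and encoding are made up.

```java
import java.nio.ByteBuffer;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class MirrorLagSketch {

    // Producer side: stamp the moment the update was accepted and enqueued.
    static ProducerRecord<String, String> stamp(String topic, String key, String payload) {
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, key, payload);
        record.headers().add("enqueue-ts",
                ByteBuffer.allocate(Long.BYTES).putLong(System.currentTimeMillis()).array());
        return record;
    }

    // Consumer side: lag in milliseconds is "now" minus when the update was first enqueued,
    // which is more useful than just counting how many records are still unread.
    static long lagMillis(ConsumerRecord<String, String> record) {
        Header header = record.headers().lastHeader("enqueue-ts");
        long enqueuedAt = ByteBuffer.wrap(header.value()).getLong();
        return System.currentTimeMillis() - enqueuedAt;
    }
}
```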
17:03
When an update request processor receives an update request, there are multiple ways to process that. So one way to handle that is for you to index that document locally and then send it to the queue. That's a great idea, but then there are cases that require you
17:21
to kind of send this document onto your messaging queue first, only so that you could persist an incoming update that made it into your system. Both of these approaches have their pros and cons. If you were to write these documents locally, you'll have a successful local index, but what if your queue is suffering from an outage?
17:42
In that situation, your update fails, but your update has kind of sort of made it into your primary data center, leaving you with an inconsistent state. On the other hand, if your queue is down and you're trying to write to the queue first, you might be in a position where you have a complete write outage
18:02
because every incoming request is trying to be written to the queue first, and the queue is unable to accept anything, so that approach will not even attempt to write this data locally.
18:20
So having seen the benefits of both of these approaches and having tried both of these, we realized the best way to do this was to actually index this data locally, have it persist in the Solr index, and return an appropriate response accordingly. If for some reason the update fails, we let the user know that the update failed at Kafka.
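A sketch of that ordering, not the module's actual implementation: index locally first, then enqueue the mirror copy, and report a queue failure without pretending the local write failed. The MirrorSink interface and the ignore flag (which anticipates the bypass flag discussed next) are hypothetical.

```java
import java.io.IOException;

import org.apache.solr.common.SolrException;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class MirroringProcessorSketch extends UpdateRequestProcessor {

    private final MirrorSink sink;           // hypothetical wrapper around a Kafka producer
    private final boolean ignoreQueueErrors; // "keep going even if the queue is down" flag

    public MirroringProcessorSketch(UpdateRequestProcessor next,
                                    MirrorSink sink, boolean ignoreQueueErrors) {
        super(next);
        this.sink = sink;
        this.ignoreQueueErrors = ignoreQueueErrors;
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        super.processAdd(cmd);               // 1. index locally first, so the primary always has the doc
        try {
            sink.enqueue(cmd.solrDoc);       // 2. then hand the document to the queue
        } catch (Exception e) {
            if (!ignoreQueueErrors) {        // 3. surface that the mirror copy did not make it
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                        "Indexed locally but failed to enqueue for cross-DC mirroring", e);
            }
        }
    }

    /** Hypothetical interface standing in for the real Kafka-backed producer. */
    public interface MirrorSink {
        void enqueue(org.apache.solr.common.SolrInputDocument doc) throws Exception;
    }
}
```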
18:41
The user still knows that the document made it into the local index. Now, a lot of users might be okay with their replicated cross-DC cluster being out of sync. In such a situation, we spoke with a bunch of users,
19:01
we spoke with a bunch of people in the community, and what we heard back was, we want a flag that allows us to bypass this. We are okay to skip these updates. As much as at least I tried to convince them to not move in this direction, because once you are not consistent between your data centers, the entire idea of being resilient in case of a downtime
19:22
goes out of the window, right? But there were enough people and use cases that wanted an ability to continue to move forward while ignoring the failures that were happening on the DR cluster. So there is a flag. If you were to try this out, you could use that flag where you're able to ignore
19:41
a failed update onto the queuing system. That's great for times when your queuing system goes down because you're not seeing a bunch of errors while you're fixing your queuing system, and you can later recover from a backed up collection snapshot that you can restore onto your DR cluster. But at that moment, you can continue to process your updates.
20:04
Also, there are times when you want to send in a test request, right? So when you send in a test request, you can also, as part of your mirrored update request, add a flag that says shouldMirror equals false. Adding such a flag to your update request
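With SolrJ, that might look like the sketch below; the parameter name is taken from the talk as spoken, so confirm the exact spelling against the module's documentation.

```java
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class SkipMirroringSketch {
    // A test-only update that should be indexed locally but never mirrored.
    static UpdateRequest testOnlyUpdate() {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "healthcheck-1");          // hypothetical test document
        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setParam("shouldMirror", "false");        // skip cross-DC replication for this request
        return req;
    }
}
```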
20:20
will allow you to completely skip replication for this. Now, while we were building all of this and we had moved on, we thought things were working, things were great, users, all the test cases passed. The next thing that came up was one of our users came back and said, you know, we're seeing some documents
20:43
that never make it into the other side. We don't see any errors. We see these documents come in and we don't know what's going on. We asked them for the logs; there was nothing wrong going on. What we realized was they were sending these documents in batches, which, I guess, is the recommended process
21:04
for, I mean, the entire community recommends sending Solr updates in batches. So they were sending these batches with medium-sized documents, but the size of the batch itself was big. When these documents made it into the white box that I showed in the architecture diagram, the mirroring logic did not support packets of this size.
21:23
Solr did support this, so Solr was happy accepting all of these big packets. And the queuing system was happy accepting these batches as well. It was the mirroring logic that was not happy mirroring these across data centers at that packet size. So we thought about it. One, it took us a lot of time to debug.
21:42
When we figured it out, we realized the problem was in our approach. We were, again, too tied into trying to use Solr for more things. A problem that we had started off to solve, we had kind of gotten back into; we had gone down the same rabbit hole of, Solr could do this, and there's no need for us
22:01
to look at any other solution. Now, we spent some time, tried to figure out what could be a possible approach that would solve this problem, and realized Kafka is actually meant to do this. Kafka is really good at accepting small packets, a ton of them, and processing them at a really good pace.
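That observation is what drives the change described next: send one document per Kafka record and let Kafka's own batching (batch.size, linger.ms) do the grouping. A rough sketch with a made-up topic name and JSON payloads, not the module's actual wire format:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PerDocumentSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-dc1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Kafka's own batching groups these small records efficiently, so there is no
        // need to mirror a whole Solr batch as one oversized packet.
        props.put("linger.ms", "20");

        List<String> serializedDocs = List.of("{\"id\":\"1\"}", "{\"id\":\"2\"}", "{\"id\":\"3\"}");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String doc : serializedDocs) {
                // One document per record: a single bad or oversized document fails alone,
                // instead of taking a whole batch down with it.
                producer.send(new ProducerRecord<>("solr-mirror-updates", doc));
            }
        }
    }
}
```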
22:23
Our entire approach of using Solr batches and just replicating those batches as-is was because we were expecting Kafka communication to behave exactly the same way Solr behaves, and that's not true. So what we ended up flipping to, or trying at that moment, was
22:41
splitting the Solr batches up and sending one document at a time to Kafka, and letting Kafka manage the batching for us. Yes, that means when the documents get to the other side there's a bunch of batching to be done again, but this allowed us to control two things. One, larger documents themselves,
23:02
because now you're sending one document at a time, and there's more that you can process. But also, we had more visibility into the errors that were happening. When a document failed, we were able to see what was going on, as opposed to a batch failing, and then trying to figure out which document failed
23:20
and what to do in such a case. Solr's not really great when a single document in a big batch of documents fails, and in this approach that part was critical, because there was more than one moving piece that was outside of Solr. Having tried this out and having tested this out, one thing that I can guarantee is that this,
23:42
I don't have any numbers for this, but this performs a lot better than the approach where we were trying to piggyback on Solr's batching and trying to reduce the number of packets that we were writing to Kafka. So once the packet goes to the other side and is received by the destination queue,
24:02
the cross-DC consumer is what I would like to call a very simple app that's meant to process these update requests. What I call simple, I also call complex, and the reason why I kind of use both of those words together for this is because as simple as it sounds and looks,
24:23
if I were to explain this to you, it kind of encapsulates years of experience of having worked with Solr. A lot of people from the community contributed towards what to do and how to process these requests. Now, when a request comes in,
24:40
the consumer tries to write these requests to Solr. When you try to write these to Solr, they could fail for various reasons. If you're not experienced enough with Solr, you don't really know what to do with this request. You could retry it, or, after enough retries, either drop this request or just resubmit it back to the queue
25:04
to process this again at a later time. Now, while that's interesting, the complicated thing here is most people don't know what to do in what situation. If you don't know what to do in such a situation, then you might end up in a spot where you're resubmitting your request to the queue. This request will never succeed,
25:21
because it is just that kind of a failure, and you'll be stuck. You'll be stuck trying to figure out what it is that's going on. Your consumer's up and running, it's not moving forward, and something's up. So all of that logic right now is encapsulated in the cross-DC consumer.
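Purely as an illustration of the kind of decision the consumer has to make (the shipped consumer encodes a more nuanced version of this, and the thresholds here are invented):

```java
public class RetryPolicySketch {

    enum Action { RETRY_NOW, RESUBMIT_TO_QUEUE, DROP }

    // Illustrative decision table only.
    static Action decide(int httpStatus, int attemptsSoFar, int maxAttempts) {
        if (attemptsSoFar >= maxAttempts) {
            // A request that keeps failing the same way will never succeed;
            // dropping (and logging) beats wedging the whole consumer.
            return Action.DROP;
        }
        if (httpStatus == 400) {
            // Bad request, for example a schema mismatch: retrying cannot help.
            return Action.DROP;
        }
        if (httpStatus == 500 || httpStatus == 503) {
            // Target cluster overloaded or temporarily broken: try again soon.
            return Action.RETRY_NOW;
        }
        // Anything ambiguous goes back on the queue to be attempted later,
        // so one stuck document does not block the documents behind it.
        return Action.RESUBMIT_TO_QUEUE;
    }
}
```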
25:41
So it handles resubmission really intelligently. What we also realized was there was a need for people to use more than one topic. Using a single topic was a great place to begin with, helped us with our test cases, and made us believe that it all works. Until we went to production and realized
26:01
there are use cases, and I got pinged by people in the community as well about this, there are use cases where they really want multiple topics to be handled by the system. And the reason for that is what if you're re-indexing data for a single collection? And let's say that collection has two billion documents,
26:20
and it's going to take a while for these documents to come in and for these documents to be re-indexed. If you share the same topic, you're sharing that topic with other collections that are serving live traffic and getting live updates that are now getting queued in the same messaging queue. So what was needed was the support
26:40
to kind of support multiple topics, maybe even a topic per collection. And yes, we thought about that as the default choice as well. We thought about what if we were to use a topic per collection. Some users have challenges in terms of being able to create topics easily. Some users have hundreds of collections
27:01
in their single Solr cluster. Some users have transient Solr collections that only come up, get used, and then they throw those collections away. In all of those use cases, it's not worth it to create a topic, use it, and then discard the Kafka topic. In a lot of these situations, they want to just reuse an existing topic.
27:21
So we provided the support for users to specify what topic you want to write something to, what topic a consumer should consume from, and what collections this data should go to. What we also added was the idea of prioritization.
27:41
So prioritization almost sounds like what I described in terms of being able to support multiple topics, but not really, right? I also said there are situations where people want a prioritization of some sort, but are unable to create these topics at will. In such situations, there needs to be an ability for people to go out and specify priority for certain collections,
28:01
and for the consumer to be smart enough to say, I'm going to give more processing capacity for requests that come in for this specific collection, and I'm going to spend less time processing requests for this other collection. Maybe that other collection is just getting less used. Maybe that collection is okay to be behind,
28:21
and one of the collections might be tiny but might need a higher priority. So this helps you with handling starvation in situations where your active re-indexing or your indexing throughput for different collections is off by a lot.
28:42
And all of this is configurable in real time, so you don't have to restart anything. You can make these changes in your crossdc.properties file, and once you do so, you'll be able to update the priority of processing for your cross-DC consumer.
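The module's actual configuration keys for this are not reproduced here; conceptually, priority-aware consumption can be as simple as weighted round-robin over per-collection backlogs, as in this sketch with made-up collection names and weights.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrioritySchedulerSketch {

    // Hypothetical per-collection weights: "orders" is small but latency-sensitive,
    // "logs" is huge and allowed to lag. Nothing starves; lower-priority collections
    // simply get fewer turns per round.
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    {
        weights.put("orders", 5);
        weights.put("products", 2);
        weights.put("logs", 1);
    }

    /** One scheduling round: each collection appears as many times as its weight. */
    List<String> oneRound() {
        List<String> order = new ArrayList<>();
        weights.forEach((collection, weight) -> {
            for (int i = 0; i < weight; i++) {
                order.add(collection);
            }
        });
        return order;
    }
}
```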
29:02
If you use anything like this, you certainly want a way to get more insight and be able to perform some administrative tasks that don't require you to talk to things like Kafka directly. Let's assume you had a bad document that came in, and a bug in cross-DC, for whatever reason right now,
29:21
is unable to process that document, and you know that you want to drop that document. Or you created a collection in your primary data center, you sent in a billion documents, these got queued, and once these got queued, the consumer tried to process these updates, but what ended up happening was
29:40
the schema was out of sync, or the collection was just bad, and you realized all of this data does not make sense anymore, so you want to just skip processing of all of this data in your cross-DC setup, in your DR cluster, right? Or in your secondary cluster. In such a situation, what you might want is the ability to say, skip my Kafka offset
30:03
and take it to point x, that would allow you to skip two billion updates. Without having to create a collection temporarily, get these documents to get processed, dump these documents and delete the collection because you never wanted them to begin with. So you might want to play with the offset,
30:22
you might want to just get visibility into the offset, you might want information about what's the latency like, or you might want to pause consumption of a specific collection at a specific time. The admin API, something that we're currently working on, will allow you to do all of this.
30:41
These are going to be just endpoints, not a fancy admin UI, but it's going to be endpoints that allow you to get insights as well as do operational stuff with your cross-DC consumer. Pausing consumption is basically a part of a bigger vision here. It's not just about replicating data through cross-DC.
31:04
Imagine a situation where your primary kind of went down or one of the clusters went down. You want to now back up from your primary and you want to bring this data into your secondary for whatever reasons, and you want to do that as seamlessly as possible
31:20
while you have your cross-DC turned on. What you can do in this situation is send in an admin request to say, pause consumption for this collection, get a backup snapshot, restore it into your other data center, and turn your consumption back on in your secondary cluster. What that'll enable you to do is to not worry about: did I miss anything?
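Sketched end to end, that workflow could look like the following. The BACKUP and RESTORE calls are Solr's standard Collections API via SolrJ; the pause and resume endpoints are placeholders, since the admin API described here is still being designed.

```java
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ResyncWorkflowSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient primary = new CloudSolrClient.Builder(
                     List.of("zk-dc1:2181"), Optional.empty()).build();
             CloudSolrClient secondary = new CloudSolrClient.Builder(
                     List.of("zk-dc2:2181"), Optional.empty()).build()) {

            // 1. Pause cross-DC consumption for the collection (hypothetical endpoint):
            //    POST http://crossdc-consumer-dc2:8090/pause?collection=my_collection
            //    While paused, new updates simply accumulate in Kafka instead of being lost.

            // 2. Snapshot the primary (location must be reachable from both clusters).
            CollectionAdminRequest.backupCollection("my_collection", "resync-backup")
                    .setLocation("/backups/shared")
                    .process(primary);

            // 3. Restore the snapshot into the secondary data center.
            CollectionAdminRequest.restoreCollection("my_collection", "resync-backup")
                    .setLocation("/backups/shared")
                    .process(secondary);

            // 4. Resume consumption (hypothetical endpoint); the consumer drains the
            //    queued updates and the DR cluster catches up without a gap:
            //    POST http://crossdc-consumer-dc2:8090/resume?collection=my_collection
        }
    }
}
```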
31:41
Everything is now in the queue, the consumption was paused, and at the end of it, you'll have a DR cluster that's in sync, up to date, and you don't have to worry about anything. That brings me to the last thing I want to deep dive into, and something that has always been
32:00
a complicated thing in Solr, ever since, you know, ever since Solr started: delete by query. It gets even more interesting if you use Solr in cloud mode; you know how complicated it is and how finicky it might be. It may or may not work, which is why a lot of features,
32:21
as soon as you mention delete by query, explicitly say it's not officially supported. So that's the last bullet point on this slide. It's not officially supported, but if you're in a situation where you're sure you're not updating the same document again, and you want to go ahead and delete documents,
32:41
you're not in the business of updating the same document with the same ID over and over again, this will work. In every other case, you have to watch out, and as I said, it's not officially supported, so there are no guarantees about what happens. And all of this is, of course,
33:01
because update ordering is the challenge. Until last week, there was one way to do this, until we realized the approach had an issue. The original approach was basically, when a request comes in for delete by query, map these requests into delete-by-IDs
33:20
by fetching all of the IDs, convert these into delete by IDs, you know, a thousand at a time, and paginate through this. Great, great idea, right? But, imagine someone who does not enable doc values, sends in a request, that matches a million deletes, right?
33:42
Delete a million IDs. Try that, not for the sake of delete-by-IDs, but try that just as a query. You'll realize the field cache is going to blow up if you're running, say, an older version of Solr, or if you don't have doc values. So you can't really paginate, irrespective of what you try to do. The query itself is going to fail,
34:01
and so there's no way to get to this stage of creating or converting this into delete by IDs. The client could still do that, but you as a user will have to restrict this to a thousand matches at the most. The tough problem here is, you can't even get back a response, you can't even tell them how many matches
34:21
does this query have, only because the query will never go through, right? So we flipped this around and converted this into an approach where all of this gets executed as a single query, with a restriction of matching only 10K docs at a time. There are still some conversations going on, and there's an approach that's been committed.
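For context, the original conversion can be sketched as a cursor walk over the unique key that turns matches into delete-by-ID batches. This is the approach that runs into the field-cache and docValues problems described above, not the newer committed one, and the names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class DbqToDbiSketch {
    // Turn a delete-by-query into mirror-friendly delete-by-id batches by paging
    // through the matching ids with a cursor. Inherits the caveats from the talk:
    // it needs docValues (or a tolerable field cache) and is not an officially
    // supported path.
    static void deleteByQueryAsIds(CloudSolrClient client, String collection, String q)
            throws Exception {
        String cursor = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            SolrQuery query = new SolrQuery(q);
            query.setFields("id");
            query.setRows(1000);
            query.setSort(SolrQuery.SortClause.asc("id"));   // cursors need a stable sort on the unique key
            query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);

            QueryResponse rsp = client.query(collection, query);
            List<String> ids = new ArrayList<>();
            for (SolrDocument doc : rsp.getResults()) {
                ids.add((String) doc.getFieldValue("id"));
            }
            if (!ids.isEmpty()) {
                client.deleteById(collection, ids);           // these deletes mirror like any other update
            }
            String next = rsp.getNextCursorMark();
            if (next.equals(cursor)) {                        // cursor stops advancing when we're done
                break;
            }
            cursor = next;
        }
    }
}
```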
34:41
We're evaluating if this is going to solve the problem that we've been looking at. And I've had some good conversations at this conference about being able to turn on doc values for existing collections that don't have doc values enabled. But again, as I said, it's a slippery slope. So if you want to use delete by query with this solution,
35:03
know your restrictions, know your limitations. That brings me to the last part of the talk, which is just the future opportunities. As I said, all of this is something that we built because we needed something like this for our use cases at Apple. And we've been using it for the last six months.
35:22
And it works. We've fixed a lot of things based on things that we've been seeing, issues that we see at scale. We've fixed a lot. But the release is yet to happen. So one, help us with the release. Come contribute. It's all open source. This was not a code dump. All of this was created, designed,
35:41
and implemented upstream, every bit of it. All conversations happened upstream. It has metrics, but it requires improved metrics reporting. The admin API is something that we've started to work on, and we would love to add more to it. So if that's something that you're interested in, feel free to ping me about it.
36:01
And as I said, there will be times when you miss a document for whatever reason. It could be a bug or not. But at such times, you need a way to check the consistency across your data centers. The next stage of that certainly is to be able to auto-heal. But to begin with, what is needed is to be able to at least say that your clusters are out
36:22
of sync. So that's something that's been on the radar. If you're interested in any of this or in the cross-DC replication work in general, please reach out, and it'd be great to collaborate upstream. And that's about it. We have three more minutes, I guess.
36:41
So any questions? Did you see where this code lives? Oh, thank you for the question, Houston. And your question is, where does the code live? The code does not live in the main Solr repository.
37:01
It lives in something called the Solr sandbox repository that is also owned by the Apache Solr project. So yes, there is also documentation in that repository, but you'll find all of this code in the Solr sandbox repository. Ken. Really appreciate the talk.
37:22
One question that I have with this: you say delete by query doesn't work. What about atomic updates? I could see some really bad things happening, like you're indexing, it succeeds to index, but posting to the Kafka queue fails. You have sets messed up, increments messed up, all sorts of weird things.
37:40
Yes. So all the things that are potential challenges with Solr itself are still a challenge with this solution. Atomic updates, if your local request passes and succeeds, the only thing that will potentially fail is if your queue is down.
38:02
If your atomic update does succeed, you will be able to write the update. At the same time, anything that involves updating the same document again and again is going to be a challenge. We haven't tested it out. We don't have a use case for that right now. But if that's something you're interested in,
38:21
it's totally worth checking out, because I know there will be challenges with just the idea of updating the same document again and again. Would you say that this is best suited for WORM kinds of use cases, write once, read many, for that reason? Write once, read many, or you could write
38:41
once and update the document. There is that possibility. The only challenge is if you update the same document very close to another, potentially competing update. If that's not the challenge, and if you have fairly infrequent updates to the same document, this will work.
39:01
Thank you. So a quick question. Lots of them, though, but I would start with: you mentioned some monitoring capabilities, metrics stuff in progress. That was part of one of my questions, because Kafka generally is pretty verbose,
39:20
and are you exposing that as one of the possibilities also in Solr? So yes, some of the stuff will get exposed. For example, latency, backlog, the stuff that the consumer has access to, it will get reflected. Right now, I think we report backlog. I think we report latency.
39:41
Actually, yeah, I think we report both of those. I'm not sure, I'll need to check. But as I said, that's work in progress. There are not a lot of metrics that we report. We only have what we thought was essential for us to be able to even start this thing. OK, I have a few more questions, but then one last one maybe: would Kafka then be embedded into Solr in that case?
40:02
That's the white block in the middle. We're providing a solution, but we're not shipping any bit of Kafka other than the client with Solr. You're supposed to bring up Kafka yourself. To test it out, if you check out the repository, there is a script that basically pulls in Kafka
40:21
for you, starts stuff, brings up clusters, shows you a demo of literally sending data to it, replicating it, reading it into another cluster using the consumer. So there's a full demo in the repository right now. Take a look at it. But we're not going to be shipping Kafka.
40:45
Any other question? There'll be a time, I suppose. Oh, yes. All right, thanks a lot. Thank you, everyone.