Highly available and disaster ready Apache Solr
Formal Metadata
Title: Highly available and disaster ready Apache Solr
Title of Series: Berlin Buzzwords 2021
Number of Parts: 69
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/67305 (DOI)
Language: English
Transcript: English (auto-generated)
00:07
Hello everyone. Welcome to this talk in the search track of the Berlin Buzzwords 2021 edition. I'm going to be talking about highly available and disaster-ready Apache Solr today.
00:22
So during the course of this talk, I'm going to introduce a few concepts. I'll talk about DR, then about HA, and the options that Solr provides out of the box: if you were interested in HA and DR, what would your options look like? Over the years, Apache Solr has evolved into something that goes well beyond
00:45
text search. It has come to handle use cases that involve machine learning, analytics, and a lot of other things, almost like a Swiss Army knife. It has grown not only in the number of features it offers and what it can help its users accomplish,
01:02
but also in the scale at which these features can be successfully used. For that reason, Solr is being used in a wide variety of industries, in infrastructure that I can only describe as critical. The ways in which these industries, like retail, banking,
01:23
travel, research, and a bunch of others use Solr make it essential for it to work reliably, both in terms of correctness and availability. So we're going to dive directly into disaster recovery. Disaster recovery emerged as a discipline in the mid to late 1970s, when there
01:42
was a realization of how dependent businesses had become on computer systems and that downtime meant a loss of revenue. Back in the day, mainframes would back up onto tapes, and restoring data from those tapes was what you waited on if something were to happen.
02:00
Over the years, though, disaster recovery has evolved a lot. Today, disaster recovery involves a set of policies, tools, and procedures to enable the recovery of an existing system, or its continued availability and operation.
02:22
And the disruption could be due to either a natural or a human-induced disaster. Disaster recovery generally uses an alternate site to restore operation to normalcy for the entire site. As the intention is to have the primary location recover, it is generally done in one of two
02:43
ways: either using an offline process, something like backup and restore, or more actively transferring data in real time so that if something were to happen to your original system, you could recover from that. Solr has provided DR readiness, and I'm going to refer to disaster recovery
03:06
as DR for the rest of the slides, more often than not. Solr has provided DR readiness options for a while, and while they have evolved over multiple releases, the same options still continue to be the recommended ones. And while backup and restore is
03:24
certainly the most important DR option available, CDCR, or cross-data-center replication as it's known, actively copies data, allowing it to also work as an HA solution. But I'd like to highlight that DR shouldn't be confused with HA, and we're going to get to that
03:43
in a little bit. An important aspect of DR is that it is configured with a designated time to recovery and a recovery point. And by recovery point, I really mean that checkpoints are generally maintained, and when a restoration has to happen,
04:09
data is expected to go back to that restoration point, which essentially means that not all of the data that made it into the system before the disaster is going to be
04:20
recovered. At least, that's by design. Diving a little deeper into backup and restore options for Solr: to give you a bit of history of the backup and restore feature that Solr offers, it was first introduced in version 6.1. And when it was introduced, it was used a lot, but
04:42
it came with some restrictions. Most importantly, it only allowed for full backups and was not really cloud friendly, in the sense that all it would allow users to do was back up either to NFS, essentially local backups, or to HDFS. And there was no way to back up only the
05:01
data or only the configs, so it would always back up everything every time. And more often than not, there were a lot of failed backups. I'm going to come back to why that happened, but there was a realization that something needed to change if this feature was to continue to be used. So it needed a new perspective.
05:26
And that new perspective came in the form of incremental backups. The realization was that Solr backs up every bit of data every time you want to back up, and people generally want to back up frequently, maybe twice an hour, maybe less often. But backing up all of the data
05:41
every time really meant storing more data, also transferring more data across the network, maybe even to a different data center. So with incremental backups, Solr offered something that allowed for reducing redundancy of data storage and transfer, translating to faster and less expensive backups and restores. But it did not stop there. This entire feature
06:07
offered two more valuable sub-features, you could say. The first one is safety against corruption. While the previous backup mechanism did not verify the index before
06:22
it backed it up, incremental backups ensure that the index files being backed up are checksummed, so that at the time of backup you know the files you are backing up can actually be used if they are ever needed. And another important aspect of this was that this
06:43
entire architecture allowed for backups to be more cloud-friendly. So you were no longer restricted to only using NFS or HDFS file systems. So backup and restore evolved into
07:02
something that uses incremental backups, allowing users to optimize the resources it requires and also the odds of it succeeding. But it also added extensible interfaces. What that meant was you could now have cloud services or cloud storage providers
07:22
as your backup options. Well, those weren't released when this change happened; the upcoming release, I think, is going to have support for GCP. And there's an open PR for supporting AWS S3 as a backup option for SolrCloud.
07:44
The PR is still open, so I'm not sure when it's going to be officially released. But it's out there, and it's looking really good. One interesting thing that Solr allows you to do is define multiple repository implementations. The extensible interfaces that I just spoke about are actually repository
08:04
implementations, allowing you to have definitions for, say, HDFS, GCP, AWS, and maybe another proprietary backup file store that you might have at your workplace. You don't really have to use all of these all the time; you can specify exactly what
08:25
you want at runtime, making it really easy and convenient for people to switch and use things as they want. One more interesting aspect of this backup and restore update is that,
08:42
unlike the older backup and restore, which only let users restore an existing index into a brand-new collection, the new one is more flexible. Yes, it still continues to work with aliases, which remain a recommended mechanism if you don't want to change your client code.
09:01
But the new version of backup and restore also allows users to restore into an existing collection, piggybacking on a recently released feature that allows Solr collections to be marked as read-only while the restore happens in the background, letting you restore into the same collection.
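To make that concrete, here is a rough sketch of what that flow could look like against the Collections API; the collection, backup name, and location below are placeholders, and this assumes the newer incremental backup support described here:

```bash
# Mark the existing collection read-only while the restore runs
curl "http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=products&readOnly=true"

# Restore a previously taken backup into that same collection
curl "http://localhost:8983/solr/admin/collections?action=RESTORE&name=nightly&collection=products&location=/mnt/solr-backups"

# Make the collection writable again once the restore has finished
curl "http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=products&readOnly=false"
```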
09:20
So how do we use backup and restore? For the sake of time, I'm just discussing backup here. You basically have to define the repository implementations you might use in your Solr setup in your config.
09:49
The config highlighted here is a local file system repository option, the location being an NFS mount point; you can have a GCP implementation with the upcoming release, and in the future an S3 configuration as well.
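A minimal sketch of such a repository definition in solr.xml; the repository name and mount path are placeholders:

```xml
<!-- solr.xml: a local file system backup repository pointing at an NFS mount -->
<backup>
  <repository name="local_nfs"
              class="org.apache.solr.core.backup.repository.LocalFileSystemRepository"
              default="true">
    <str name="location">/mnt/solr-backups</str>
  </repository>
</backup>
```

Additional repositories (HDFS, GCS, S3 once available) can be listed alongside it, and the one to use is picked per request.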
10:00
Solr exposes four APIs as part of the backup and restore umbrella: the backup, restore, list-backup, and delete-backup calls.
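Roughly, those map onto Collections API calls like the following; collection, backup, and repository names are placeholders:

```bash
# Take a backup of a collection into a named repository
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=nightly&collection=products&repository=local_nfs&location=/mnt/solr-backups"

# Restore that backup into a new collection
curl "http://localhost:8983/solr/admin/collections?action=RESTORE&name=nightly&collection=products_restored&repository=local_nfs&location=/mnt/solr-backups"

# List the backup points stored at a location
curl "http://localhost:8983/solr/admin/collections?action=LISTBACKUP&name=nightly&repository=local_nfs&location=/mnt/solr-backups"

# Delete a specific backup point (the backupId value is just an example)
curl "http://localhost:8983/solr/admin/collections?action=DELETEBACKUP&name=nightly&repository=local_nfs&location=/mnt/solr-backups&backupId=0"
```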
10:24
So let's see what happens when you send in a backup call. Solr parses the parameters, picks up the right repository implementation based on what you've already configured and what's been provided as part of the request, and then sends an internal core admin API call to back up each core. Well, this call is kind of optional, because Solr allows you to specify something
10:42
called the index backup strategy. Right now it supports two strategies out of the box, one being copy-index-files and the other being none, and they do exactly what the names suggest. The copy-index-files strategy takes the index files and backs them up into your
11:00
repository implementation of choice at the location you've specified, while none skips backing up your data entirely. Then it moves on to the next step and backs up your configs. With the none strategy, backing up only the configs lets you later create a collection that looks exactly like the current one but does not hold any of the data that you have
11:23
in your old or existing collection. Towards the end of the request, Solr does some internal housekeeping: it makes sure you don't keep too many backups and cleans up the oldest backup point if you've specified the maximum number of backups allowed for a given repository.
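With incremental backups this housekeeping can be driven by a parameter on the backup call itself; a sketch, assuming the maxNumBackupPoints parameter, with placeholder values:

```bash
# Keep at most 7 backup points at this location; older ones are cleaned up
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=nightly&collection=products&repository=local_nfs&location=/mnt/solr-backups&maxNumBackupPoints=7"
```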
11:42
So moving on to availability. Availability is the probability that a system will work as required,
12:03
when required, during a task or a mission. The mission is basically the period when you really intend to use the system. A system which aims to ensure an agreed level of operational performance, generally termed uptime, for a higher than normal
12:23
period is called a highly available system, and what counts as normal might vary from use case to use case. When the system is not available, that's considered downtime. A highly available system is designed with redundancy
12:45
to take care of both micro and macro level failures, to overcome component failures at different levels. These systems are designed with no single point of failure, so as to ensure availability even when something goes down. It's generally achieved by a combination of redundancy,
13:04
monitoring, and failover: you not only need redundancy, you also need monitoring in order to detect when something goes down, and failover to either take that specific component out of action or to redirect traffic that was
13:22
meant for that component onto a healthy one. An important aspect here is that high availability for a system does not translate to the system never going down. It just means that the system is going to continue to work
13:40
above a certain expected availability threshold. So any highly available system can and does go down; the question is whether it still continues to work for the expected duration. And while HA and DR might sound very similar in terms of what
14:05
they're trying to achieve, they're essentially different. They have some logical overlap, but a highly available system by design handles smaller-scale issues, like a small component going down, whereas DR generally means a bigger problem has happened and you now need to recover
14:25
after the loss of something that is more than just a small component of the system. Solr has provided HA options for a while now, and replicas are the essential building blocks
14:44
when it comes to HA in Solr. Replicas provide redundancy, and therefore HA, in the sense that if one replica goes down, another active replica takes over processing the requests that were meant for the original one. Solr does not automatically spin up a replacement, but it makes
15:02
sure that anything meant for a replica that's now down is rerouted to an active replica. An active replica also takes over other roles and responsibilities, like being the leader, when something goes down, ensuring the availability of the system again.
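At its simplest, that redundancy is just a matter of asking for more than one replica per shard when creating a collection; names and numbers here are placeholders:

```bash
# Create a collection with 2 shards and 3 replicas per shard, so that any
# single replica (or node) can go down without the collection becoming unavailable
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=2&replicationFactor=3"
```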
15:25
However, there are times when, due to some error, which may be human or not, the impact on the system is not limited to one or two components. It's not local, it's not physically bounded. In such a situation you need something bigger, something that spans data centers, and that's where cross-data-center replication comes in, taking you beyond a
15:44
single physical data center. So, cross-data-center replication. Talking about its history, it has existed in Solr for a while
16:00
now, with the first release of CDCR, as it was known, coming out in Solr 6.0. It was meant to accommodate two or more data centers with limited bandwidth between them and allow data to be replicated across these. But the design decisions that were made back then
16:22
had some issues, leading to failures and problems for users that tried to use it. And so it was deprecated in 8.x. It has now been removed, and the intention is for it to be replaced in 9.x by a new approach. I've already pushed some code up into my GitHub
16:45
repository. I'm still working on it; I have a working model of this thing. But let me talk about what this approach looks like. The new Solr cross-DC replication architecture looks something like this. It's largely based off an approach that we've used at Apple
17:06
to achieve cross-data-center replication. The reason we didn't use what Solr offered out of the box was not that we didn't believe in it, but that we started before Solr had a cross-DC solution out of the box. The best part about this entire
17:24
infrastructure is that white box in the middle. All of the queuing and mirroring logic, something Solr is not and never was designed for, is abstracted out of Solr, as compared to the older CDCR, which forced Solr to behave like a queuing service,
17:48
eventually leading to problems like unbounded growth in the transaction logs and out-of-memory issues. In this architecture, when a client sends a request to Solr, Solr locally indexes that
18:00
document and then puts it onto a queue; source 1 and dest 2 are basically topics in this case. The reason I'm going to refer to a bunch of things using terminology that's common for Kafka, and may not be common for other queuing systems,
18:22
is just that the out-of-the-box implementation for this would use Kafka to begin with. So Solr writes the document to a source topic; let's call it source 1 because it belongs to data center 1 on the left-hand side.
18:40
There's a mirror of some sort that mirrors this into a queue called dest 2, the destination queue for data center 2. And there's a cross-data-center consumer that's responsible for consuming everything coming into its local destination queue and writing it to Solr. As I mentioned, the isolation of responsibility in this case,
19:04
with Solr not expected to be the queue, takes away a lot of the problems that Solr had with the CDCR solution. If you want bidirectional replication, it still stays easy, again
19:22
primarily because everything inside the white box is abstracted out of Solr; Solr doesn't really know what's going on inside that box, which is where whatever complexity there might be gets added. Now, in this case, when an incoming request comes directly to DC2, it gets written by Solr onto the source topic for that data center,
19:46
gets mirrored to the destination topic of the other data center, only to be consumed by a cross DC consumer that's running locally in DC1 and indexed into the Solr cluster running
20:01
in that data center. The cross-DC consumer has checks in place to avoid circular mirroring. So if a request originally came into DC1, the system makes sure it does not come back through the source topic of data center 2.
20:23
So there are checks and balances that ensure that doesn't happen. The requirements for this new architecture are: a messaging queue, which as I mentioned would be Kafka to begin with, but you could have your own implementation and
20:40
your own version of the messaging queue. It could be RabbitMQ, it could be a file-based queue, or a proprietary system that your company might have created internally; you'd need a very simple implementation that would allow the system to use that queue. You'd need a messaging queue consumer implementation, the producer being optional. You'd need
21:02
a cross-DC consumer, which will be provided out of the box, and nothing would really need to change in there, maybe some configuration. And you'd need external versioning for multi-way replication. The reason you'd need it only for multi-way replication is that, in the traditional one-way replication model, the system is going to
21:25
piggyback on the Solr versions assigned in the originating data center to just replicate everything over. But if you have multiple data centers acting as sources, you can't let that happen, because those versions are not synchronized, and you'd need an external version in that case.
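Purely as an illustration of what an external version could lean on, Solr already ships a document-based version constraints processor keyed off an application-supplied field; the field and chain names below are placeholders, and the exact chain wiring should follow the reference guide:

```xml
<!-- solrconfig.xml sketch: reject updates whose application-supplied version
     is older than the one already indexed -->
<updateRequestProcessorChain name="external-versioning">
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <str name="versionField">app_version_l</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```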
21:43
Ingesting data into this entire system can happen in one of two ways. It can either be sent directly to the queue and consumed by the consumer, in which case you would need an external version. You would also
22:03
not get all the benefits of the second option, which is using a mirroring update request processor: a custom update request processor that makes sure the request is processed the way it's supposed to be, knowing that cross-DC is enabled. One of the biggest benefits it provides is the abstraction of the
22:22
underlying queue. So if you were to start off with Kafka, because that's what Solr would provide out of the box, but later move to a different queuing mechanism because that is your queue of preference, you wouldn't have to change your client code, because the client is not writing to the queue and is agnostic of it.
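Since this work is still in flux, take the following only as an illustrative sketch of how such a processor might be wired into an update chain; the factory class and property names are assumptions, not a released API:

```xml
<!-- solrconfig.xml sketch (hypothetical names): index locally, then hand the
     update to the mirroring processor, which writes it to the local source topic -->
<updateRequestProcessorChain name="mirrored-updates" default="true">
  <processor class="org.apache.solr.update.processor.MirroringUpdateRequestProcessorFactory">
    <str name="bootstrapServers">kafka-dc1:9092</str>
    <str name="topicName">source1</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```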
22:46
So that's one of the benefits. The second benefit is that it allows for more checks and controls before submission. A request that's coming in might fail on the originating data center, in which case it shouldn't be written to the queue at all. But if you write directly to the queue, all these requests make it to Solr anyway; a request might succeed on one of the data centers
23:05
and not succeed on the other, say because the configs were different or something was off, and the cross-DC consumer would then have to be intelligent enough to figure out whether to retry it, discard it, or do something else with the request.
23:22
An important aspect of our own cross-data-center replication setup was handling deletes better. A system like this is designed so that you avoid accidents bringing down the system. So an accidental delete sent to one data center,
23:44
in our case, gets mapped into an update that only flags the affected documents by adding a field marking them as deleted. A parallel process that runs periodically is then responsible for going ahead and cleaning up these documents at a later point in time.
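In Solr terms, that soft-delete mapping could be as simple as an atomic update on a flag field, with a periodic delete-by-query doing the real cleanup; the field name and query below are just an example of the pattern, not something Solr does for you out of the box:

```bash
# Instead of deleting, mark the document as deleted (atomic update on a flag field)
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/products/update' \
  -d '[{"id": "doc-123", "deleted_b": {"set": true}}]'

# A periodic cleanup job later removes documents that were flagged long enough ago
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/products/update?commit=true' \
  -d '{"delete": {"query": "deleted_b:true"}}'
```

Queries would of course also need to filter out flagged documents until the cleanup runs.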
24:04
What this allows you to do is: say the delete came into data center 1, and by the time you realize it was accidental, you can still go to data center 2 and recover those documents, because they were only marked as deleted. And then use
24:21
backup and restore, or something else, to bring all of the data back. You haven't really lost that data. All the requests processed in this architecture are mirrored requests. A mirrored request is nothing but a wrapper that encapsulates the original Solr request
24:43
while adding some mirroring metadata that's used for tracking and metrics purposes, things like an attempt count and a submit time. The submit time, for example, is the time when the request was originally first written into the system by one of the
25:01
Solr instances in one of the data centers. When a receiving data center processes the request, it knows how long it has been and what the latency looks like, allowing you to alert if you're off and your SLAs are not being met. The attempt count is another thing that allows you to
25:23
track and alert if the same request is getting rejected by the cross-DC consumer or not being successfully processed by the consumer, in which case you could go back and figure out issues like an out-of-sync config. The cross-DC consumer in this entire framework is a standalone app
25:48
that has the simple responsibility of reading from the queue and writing to Solr, but also the important and intelligent ability to figure out which kinds of requests to discard and which ones to resubmit into the queue. And I'm again using the term topic
26:04
because Kafka is the preferred queuing mechanism here. After a lot of trial and error, and we've learned this the hard way, we realized that the only kind of request that is safe to drop
26:21
is a 409, which is basically a version conflict. With optimistic concurrency, when the cross-DC consumer sends a request to Solr and Solr responds with a 409, it means Solr already has a more recent version of that document, in which case it's safe to discard the document; in all other cases it's safer to retain and retry.
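For reference, this is standard Solr optimistic concurrency behaviour: if you send a document with a stale _version_ value, Solr answers with an HTTP 409 Conflict; the document id, field, and version below are placeholders:

```bash
# Update conditioned on a specific _version_; if the stored version differs,
# Solr rejects the update with HTTP 409 Conflict
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/products/update' \
  -d '[{"id": "doc-123", "price_f": 9.99, "_version_": 1700000000000000000}]'
```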
26:41
Just like the backup and restore story is evolving into using more cloud-based backup repositories, the same is true for this
27:06
approach: it's not limited to Kafka. Even though it might start off with Kafka, the interface allows for custom queue implementations. That may or may not be something you really need, but it certainly allows you to stay open
27:22
to the idea of switching over to a totally different queuing infrastructure. To extend that interface, and this might be a little too much detail, but if you were to use your own custom messaging queue, you basically need to define a source config.
27:43
This is all in flux, so please don't hold me to this for now. There's a source config that you'd need to implement, and a cross-DC consumer that accepts a message processor. The message processor basically has all the logic that deals with sending messages to
28:03
Solr, figuring out what to do in case of a failure, and everything else around it. So all the intelligent parts are already pre-programmed in the message processor. The cross-DC consumer is what you'd need to implement, and its responsibility is to get mirrored objects from the queue and put them back onto the queue in case of a failure. The message
28:24
processor is going to let the cross-DC consumer know whether an object needs to be put back into the queue or discarded, so the consumer really needs to make no decision; it just needs to be able to get objects from, and put them back into, the queue.
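Again, since this is in flux, here is only an illustrative Java sketch of that split of responsibilities; the interface and class names are assumptions, not the final API:

```java
// Hypothetical sketch: the queue-specific consumer only moves mirrored
// objects in and out of the queue; the message processor decides what to do.
interface MessageProcessor {
    enum Result { SUCCEEDED, DISCARD, RETRY }
    Result process(MirroredSolrRequest request); // sends to Solr, classifies failures (e.g. 409 -> DISCARD)
}

class RabbitMqCrossDcConsumer implements Runnable {
    private final MessageProcessor processor;

    RabbitMqCrossDcConsumer(MessageProcessor processor) {
        this.processor = processor;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            MirroredSolrRequest request = pollQueue();          // queue-specific read
            if (request == null) continue;
            if (processor.process(request) == MessageProcessor.Result.RETRY) {
                requeue(request);                               // queue-specific write-back
            }
        }
    }

    private MirroredSolrRequest pollQueue() { /* read from your queue of choice */ return null; }
    private void requeue(MirroredSolrRequest request) { /* put the object back onto the queue */ }
}

// Placeholder for the wrapper that carries the original update plus mirroring
// metadata such as attempt count and submit time (also hypothetical here).
class MirroredSolrRequest { }
```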
28:41
And the best part is that this approach works well with event-driven systems, which is what I've been working with. The road ahead on this is to release this more basic cross-DC solution, especially because 9.0 will not have the old version and a lot of people, I realize, feel the need for it. They're already
29:05
using the old version and would need some form of cross-DC replication to be offered by Solr. It does not report any metrics to a reporting system as of yet, as that's not what I'm concentrating on at this point, and that would be great to add. It does log all sorts of
29:25
problems, so you can always take your logs and figure out what's going on, whether something's off or everything is okay. It does not handle Collections API requests, which would be a good thing to have. It's something we've had at Apple, so I might be happy to add it at a later point in time, but for now none of that is part of this. And it's great to have
29:47
cross-DC replicated clusters, but it's really important to have some form of mechanism to ensure that these clusters are in sync and working well together.
30:00
Even if it's not self-healing and doesn't fix any problem, it's important for there to be a way for people to know when these are in sync. And then, support for more queue systems, which again is completely subject to what the community might want. Maybe everyone just uses Kafka, that's what they love and that's what they use,
30:21
and they don't really want anything else. But if people need something else to be supported, that's certainly on the road ahead. And all of this can happen through community participation. So if this is something that interests you, please feel free to ping me or participate directly,
30:40
whatever your preferred way to get involved might be. And it doesn't have to be code; it can be testing, trying things out, pitching in ideas, everything's valued. So while I spoke a lot about HA and DR, about Solr being safe, stable, and scalable, and about you being able to safely use
31:05
Solr in critical systems. I'm very sorry, Anshum, to interrupt you, because we are running out of time. So maybe a final thought you want to share, and then we have to wrap this up in the interest of time. Oh yeah, this is actually the last slide, so I'm done. Yeah. So the
31:24
question is whether you could use Solr as a primary data store. The TL;DR is no, and that is basically because, yes, it offers HA and DR, but it's just not designed to be a primary data store. It's not designed for storing documents that are really large and that don't follow the kind of
31:44
format that Solr is designed for. So yeah, Solr offers a lot and it's a great search engine, so I would recommend that you use Solr for just that. And that's about it. Thank you so much. Okay, thank you very much, Anshum. I think this gave us great insights, and I think many of us
32:02
are looking forward to the new cross-data-center replication. I think it's also good to see that the stuff you developed at Apple will finally be open sourced. It's always good to see something that is proven in practice make it into open source.