
What's new in Apache Solr 9.0


Formal Metadata

Title
What's new in Apache Solr 9.0
Title of Series
Berlin Buzzwords 2022
Number of Parts
56
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
English

Content Metadata

Subject Area
Genre
Abstract
Apache Solr 9.0 might be among the most anticipated releases for the project in the last decade. For folks who don't follow the project very closely, the list of changes is a lot to comprehend and digest. This talk makes that process easier for developers by highlighting some key aspects of the 9.0 release. During this talk, I'll cover the migration of the Solr build system to Gradle and what it means for developers who work with Solr. I will also talk about updates to modules, like the movement of HDFS into a non-core plugin and the removal of the autoscaling framework, CDCR, and DIH. In addition, this talk showcases some of the key security, scalability, and stability improvements that Solr 9.0 brings to users. At the end of this talk, attendees will have a better understanding of the Solr 9.0 release and a high-level roadmap for the project, allowing them to plan better.
Transcript: English (auto-generated)
Hello, everyone. I'm pretty excited to be here at Berlin Buzzwords 2022 to talk about what's new in Apache Solr. Any new major release for a project that's as big, and as critical for people, as Apache Solr is always a big thing.
And with Apache Solr 9.0, there's a bit of history involved with this release, in addition to it just being a major release. So let me walk you through. And I lost connection to the clicker. Can you get the next slide?
Yeah, thank you. So I'll start off with a little bit of history of how Solr evolved. Solr started off as a project at CNET in about 2004. It was donated to the ASF, and it graduated to being an Apache top-level project back in 2007.
Around 2010, Apache Lucene and Apache Solr merged; Lucene is what Solr was based on. The Solr community and the Lucene community were pretty much the same batch of people, and they got together and realized that it made sense to merge the two projects.
The projects ever since stayed together and were released together. Starting with Solr 4.0, the release cadence was roughly a year and a half, so every year to year and a half there was a major version release. That held right until about 2020,
when a conversation started in the community about separating these projects into their own independent top-level projects at Apache, so basically establishing Solr as an independent project again. In early 2021, we saw Solr being established as a separate Apache project. That makes Solr 9.0 an important release in the sense that it's the first release of
Apache Solr as an independent project that did not get released as part of the Apache Lucene project. Can I get the next slide, please?
So as part of this talk, I'm going to try and touch upon a bunch of things. I've tried to categorize everything into these sections here. I'll start off with indexing and search, move on to stability, scalability, and security, then talk about deprecations and removals, which are very important.
And then I'll try to talk about everything else that does not fall into these buckets. But one thing to remember is that Solr 9.0 is a major version release, and doing justice to a major version release in 40 minutes is just not possible. So I'm trying to do my best to cover as many things as I can, but I may have to leave out a few things.
So if you have any specific interest, please look at the change log to understand what actually happened as part of this release and get a more complete picture. Can I get the next slide, please? So indexing and search are always at the very core of Solr, which is kind of obvious.
It is a search engine. All you want to do is be able to index data and be able to search through it. So let's start off with the features and changes in Apache Lucene and Solr that directly impact indexing and search.
Next slide, please. I'm going to try and reconnect in the meanwhile. OK. Yeah. So in the recent past, the industry has seen a bit of a paradigm shift, especially in certain use cases. And if you attended some of the talks by Alessandro, or by Joe, you would have realized how everyone's talking about neural search.
With the release of Solr 9.0, Solr piggybacks on a feature introduced by Lucene, and on Lucene's capabilities, to allow searching on a dense vector field type. You can now define a dense vector field type. If you don't know too much about it,
the idea is that traditionally search worked on very sparse vectors, where every keyword was a dimension in the space. This paradigm shifts everything toward representations that are more compressed and more meaningful, rather than just a bag of words.
What Solr also provides as a tool here is a query parser that allows you to search on these dense vectors. It's a very interesting change. I haven't personally tried it yet, but from the conversations I've had, I feel this is one of the really important changes or features that have come to Solr as part of the 9.0 release.
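To make that concrete, here is a minimal sketch, assuming the DenseVectorField type and the knn query parser that shipped with Solr 9.0; the field name, dimension count, and collection name are illustrative:

  <!-- managed-schema: declare a dense vector field -->
  <fieldType name="knn_vector" class="solr.DenseVectorField"
             vectorDimension="4" similarityFunction="cosine"/>
  <field name="film_vector" type="knn_vector" indexed="true" stored="true"/>

  # Query for the top 10 nearest neighbours of a query vector
  curl -G 'http://localhost:8983/solr/films/select' \
    --data-urlencode 'q={!knn f=film_vector topK=10}[1.0, 2.0, 3.0, 4.0]'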
OK, the clicker works. Next up is text analysis. And the text analysis part ties back into the traditional sense of how search works.
With this release, a whole bunch of new Snowball stemmers have been added, bringing better language capabilities for languages like Hindi, Indonesian, Nepali, Serbian, Tamil, and Yiddish. There's also a new Norwegian normalization filter added as part of the 9.0 release, making this version of Solr much richer and more capable when it comes to
handling languages beyond English and the other languages it has supported so far.
The reason why this slide is part of the indexing and search section is that a lot of people in the recent past have slowly graduated toward using Solr not just as a traditional text search engine, but for SQL purposes. And that kind of makes sense, because at the end of the day, from a search engine,
all you're trying to do is store some data and get that data back based on certain criteria. And SQL is just a language for specifying what data you need. With Solr 9.0, all of the much-loved SQL capabilities of earlier Solr versions stay as is.
The only change is that SQL is now a module by itself, so it's moved outside of the core and doesn't impact the core directly. The functionality is right there, but this has made it much easier for developers to organize and manage the code base.
In addition, while in the past users had admin UI support for basic searching and querying, with this release there's support for running SQL queries from the admin UI.
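As a hedged illustration of what this looks like over HTTP, the /sql endpoint accepts a statement via the stmt parameter; the collection name here is illustrative:

  curl --data-urlencode 'stmt=SELECT id, name FROM techproducts LIMIT 10' \
    'http://localhost:8983/solr/techproducts/sql'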
Stability. As systems get to be more critical, there's obviously a much greater need for things to be made and kept stable. You can only use a system if you can rely upon it. So a number of the changes that come with Solr 9.0 in some way impact the stability of the system.
You don't want it to drop below the stability levels it once had. And one of the really important things that the developers or maintainers of an open source project, or any project in general, get to do with a major version release is dependency upgrades.
While I won't dive into the details of these dependency upgrades, and I'm guessing there are more that happened as part of this release than shown here, this is just the tip of the iceberg. I would like to highlight two things. One is that the minimum JDK version supported by Solr 9.0 is JDK 11, up from JDK 8.
The other thing, and I'm going to get back to this in a little while, is that because of the projects being split and the establishment of Solr as a top-level project, Lucene is now a dependency for Solr.
And while it might not mean too much to you now, I'll talk in a little bit about how this is an important aspect to consider when you're working with Solr in the future.
Rate limiting. There are times when you want to guarantee capacity for a specific request type. As an example, let's say you want 42 select requests to be processable at any given point in time.
This is a JVM-level setting that allows you to specify how many requests of a specific type are guaranteed to be processable. The way it works is that when a request comes in, Solr takes the request and
tries to allocate a slot for it. If it's unable to allocate a slot, there are two options. One, it can hang on to that request for a little while, and that time can be specified by the wait-for-slot-allocation-in-milliseconds parameter. The reason you'd want to do that is that the alternative is to put the request into a wait list.
Now, of course, putting something into the wait list and getting it back off the wait list incurs an overhead. To avoid that overhead, you can specify how long you are willing to wait before incurring it. In addition, there are some interesting parameters like slot borrowing,
which allows a request type to borrow slots from other request types. When you can borrow slots, there's also a chance that you might end up over-borrowing from another request type, which then runs out of its own slots. So to guarantee that every request type has a certain number of slots
always reserved for its own use, you can specify something called guaranteed slots, which guarantees availability for that request type at that level. One thing to remember is that this is an experimental feature, so if this is something you would want to try,
please feel free to try it and report back to the community: send an email to the mailing list and engage people there if you have any feedback or questions related to this.
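For a sense of the shape of this configuration, here is a sketch of a query rate limiter defined in clusterprops.json, using the property names as I recall them from the reference guide; treat them as assumptions and verify against the 9.0 docs:

  {
    "rate-limiters": {
      "enabled": true,
      "allowedRequests": 42,
      "guaranteedSlots": 10,
      "slotBorrowingEnabled": true,
      "slotAcquisitionTimeoutInMS": 100
    }
  }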
Every now and then, there have been requests from users who've asked: is there a way we could kill a request that we don't want to process, or something that's taking too long, or a long-running task or query that we no longer need?
And yes, there can be queries that run for an extended duration. So people have asked, is there a way for us to go back to Solr and say, we've fired this query, and we now want to cancel it? Task management allows you to do exactly that and more.
What the task management interface allows you to do is list, track the status of, or cancel a request that has been marked as cancelable. Requests can be marked as cancelable when you send them to Solr. And while this applies to queries right now, it is easily extendable to all the other request types.
Another important thing to note here is that while it sounds very similar to the idea of timeAllowed, it's related but orthogonal at the same time: timeAllowed lets you short-circuit a request when it's taking too long,
but the task management interface allows you to cancel a request based on much more than just time. For example, if you have two requests that you want to send out and you only want the response from one of those, whichever comes back sooner, then as soon as the first request comes back, you can send out a request to cancel the other long-running query.
And these queries could be compute-intensive and taking up resources, so it's a good way to get your resources back at the same time.
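A rough sketch of the flow; the parameter and endpoint names below are my best recollection of the 9.0 reference guide and should be treated as assumptions to verify:

  # Mark a query as cancelable and give it an ID (assumed parameter names)
  curl 'http://localhost:8983/solr/films/select?q=*:*&canCancel=true&queryUUID=query1'

  # List active cancelable tasks, then cancel one
  curl 'http://localhost:8983/solr/films/tasks/list'
  curl 'http://localhost:8983/solr/films/tasks/cancel?queryUUID=query1'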
Moving on to scalability. Scalability and stability are closely tied. When you're trying to scale your system, you're obviously looking at the stability aspect of it, but at the same time, scalability in isolation looks at how stable the system stays
when you try to scale it beyond what's regularly expected. A new experimental feature that has been introduced in Solr is node roles. What node roles allow users to do is specify
a certain role for a specific Solr instance. You can specify these values at startup, and right now, out of the box, Solr supports two node roles. One of them is the data role, and the other one is the overseer role. There are two possible values for data, which are on and off.
There are three possible values for overseer, which are allowed, preferred, and disallowed. If you want to assign node roles, you do so using these two value sets. I'll give you an example. In a four-node cluster, like the example on the slide on the right,
nodes one and two have been marked as disallowed for overseer. Node three has been marked as allowed for the overseer, and node four has been marked off for data and preferred for the overseer. When you set up a system like this, what ends up happening is that Solr is going to try and make node four your overseer.
That might be a use case where all your instances are heavily loaded, computing a lot of things, and you don't want your overseer, which is a central management instance, running on the same Solr instances. What you end up doing is telling Solr
to not host any data on that instance and only treat it as an overseer. It might also be the case that you have different hardware setups, and so you've set things up differently for your data and your overseer instances. When node four goes down for whatever reason in such a case,
Solr is going to start looking at the instances for a new overseer election and realize that nodes one and two cannot be elected as overseers because you've marked them as disallowed, whereas node three can be elected as an overseer. And you could configure your Solr in a way where node three is never really heavily loaded,
and node three is not hosting too much data, so it's running lighter loads, allowing you to run your overseer there without too much to worry about.
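Sketching the four-node example above at startup, assuming the solr.node.roles system property from the 9.0 release (verify the exact property name against the reference guide):

  # Nodes one and two: host data, never act as overseer
  bin/solr start -c -Dsolr.node.roles=data:on,overseer:disallowed

  # Node three: host data, allowed to be overseer if needed
  bin/solr start -c -Dsolr.node.roles=data:on,overseer:allowed

  # Node four: dedicated overseer, hosts no data
  bin/solr start -c -Dsolr.node.roles=data:off,overseer:preferred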
And can I have the next slide, please? Replica placement plugins. This is a substitute for autoscaling. Autoscaling is a feature that was introduced a few versions ago and had stayed in Solr,
and I'm going to get to that in the list of things that have been deprecated or removed. Autoscaling, as it existed in previous versions, no longer exists in Solr, but replica placement plugins allow users to do most of what autoscaling was trying to accomplish. It's an easy-to-use API
and offers a whole bunch of plugin factories that you can use to decide where a new replica gets added. It doesn't do anything magical; it doesn't autoscale,
but it allows you to specify the way in which you want your data to be placed on your Solr cluster. There's a simple placement factory, a random placement factory, and a minimize-cores placement factory, which are fairly obvious in terms of what they do.
And then there's the affinity placement factory. The affinity placement factory is what you might want to use in your production systems if you're really worried about where your data resides, because it allows you to respect things like availability zones and ensure that data is distributed across, say, racks or availability zones.
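One way to select the affinity placement factory cluster-wide, sketched from memory of the 9.0 cluster plugin API; the class name is the real one from the release, while the config values are illustrative and worth double-checking:

  curl -X POST 'http://localhost:8983/api/cluster/plugin' \
    -H 'Content-Type: application/json' -d '{
    "add": {
      "name": ".placement-plugin",
      "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
      "config": { "minimalFreeDiskGB": 10, "prioritizedFreeDiskGB": 50 }
    }
  }'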
Moving on: the distributed overseer. I just spoke about node roles, and I mentioned how you can now designate a specific node
in Solr to be your overseer by specifying the node role. This feature, also an experimental feature of Solr 9, has been worked on for a reasonably long time. The overseer is a centralized responsibility of a node
that handles cluster state updates and config API updates, as well as collection API call processing. So it kind of makes sense to not have this be a single point of failure. Of course, Solr is going to go ahead and elect a new overseer
if something were to happen to your existing overseer. But in an ideal case, you don't want it to be centralized. And this is an attempt at distributing the role of the overseer across your cluster so that there's no single point of failure. It brings in all of the benefits that a regular distributed system would bring to you.
But at the same time, it's not the default. With 9.0, it's available for people to use, and it allows for distributed handling and processing of cluster state updates, as well as collection and config API calls. You will need to opt in to it, and you can try running it. Again, this is one of those things that's experimental, and the community would really appreciate
any feedback that you might have on this. Can I have the next slide, please? As a system that holds data, security is a critical part of Solr,
and even more so in the last few years. A really high amount of effort has been put into making sure that Solr gets even more secure than it has always been. And while security patches have been released as part of dot releases along the 8.x line,
there were a few things that were not essential but really good to have, and those waited to be released with 9.0. So let me talk about what those features and capabilities are. Can I get the next slide, please?
The first one is the certificate authentication plugin. The certificate authentication plugin, in a single line, allows you to use client certificates end-to-end for both authentication and authorization. On an implementation level, it supports loading the certificate subject
via the user principal into the authorization context, allowing you to use the client certificate in an end-to-end and seamless way. This is a big win for anyone who's using client certificates, because it no longer covers only the authentication side of things but is an end-to-end solution.
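A minimal security.json sketch, assuming the solr.CertAuthPlugin class introduced in 9.0; the role mapping and certificate subject here are illustrative:

  {
    "authentication": { "class": "solr.CertAuthPlugin" },
    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin",
      "permissions": [ { "name": "security-edit", "role": "admin" } ],
      "user-role": { "CN=admin-client,O=Example": ["admin"] }
    }
  }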
Next up is the PKI authentication plugin. Late along the 8.11 line, there was a realization about the PKI authentication header format.
PKI authentication is what Solr uses for all secure inter-node communication, that is, when Solr nodes are talking to each other; it's what Solr uses by default if you've enabled security. It was realized that there were two issues with it.
One, it wasn't a secure enough way of doing it. The other important aspect was that there were things like the timestamp that sat inside the encoded value. If you have a timestamp that could easily live outside of the encoded value, it can be used to do some pre-processing and decide on a few things before putting in the compute resources required
for moving forward with processing the request, so it makes sense for these values to stay outside. By those values, I mean the timestamp and the user. An important thing this update does is
change the scheme from encrypting with a public key to signing with a private key. That makes the entire process way more secure. In addition, it stores a bunch of things in the header unencrypted, allowing Solr to process these values and make decisions or short-circuit if needed
without having to go through the compute overhead of decoding values. If you're running older versions of Solr, there are a few migration paths, so please check out the change log and the reference guide to see what your options are and make sure that you switch over to the new version the right way.
I believe that with 9.0 the SolrAuthV2 header is the default header that gets used, so please make sure that you've done your bit to correctly upgrade to 9.
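For a rolling upgrade, my understanding is that the accepted and emitted header versions can be steered with system properties along these lines; the property names are from memory and should be verified against the reference guide:

  # While mixed 8.x/9.0 nodes exist: accept both header versions, keep sending v1
  bin/solr start -c -Dsolr.pki.acceptVersions=v1,v2 -Dsolr.pki.sendVersion=v1

  # Once every node runs 9.0: send (and accept) only v2
  bin/solr start -c -Dsolr.pki.sendVersion=v2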
Under Solr 8.x, if you enabled TLS in Solr, everything worked fine: all of the interaction between a client and Solr, and between Solr instances,
was TLS-enabled. But there was no way for Solr to talk to ZooKeeper over TLS, and that wasn't a restriction that Solr had; it was a restriction of the Java client offered by ZooKeeper. With the ZooKeeper 3.7 release, which happened about a year ago,
the ZooKeeper client now allows TLS communication with TLS-enabled ZooKeeper clusters, letting people run an end-to-end TLS-secured Solr cluster where not only the Solr-to-Solr interactions are TLS-enabled, but also Solr's interactions with ZooKeeper.
While it might not seem like a critical change, and it's not something that's been written about too much, it's a very important change for anyone who's running Solr with TLS.
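These are standard ZooKeeper client system properties rather than anything Solr-specific; a sketch of enabling them for Solr's ZooKeeper connection, with illustrative store paths and passwords:

  SOLR_OPTS="$SOLR_OPTS \
    -Dzookeeper.client.secure=true \
    -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
    -Dzookeeper.ssl.keyStore.location=/var/solr/etc/keystore.p12 \
    -Dzookeeper.ssl.keyStore.password=secret \
    -Dzookeeper.ssl.trustStore.location=/var/solr/etc/truststore.p12 \
    -Dzookeeper.ssl.trustStore.password=secret"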
Other notable changes include the enabling of the Jetty request log. The request log now logs to the right directory in a correct, tooling-compatible format, which is super useful because it allows you to go back and audit the requests to your Solr cluster. Solr, by default, doesn't log most requests.
For example, for select requests, by default it wouldn't log them unless they were slow requests, so you'd miss out on a lot of access information. In the case of a security breach, you might never know what actually happened. With the enabling of the Jetty request log in 9.0, it means you will have that information by default.
One thing to remember, though, is that yes, it will add overhead to the amount of logs that you're collecting. So if you're worried about security and want to keep things auditable, please leave it enabled. If there are reasons that don't allow you to keep request logging enabled, you might want to go back and disable it.
I don't recommend doing that, but it's an option. Another thing that's happened: all request handlers now support security permissions for access. While the PermissionNameProvider interface was available for all the handlers to extend,
that wasn't the case for everything that shipped with Solr. With 9.0, all handlers that ship with Solr allow for setting security permissions, and there's no handler that doesn't allow users to do that.
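For context, attaching permissions in security.json with the rule-based authorization plugin looks roughly like this; the paths, users, and roles are illustrative:

  {
    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin",
      "permissions": [
        { "name": "config-edit", "role": "admin" },
        { "path": "/replication", "collection": "films", "role": "ops" }
      ],
      "user-role": { "alice": ["admin"], "bob": ["ops"] }
    }
  }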
A lot of users in the community have, in the past, asked about ways to disable the admin UI, and 9.0 allows users to disable the admin UI via a system property. A lot of times, people want to do this for security reasons: not wanting to give access, or avoiding accidental updates
to their Solr cluster. So disabling the admin UI via a system property is super useful for people who only want their Solr interactions to happen through their client apps, for example. Moving on to the build side,
the build changes are really important for developers. They might be less important for users, but they're very critical for anyone who builds Solr themselves. If you're someone who takes Solr, gets the code base, builds things, and adds your own things to it,
or you do something a little differently, you should pay attention to the changes that have been introduced with Solr 9.0.
So Lucene and Solr, as I already mentioned, are now separated, and an important thing to remember is that Solr only has Lucene as a dependency at this point. What that means is that if you're running, say, a Solr plugin
that relies on some Lucene capability, right now with Solr 9 it doesn't matter, because both of them were released with the same versions: Solr 9 uses Lucene 9, which kind of makes sense. But in the future, because these are now independent projects, there might be a point where a version of Solr does not match up exactly
with a version of Lucene, for reasons that the community chose. And when that happens, if you're trying to build things that rely on both Lucene and Solr, please make sure that you're respecting what Solr relies on and depends on while trying to use those Lucene capabilities.
An important change, and while there's not a lot going on on this slide, it is one of those changes that took the most time when it came to the 9.0 release.
To sum it up, it took a village to move everything from Ant to Gradle. Solr is an old project. It relied on Ant for a very long time, and things were complex, until someone from the community tried to add a new module,
realized how complicated and painful editing the build.xml was, and, the person involved being a committer, realized that if it was that difficult for them, it would certainly be even worse for anyone who hasn't been associated with the project for long, or hasn't dived into the project as much.
A conversation was started to move to a more modern build system, which was Gradle. Everyone got together to move almost everything to Gradle. It took almost two years, I'd guess longer than that, to move everything over.
Right now, Solr is completely migrated to Gradle. So if you're using Solr 9.0 or Lucene 9.0, everything is built and released using Gradle. And Gradle brings in a whole bunch of capabilities. One, it's a much more modern system that addresses the challenges that Ant had.
In addition, Gradle's ability to cache results makes running tasks much faster. As an example, rerunning the forbidden-APIs check went down from one minute to about five seconds. And the incremental builds are super useful with Gradle
in terms of saving developer time. There's also native support in IDEs, and adding and handling dependencies is much easier and more manageable with Gradle than it was with Ant, so you no longer need to run ant clean, clean-idea, and a whole bunch of other steps when you're trying to change dependencies.
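A few representative commands from the Gradle build, as a sketch; task names like these exist in the Lucene/Solr Gradle setup, but run ./gradlew tasks in your checkout to confirm what's available:

  ./gradlew assemble   # build the distribution
  ./gradlew test       # run the test suite
  ./gradlew check      # tests plus static checks such as forbidden-APIs
  ./gradlew tidy       # apply the project's code formatting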
Docker. The Solr Docker work was donated by some in the community to the PMC a while ago,
but ever since it was managed outside of the Apache Solr GitHub repository by a few people. With 9.0, all of that was brought under the umbrella of the Apache PMC. So now the image creation bit is part of the Apache Solr GitHub repository.
The documentation for Docker is part of the reference guide. The official image has been upgraded to JDK 17 from Eclipse Temurin instead of JDK 11, and some people might question why, because Solr's minimum requirement is JDK 11, but the official Docker image ships with JDK 17.
The reason for that is we didn't want to wait until the next major release to switch over the JDK version. With the release cadence for Java being much higher now than in the past, it only made sense to already hop onto a more recent JDK release
instead of relying on JDK 11. And the Docker image is completely customizable, while at the same time allowing developers to create a functionally identical local image, just like the official Docker image, in terms of the mount points, the data directory, and everything else associated with it.
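Running the official image is a one-liner; the tag and volume name here are illustrative:

  # Start Solr 9 with a persistent volume mounted at the image's data directory
  docker run -d -p 8983:8983 -v solr_data:/var/solr --name solr9 solr:9.0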
That brings me to deprecations and removals, a rather important aspect, especially because a major version release allows the developer community, the maintainers, the PMC,
to remove a whole bunch of things that users may still want. The first thing is the data import handler. It's something that wasn't really well supported over the years, but there were still users. So instead of just deleting it
and dropping all of that from the code base, it was transferred over to a third party. It's not part of the Apache code base anymore, so when you're using it, when you find it on the internet, please remember that it's not offered by the Apache Solr PMC at this point. It's something that's offered by someone else outside of the Apache umbrella.
So if you run into any issues, or if there are any security concerns, you need to work with the people who own the project now. The Solr PMC is not going to have any say in that project at this point.
Legacy cloud support. If this is not something that you know or understand, you can totally skip this slide, but for people who still use legacy cloud coming from the 4.x, 5.x line: it defaulted to false in 7.
It allowed for automatic core creation, because it didn't assume that your ZooKeeper was the source of truth, which is how SolrCloud worked in the past. With legacy cloud support gone, ZooKeeper is assumed to be the truth. So if you have cores sitting in your data directory, sitting in your Solr directory, when Solr comes up,
and there's no mention of these cores in ZooKeeper, Solr is not going to respect them. I know there are certain peculiar use cases, especially in 8.x, that kind of forced you to switch on legacy cloud as a hack to do some things.
If you still rely on that, please remember that legacy cloud support is completely gone. There's no more auto-loading of cores; ZooKeeper is going to be the source of truth at this point. Other changes: the state format, which was changed, I think, along the 5.x line,
and had been sitting there. The old state format is no longer supported. However, the API that allowed users to migrate from the old state format to the new one has also been removed. So if you have any collections that were created using old Solr versions, please make sure that you migrate the state format
to the newer version as part of the 8.x line. Get onto 8.x and run the MIGRATESTATEFORMAT collection API call, and once you're done with that call, then move on to 9.x.
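A sketch of that 8.x-era call; the collection name is illustrative, and the API exists only on the 8.x line since it was removed in 9.0:

  # Run against the 8.x cluster before upgrading to 9.x
  curl 'http://localhost:8983/solr/admin/collections?action=MIGRATESTATEFORMAT&collection=films'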
If you move on to 9.x without doing so, those collections will not be usable. Another thing to note is that the legacy BM25 similarity factory has been removed. So if you use that, please go through the reference guide and see what that means for you, what you can do, and what your options are.
Can I have the next slide, please?
Okay, I think I might have lost people. I'm not sure if I'm still in the call.
Yes, you are. Okay, yeah, I didn't close anything. I just refreshed everything, so I wasn't sure if things were still on. Moving on to HDFS support: it has now moved into a module.
So that doesn't really have any impact on users. Everything stays as is; it just moved outside of Solr core. Can I have the next slide, please?
I think, yeah, I think we missed a few slides. We are...
So you are supposed to finish in one minute, but if you have something else to say, we will need to take the questions after this. Yeah, so the autoscaling framework has been removed. I already spoke about the replica placement plugins
that take over from the autoscaling framework. Please try using those. And if you're running Solr on Kubernetes, one of my colleagues and Solr PMC members, Houston Putman, who drives the Solr Operator effort, is at the conference. Please meet him, please see him, if you want to learn about the Solr Operator.
I'm going to try and wrap this up quickly; I'm running out of time. Can you move to the next slide, please? Yeah, CDCR support has been removed from Solr, but there's a new SIP in the works. Check out the Solr sandbox repository and feel free to contribute. It's being actively worked on at this point.
So if cross-data-center replication is something you're interested in, please pitch in and contribute in whatever way you can. Can we move to the next slide? Everything else. And the next slide, please. HDFS support has also moved to a module,
which doesn't take away any of the capabilities of what Solr supported on HDFS; it's just moved out of core into a module for better management of the code base.
Next slide, please. There have been some more logging changes and improved tracing, and the log-level handler can now work in a distributed manner, which is optional. You don't need to go to every Solr instance to switch the log level for a specific thing; you can now distribute this change across your Solr cluster.
And MDC prefix labels are no longer hard-coded. So if you use prefix labels, please remember to set them in your Solr config; they won't be auto-set with Solr 9. Can we move to the next slide, please? I might need a minute here. On the metrics side, the Prometheus exporter has seen some improvements.
Its classpath has already been pre-added to Docker, making it super easy to run. The dependencies have been improved, so it now only depends on SolrJ instead of relying on Solr server. And the separation of metrics for certain top-level requests lets you view what's a top-level request
as compared to internal requests, allowing you to segregate these request types and see what's actually going on in your Solr cluster. Can we move to the next slide? And the next one? Yeah.
So just to summarize, Solr 9 offers a whole bunch of benefits. It took a lot of time and a lot of effort, and it made the build system much better, improved performance, and makes Solr much more secure. What I spoke about today is just the tip of the iceberg; there's a lot more in what was released with Solr 9.0.
If you're interested in learning more, please go through the change log and the reference guide. There's a whole bunch of things that came out with Solr that 40 minutes were never going to allow me to speak about. But thank you very much for being a part of this talk,
being here with me, and thank you very much. Do we have some questions for Anshum? Okay, so I think we can leave you. Thanks for this presentation, and have a nice day.
Thank you.