
What's coming next with Apache Lucene?

Formal Metadata

Title
What's coming next with Apache Lucene?
Number of Parts
60
License
CC Attribution 3.0 Unported: You may use, adapt, copy, distribute and transmit the work or content for any legal purpose, in unchanged or adapted form, as long as you credit the author/rights holder in the manner specified by them.

Content Metadata

Abstract
Around Lucene 8, most people thought: "There's not much that can be done anymore." In contrast, if you look at Apache Lucene's list of new features after each release, you will mostly see two areas of improvement: vector search and performance. Is this the end of development? For sure not! This talk looks at how ongoing optimizations in the Java ecosystem might be implemented in Apache Lucene. As an example, it presents the new vector incubation module in recent JDKs and how it helps to make indexing and searching much faster starting with Java 20.
Transcript: English (automatically generated)
Yeah, thank you for the introduction. Actually, when you are submitting the talks it's a little bit early, at the beginning of the year, so you don't know what's actually coming. So I will not talk about something which might come in the next years; I will tell you today what comes out next week in Apache Lucene 9.7. We have some great improvements, and they are also ready to be used by Elasticsearch, Solr and all that stuff. So the abstract text was changed a little bit. So I'm starting
first with my background. I have been one of the Lucene committers for a very long time. I'm a PMC member of Apache Lucene and also Solr, but my focus is mainly on Apache Lucene. I'm also doing a lot of Elasticsearch and OpenSearch: with my company I'm helping people to set up Elasticsearch, Solr and all that stuff, and often I'm doing very special things like writing query parsers and the like, which is a little bit more low level. I'm also working at the University of Bremen, maintaining a long-term archive for scientific data called PANGAEA.
When planning the talk, I figured out that this year is somehow the 25th birthday of Lucene, because in 1998 Doug Cutting released the first version on SourceForge. Unfortunately I wasn't able to figure out the exact date, because the CVS repositories at SourceForge were shut down; I think you can still download them, but that was too much work. So I don't know the exact date. The only thing I know is that Lucene has now been at the Apache Software Foundation for 22 years. At the moment we have 33 PMC members, 98 committers, a lot of developers on the list and already many, many forks; the repository, reaching back to 2001, has 2.7 thousand forks. That's just the statistics for the original talk which I was planning to have ready. Now I'm coming back to some of the features we had in the past which led to what we have now and what will be released next week. It will also show a little bit of the steady evolution in Lucene: it's going on all the time, there are new features coming, although the project is already 25 years old, which is really interesting for an open source project, because most of them stop evolving much earlier. We talked about the beginnings already. Then in 2011, which is already more than ten years later, we were at version
3.1. I'm just mentioning it here because at that point there was a great restructuring in the community: Lucene and Solr, which is a search engine on top of Lucene, were merged, and because of that Lucene got a lot of new features, like the token streams and everything. A little bit later Lucene/Solr 4.0 came out, with a completely new API for the postings lists; everything now goes through this four-dimensional postings-list API. There were also a lot of talks at Berlin Buzzwords about finite state automatons and transducers. That was version 4, still great, and it's already 11 years ago now. Then it went on at the Apache Software Foundation, and then another great thing, which I was also involved in, was in Lucene 6, that we have numeric queries
with multiple dimensions since that time, and Java 9 support. Then one interesting thing is Lucene 8; you see the releases coming more often now, it's like Firefox, I have no idea. So Lucene 8 got block-max WAND, and I'm telling this because the interesting thing here is: what Adrien implemented at that time was, I think, invented already in the 90s, skipping over blocks in the postings lists to figure out which documents are not relevant at all and skip over them. Actually, in 2012 we had a talk here at Berlin Buzzwords by Stefan Pohl from Nokia, and he proposed for the first time to implement that in Lucene. But it then took six or seven years until it was released, because there were multiple iterations: we wanted to have it flexible for the scoring models, it should not be fixed to BM25 or whatever. So it took a very long time, and I'm telling this because we actually get something
which is developed over a very long time, and then suddenly it appears in Lucene and we can use it. Then, only two years later, Lucene 9.0 came out, and now I'm coming to the part which I'm talking about and which you have heard about all the time yesterday; everybody is talking about it. We got vector support in Lucene, and that's the interesting thing. Interestingly, the paper about HNSW vectors came out in 2016, and the implementation was already ready in 2019, so we had the HNSW graph available. At that point the "H", the hierarchical part, was not really implemented; it came a little bit later, so at first we only had one of those levels. If you listened to the talk by Alessandro yesterday, he explained very clearly how that works. It was merged in December 2020 and released shortly after, in 2021. So actually
here we had a very short release cycle. But the problem now is that we have some trouble maintaining it, because this type of graph implementation doesn't really fit well with how the Lucene scorers and everything work. If you are executing a query and want to filter the results (this is also the reason why combined vector search with filters in Lucene came a little bit later), it's still something which doesn't really fit together well. So it could be that this implementation will be replaced, but in the meantime we have to live with it, which is not the ideal case for an inverted index like Lucene. The downside is that you have to do many, many vector similarity calculations during the indexing of the graph: for every incoming vector it has to be figured out where in the graph it needs to go.
So you have to compare a lot of vector similarities. There are actually three implemented in Lucene: the dot product, cosine similarity, and the squared distance, which is just adding up the squared differences. Basically, whenever you index something, you have to compare it against most of those vectors. As said before, we have no boolean logic at query time, so it's not easy to combine that with filters. Also, there's currently no real support for combining the scores between the normal Lucene BM25 score and the vector score, but possibilities are now coming in the new version. I don't want to talk about that today, but we will also have function query support in the next version, so Solr users will be happy that they can compute the dot product of the vector and at the same time multiply it with their score. The downside of that is that it often does not use the HNSW index, it just does a scan through all the vectors, which is one possibility. And another thing is that, because of the segment structure of Lucene, whenever you merge two segments you have to rebuild the whole graph, and when doing that you have to compare every vector with every other vector in those segments, more or less, and this is a major effort. Because of that, the idea now is: how can we improve the vector distance calculation in the new version of Lucene that will come out next week? How to speed it up? In the next version we will have native SIMD vector support for all JVMs, for Java 20 and 21. This only works if you're running Lucene, Solr or Elasticsearch with at least Java 20 or Java 21; it won't work with any version before. So you will see it's really a good idea to update in that case.
So, what we really want to do is improve the vector distance calculation. As I said before, we have the dot product, the cosine, and the Euclidean distance. What is done in current Lucene is that we are just hoping that the HotSpot optimizer somehow vectorizes that for loop. I can show you the dot product here, it's very easy: we are just summing up the products of all components of the two vectors, and we are hoping that this is optimized by the HotSpot optimizer.
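The scalar loop just described, together with the two other similarities mentioned earlier, can be sketched in plain Java like this (class and method names are mine for illustration, not Lucene's actual API):

```java
// Scalar sketches of the three vector similarities mentioned above.
// Class and method names are illustrative, not Lucene's real API.
public class VectorSimilarity {

    // Dot product: sum of component-wise products.
    public static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product normalized by both vector norms.
    public static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Squared Euclidean distance: sum of squared component differences.
    public static float squareDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Note how every iteration of each loop adds into a single accumulator; that data dependency is exactly what blocks auto-vectorization, as explained next.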
Does anybody know why it does not work for that loop? It does not work: HotSpot won't vectorize it, it doesn't do anything. The reason is quite easy. The problem is that we are summing up into this result variable, so every calculation depends on the result that was there before, and we have float values here. In the mathematical sense addition is associative, so you could switch the order of the additions, but because we are using floats this is not the case: if we change the order of the additions, the result will be different. And because an optimization is not allowed to change the result, no optimization is going on. You can test that out: write a benchmark and print out the compilation logs; it won't really vectorize in Java. Because of that, since Java 14, 15, 16, something around that, there has been something available to vectorize that part of the dot product.
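The reordering problem is easy to make concrete: the same three float additions give different results depending on grouping, so the JIT may not re-associate the sum (a standalone illustration, not from the talk):

```java
// Float addition is not associative: at 1e8f the spacing between adjacent
// floats (one ulp) is 8, so adding 4 alone is rounded away, while adding 8
// is not. This is why the JIT must not reorder the additions of a reduction.
public class FloatAssociativity {
    static float leftGrouping()  { return (4f + 4f) + 1e8f; } // 8 + 1e8 -> 100000008
    static float rightGrouping() { return 4f + (4f + 1e8f); } // 4 + 1e8 -> 100000000
}
```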
It uses the features of your CPU, which means every CPU core can execute multiple vector multiplications in parallel. That's one thing, and the second thing is that it can do this with many dimensions at once. It depends on whether your CPU has AVX2 with 256 bits: it can already fit a lot of float values into those 256 bits, multiply them with each other, and do that in parallel; on most CPUs there are four processing units that can do that. So I think we can increase the speed by more than a factor of four, sometimes, depending on the hardware that you have. But unfortunately that does not work with the HotSpot optimizer alone, so we need something else, and that is Project Panama. If you were there last year, you remember my talk about MMapDirectory; Panama is mostly there to interconnect Java with the operating system, so it's possible to do lower-level implementations in Java. The first thing, which I talked about yesterday, is MMapDirectory; you can also call functions from libc or whatever. But the third thing is the Panama Vector API, and that's the really interesting thing used here. This will come in the next version. I don't want to go into all the details, because now you see what you're all waiting for: the very, very horrible code. If you write this simple scalar loop with Project Panama, it looks like this. The idea is that we want to tell the CPU to divide everything into four parts: we are splitting the whole vector into four parts, and for each of those four parts we are multiplying the components and accumulating into a new vector as a result. Because we split it into four parts, it can be executed in parallel; we don't have any threads here, the CPU is doing that automatically. At the end we are just summing up and doing the final aggregation.
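I won't reproduce the slide's code here, but the splitting idea can be sketched in plain Java as a manually unrolled loop with four independent accumulators plus a scalar tail. With the real `jdk.incubator.vector` `FloatVector` API each accumulator would be a whole SIMD register of floats rather than a single float; the names below are mine:

```java
// Plain-Java sketch of the unrolling idea described above: four independent
// accumulators let the hardware work on four partial sums in parallel, and
// a scalar tail handles the remaining components when the dimension is not
// a multiple of four. With jdk.incubator.vector, each accumulator would be
// a FloatVector of SPECIES_PREFERRED length instead of a single float.
public class UnrolledDotProduct {
    public static float dotProduct(float[] a, float[] b) {
        float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
        int i = 0;
        int upperBound = a.length - (a.length % 4); // limit of the main loop
        for (; i < upperBound; i += 4) {            // unrolled main loop
            acc0 += a[i]     * b[i];
            acc1 += a[i + 1] * b[i + 1];
            acc2 += a[i + 2] * b[i + 2];
            acc3 += a[i + 3] * b[i + 3];
        }
        float sum = (acc0 + acc1) + (acc2 + acc3);  // combine the partial sums
        for (; i < a.length; i++) {                 // scalar tail
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

Because the four accumulators are independent of each other, there is no loop-carried dependency inside the main loop; only the final combination reduces them to one sum.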
So basically we are unrolling the loop so that the HotSpot optimizer knows how to transform it to assembly instructions. That's what's happening here; I don't want to go into the details. The important thing is what you see at the bottom: this tail here. It's very important that the length of your dimensions, the vector length, is something like a multiple of the bit size of the CPU registers. So I can tell you already: make your vectors multiples of 64 dimensions. Something like 93 dimensions is a very bad idea, because in the end it's slower than 128 dimensions. So if you are creating your models, use multiples of 64 to make use of the best CPUs available from Intel at the moment, which can do 512 bits in one go.
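To put numbers on those register sizes: a 256-bit AVX2 register holds eight 32-bit floats and a 512-bit AVX-512 register holds sixteen, which is why dimensions that divide evenly into the lane count avoid the slow scalar tail. A back-of-the-envelope helper (mine, not Lucene code):

```java
// Back-of-the-envelope: how many 32-bit float lanes one SIMD register holds,
// and whether a given vector dimension divides evenly into those lanes.
public class Lanes {
    public static int floatLanes(int registerBits) {
        return registerBits / Float.SIZE; // Float.SIZE == 32 bits
    }

    public static boolean fitsEvenly(int dims, int registerBits) {
        return dims % floatLanes(registerBits) == 0;
    }
}
```

For example, 128 dimensions fill 512-bit registers exactly, while 93 dimensions leave a remainder that must be handled by the scalar tail.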
The problem now is how to compile that with Java 11; that's the last thing. The problem is that we have a version for Java 19 (for MMapDirectory), a version for Java 20, and a version for Java 21, because this is all an API which is not yet stable in the JDK: it changes with every release, and it's not possible to execute the code on a later Java version. So in the Lucene code we need to have the source code of everything you have seen before in different directories, and we want to compile it. Unfortunately, Lucene developers don't want to download three or four different JDKs, and Gradle is not able to do that if you want to do early-release testing. So what we are doing is tweaking the JDK a little bit: we have extracted that API into JAR files which we are shipping in our source directory, and we are simply compiling against those JAR files; for Java 19 this is only the API for Java 19, and it's a JAR file generated in a preprocessing step. Whenever you update it, you have to extract the source code; everything that is bytecode, which would be GPL, is removed, so we only leave the method signatures there, and then we compile against those. Basically it looks like this: we are just tweaking the compiler to take our JAR file and patch it into the java.base module, and after that we can simply compile the whole code with the Java 11 compiler and ship it in the multi-release JAR with Apache Lucene. So that's basically how it works, just very quickly. I have three more minutes for the results. This is the point where the indexing throughput for plain text went up from 225 gigabytes per hour by something like 12 to 15 percent.
I don't remember which queries got the even bigger spike here, sorry for that. The other thing: there was some bug which suddenly caused everything to get slower; that wasn't Lucene but Elasticsearch. Thanks to Chris, who is a PMC member as of today. Here we had something: he said indexing throughput in Elasticsearch for vectors rose by 30 percent, and interesting is also the merge time, because, you know, every vector has to be compared, so that was even 40 percent faster, and the script-score query latency improved by 40 percent. So you can see some very nice things here. Now, how to use it: for the MMapDirectory feature from last year, just use Java 19, 20 or 21, that works. But for the new feature on Java 20 and 21, which is an incubating feature in the JDK, you have to pass --add-modules jdk.incubator.vector on the Java command line, and then Lucene will use it automatically. Elasticsearch has already implemented that; the next version of Elasticsearch will enable that module for you if you are running with the correct Java version, and I hope Solr, when it upgrades Lucene, will do the same. So the important thing is: the right one on the slide is not opt-in, the left one is opt-in.
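Concretely, the opt-in looks like this on the command line (the jar name is just a placeholder for your application):

```shell
# Enable the incubating Panama Vector API on Java 20/21 so Lucene can use it.
# "my-lucene-app.jar" is a placeholder for your Lucene-based application.
java --add-modules jdk.incubator.vector -jar my-lucene-app.jar
```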
You have to explicitly say: I want to have that. So now there's one minute left.

Thank you for this really packed talk. We have time for one or two questions. Does anybody have a question on this great topic? One, two, three, that was so many questions. I've got one question.

Thank you for the nice talk. I was just wondering, at the point where you were explaining the dot product: you showed that you divided the vector into four. Is that because of one byte? You were talking about the dot product between float elements, so is that the reason there were four lanes?

No, the four lanes are basically because of what all current CPUs can do. You could also implement more lanes, but this is just more work, the code gets a bit bigger, and the benchmarks we were doing just said: okay, it's enough to unroll that loop into four parts. So basically we're dividing it into four parts because that's what can be done in parallel, and the rest we are splitting based on the length of the processing unit. If it's 256 bits, that's a constant you can query from the JVM, the supported length of those vectors, and then you can use that in your for loop. So we're splitting first into four, and then in the loop we're splitting the rest using exactly that supported bit size to calculate the dot product. And at the very end, if you have something that is not really a power-of-two dimension length, we are just calculating the rest in the conventional way, scalar. So if you have something like 100 dimensions, it's actually slower than 128 dimensions, because at the end, on current CPUs, you still need additional operations just to handle exactly those limits. Does that explain it somehow? We can also talk about it a little bit later.

Alright, great. Thank you very much. Thank you very much, audience.