What's coming next with Apache Lucene?
Formal Metadata

Title: What's coming next with Apache Lucene?
Number of parts: 60
License: CC Attribution 3.0 Unported: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unmodified or modified form, provided that you credit the author/rights holder in the manner specified by them.
Identifiers: 10.5446/66607 (DOI)
Language: English
Berlin Buzzwords 2023, talk 58 of 60
Transcript: English (automatically generated)
00:07
Yeah, thank you. Thank you for the introduction. Actually, when you are submitting the talks it's a little bit early, at the beginning of the year, so you don't know what's actually coming. So I will not talk about something which might come in the next years.
00:23
I will tell you today what comes out next week in Apache Lucene 9.7. We have some great improvements, and they are also ready to be used by Elasticsearch, Solr, and all that stuff. So the abstract text was changed a little bit. I'm starting
00:41
first with my background. I have been one of the Lucene committers for a very long time. I'm a PMC member of Apache Lucene and also Solr, but my focus is mainly on Apache Lucene. Then I'm also doing a lot of Elasticsearch and OpenSearch. I'm helping
01:01
people with my company to set up Elasticsearch, Solr, and all that stuff. Often I'm doing very special things like writing query parsers and all that, which is a little bit more low-level. I'm also working at the University of Bremen, maintaining a long-term archive, a database for scientific data called PANGAEA.
01:20
So when planning the talk, I figured out that this year is somehow the 25th birthday of Lucene, because in 1998 Doug Cutting released the first version on SourceForge. Unfortunately,
01:41
I wasn't able to figure out the exact date, because the CVS repositories at SourceForge were completely discontinued. I think you can still download them, but that was too much work. So I don't know the exact date. The only thing I know is that it has now been 22 years at the Apache Software Foundation. So,
02:02
at the moment we have 33 PMC members, 98 committers, a lot of developers on the list, and already many forks. The new repository, or the old repository reaching back to 2001, has 2.7 thousand forks. So that's just the statistics, to make the original talk which I was planning to have that
02:26
ready. Now I'm coming back to some of the features we had in the past, which lead to what we have now and what will be released next week. It will also show a little bit of that steady evolution in Lucene: it's going on all the time, there are new features coming,
02:45
even though it's already 25 years old, which is really interesting for an open source project, because most of them stop evolving much earlier. We talked about that already. Then in 2011, which is already ten years later, we were at version
03:02
3.1. I'm just mentioning it here because at that point there was a great restructuring in the community: Lucene and Solr, which is a search engine on top of Lucene, were merged into one project. Because of that, Lucene got a lot of new features, like those token streams and everything. A little bit later,
03:27
Lucene/Solr 4.0 came out, with a completely new API for the postings lists: everything is now this four-dimensional posting-list API. There were also a lot of talks at Berlin Buzzwords about finite state automatons and
03:45
transducers. So that was version 4, still great, and it's already 11 years since then. Then it got to the Apache Software Foundation. Then another great thing, which I was also involved in, was in Lucene 6: we got numerical queries
04:02
with multiple dimensions since that time, and Java 9 support. Then one interesting thing is in Lucene 8; you see here the releases coming more often, it's like Firefox now, I have no idea. So Lucene 8 got block-max
04:22
WAND. And I'm telling that because the interesting thing here is: what was implemented by Adrien at that time was, I think, invented already in the 90s: how to skip over some blocks in the term dictionary and in the posting list, to figure out
04:43
which documents are not relevant at all and skip over them. Actually, in 2012 we had a talk here at Berlin Buzzwords by Stefan Pohl from Nokia, and he proposed for the first time to implement that in Lucene.
05:01
But it then took six or seven years until it was released, because there were multiple iterations in creating it: we wanted to have it flexible for the scoring models, it should not be fixed to BM25 or whatever. So the idea was around for a very long time, and I'm telling that because sometimes we get something
05:24
that is developed for a very long time, and then suddenly it appears in Lucene and we can use it. Then, only two years later, Lucene 9.0 came out, and now I'm coming to the part which I'm talking about, and you have heard about it already
05:44
yesterday, all the time; everybody is talking about it. We got vector support in Lucene, and that's the interesting thing. Interestingly, the paper about those HNSW vectors came out in
06:03
2016, and the implementation was already ready in 2019. So we had the HNSW graph available; actually, at that point the H was not really implemented, so it was only
06:23
the NSW part; the hierarchical part came a little bit later, at first we only had four of those levels. If you listened to the talk by Alessandro yesterday, he explained very clearly how that works. It was merged in December 2020 and released shortly after, in 2021. So
06:41
here we had a very short release cycle. But the problem now is that we have some problems maintaining it, because this type of graph implementation doesn't really fit well with how the Lucene scorers and everything work. Basically, if you are executing a query and want to filter the results, that's also the reason why it came a little
07:05
bit later that you can do a combined search with filters in Lucene together with vector search. So basically it's still something which doesn't really fit together well, so it could be that this implementation will also be replaced. But in the meantime we have to live with it, which is not the ideal case for
07:25
an inverted index like Lucene. The downside is that you have to do many vector similarity calculations during the indexing of that graph: for every incoming vector it has to be figured out where in the graph it needs to go.
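Lucene implements three of these vector similarities, as described in a moment: dot product, cosine, and squared Euclidean distance. A plain-Java sketch of the math; the class and method names here are illustrative, not Lucene's actual API:

```java
// Illustrative sketch of the three vector similarities the talk mentions.
public class VectorSimilarities {

    // Dot product: sum of component-wise products.
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product normalized by both vector magnitudes.
    static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Squared Euclidean distance: sum of squared component differences.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```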
07:44
So you have to compare a lot of vector similarities. There are actually three implemented in Lucene: the dot product, cosine similarity, and the squared distance, which is just adding up
08:02
the squared differences. Basically, whenever you index something, you have to compare it to many of those vectors. As said before, we have no boolean logic at query time, so it's not easy to combine that with filters. And also there's currently no real support for
08:22
combining the scores between Lucene's normal BM25 score and the vector score; the possibilities for that are now coming in the new version. I don't want to talk about that today, but we will also have function query support in the next version, so Solr users will be happy that they can compute the dot product of the vector and at the same time multiply it
08:47
with their score. But the other downside is that this often does not use the HNSW index; it just does a scan through all the vectors,
09:00
which is one possibility. Another thing is, because of the segment structure of Lucene, whenever you merge two segments you have to rebuild the whole graph, and when doing that you have to compare every vector with every other vector in those segments, more or less. This is a
09:20
major effort, and because of that the idea now is: how can we improve the vector distance calculation in the new version of Lucene that will come out next week? How to speed that up? So in the next version, which is coming out next week, we will have native SIMD vector support
09:43
for all JVMs of version 20 and 21. This only works if you're running Lucene, Solr, or Elasticsearch with at least Java 20 or Java 21; it won't work with any version before. So you will see here, it's really a good idea to update in that case.
10:03
The problem, what we really want to do, is: how can we improve that vector distance? As I said before, we have the dot product, the cosine, and the Euclidean distance. What is done currently in Lucene is that we are just hoping that the HotSpot optimizer is somehow
10:24
vectorizing that for loop. I can show you the dot product here; it's very easy. Basically, we are just summing up the products of all components of the vectors, and we are hoping that this is optimized by the HotSpot compiler.
10:41
Does anybody know why it does not work for that loop? It does not work; HotSpot won't vectorize it, it doesn't do anything. The reason is quite simple: we are summing up into this result variable here, so every addition depends on the result that was there before. And we have float vectors here, and
11:06
while in the mathematical sense addition is commutative and associative, so you can switch the order of the additions, with floats this is not the case: if we change the order of the additions, the result will be different. And
11:25
due to the rule that an optimization is not allowed to change the result, no auto-vectorization is going on. So the problem is it just doesn't work; you can test that out by writing a benchmark and looking at the JIT compilation logs.
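The reordering problem can be seen with a tiny example: grouping the same three float additions differently produces different results, which is why the JIT must not rewrite the summation order of the loop.

```java
// Demonstrates that float addition is not associative: the same three
// values, grouped differently, give different results.
public class FloatOrder {
    public static void main(String[] args) {
        float big = 1e8f;                   // adding 0.1f to this is lost to rounding
        float left  = (big + -big) + 0.1f;  // 0.0f + 0.1f
        float right = big + (-big + 0.1f);  // -big + 0.1f rounds back to -big
        System.out.println(left == right);  // prints "false"
    }
}
```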
11:43
It won't really vectorize in Java. Because of that, since Java 14, 15, 16, something around that, there is something available to vectorize that part of a dot product
12:01
and use the features of your CPU, which means every CPU core can execute multiple vector multiplications in parallel. Every core can do that; that's one thing. The second thing is it can do that with many dimensions at once. It depends: if your CPU has AVX
12:22
with 256-bit registers, it can already fit a lot of float values into those 256 bits and multiply them with each other, and it can do that in parallel; on most CPUs there are four processing units that can do that. So we see here,
12:40
I think we can increase the speed by more than a factor of four, sometimes, depending on the hardware that you have. But unfortunately that is not working with the HotSpot optimizer alone, so we need something else, and that is Project Panama. If you were there last year, you remember my talk about MMapDirectory; this is mostly about interconnecting Java
13:05
with the operating system, so it's possible to do some lower-level implementations in Java. The first thing, which I talked about yesterday, is MMapDirectory. You can also call functions from libc or whatever,
13:24
but the third thing is the Panama Vector API, and that's the really interesting thing used here. This will come in the next version. Very briefly, I don't want to go into the details, because now you see what you're all waiting for: the very horrible code.
13:41
This simple scalar loop, if you write it with Project Panama, looks like this. The idea is that we want to tell the CPU to divide everything into four parts. So we are splitting the whole vector into four parts, and for each of those four parts we are
14:02
multiplying the components and creating a new vector as an accumulation result; we split it into four parts so it can be executed in parallel. We don't have any threads here; it does that automatically. At the end we are just summing up and doing the aggregation.
14:22
Basically we are unrolling the whole loop so that the HotSpot optimizer knows how to transform it into assembly instructions. That's what's happening here; I don't want to go into the details. The important thing is what you see at the bottom: this tail here. It's very important that
14:43
the number of your dimensions, the vector length, is something like a multiple of the bit size of the CPU. So I can tell you already: make your vectors multiples of 64 dimensions. Something like 93 dimensions is a very bad idea, because it ends up slower than
15:05
128 dimensions. That's one thing: if you are creating your models, use multiples of 64 to make the best use of the CPUs available from Intel at the moment, which can do 512 bits in one go.
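A much simplified sketch of what such a Panama-based dot product can look like; the class and method names here are illustrative, and the real Lucene code additionally unrolls the loop into four independent accumulators, as described above. This requires running with --add-modules jdk.incubator.vector on Java 20 or 21.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Simplified Panama Vector API dot product (illustrative, not Lucene's code).
public class PanamaDot {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length); // largest multiple of the lane count
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc);               // lane-wise multiply-accumulate
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {              // scalar tail: the reason dimensions
            sum += a[i] * b[i];                  // that are not a lane multiple are slower
        }
        return sum;
    }
}
```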
15:21
Now, the problem is how to compile that with Java 11; that's the last thing. We have a version for Java 19 for MMapDirectory, a version for Java 20, and a version for Java 21, because this is all API that is not yet stable in the JDK. It changes with every release, and it's not possible to execute the code on a later Java version.
15:45
So in the Lucene code we need to have the source code of everything you have seen before in different directories, and we want to compile it. Unfortunately, Lucene developers don't want to download three or four different JDKs, and Gradle is also not able to do that if you want to do early-release testing.
16:04
What we are doing here is tweaking the JDK a little bit: we have extracted that API into JAR files, which we are shipping in our source directory, and we simply compile against those JAR files. For Java 19, this is only the API of Java 19, and
16:22
it's a JAR file generated in a preprocessing step. So whenever you update it, you have to extract the source code; everything that is bytecode, which would be GPL, is removed, so only the method signatures are left. Then we compile against those, and
16:43
basically it looks like that: the interesting thing is that we are just tweaking the compiler to take our JAR file and patch it into the java.base module. You see it here, and after that we can simply compile the whole code with the Java 11 compiler and ship it in the
17:00
multi-release JAR with Apache Lucene. So that's how it works, just very quickly. I have three more minutes for the results. This is the point where the indexing throughput for plain text went up from 225 gigabytes per hour, which is something like a 12 to 15 percent improvement.
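As an aside, the compiler tweak described a moment ago, patching the extracted signatures-only API JAR into the java.base module, might look roughly like this. This is a hypothetical sketch: the file and directory names are illustrative, not Lucene's actual build layout.

```shell
# Compile version-specific sources against an extracted, signatures-only
# API jar by patching it into java.base (names are illustrative):
javac --patch-module java.base=libs/java21-api-signatures.jar \
      -d build/classes/java21 \
      src/java21/org/apache/lucene/store/*.java
```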
17:22
I don't remember which queries here got the even bigger spike, sorry for that. The other thing: there was some bug which suddenly caused everything to get slower; that wasn't in Lucene but in Elasticsearch. Thanks to Chris, who became a PMC member today.
17:41
Also here we had something: he said indexing throughput in Elasticsearch for vectors went up by 30%. Interesting is also the merge time, because, you know, every vector has to be compared: merging was even 40% faster. And the script score query latency improved by 40%. So you can see here
18:02
some very nice things. How to use it: for the MMapDirectory improvements from last year, just use Java 19, 20, or 21; that works. But for the new feature on Java 20 and 21, which is an incubating feature in the JDK, you have to pass --add-modules jdk.incubator.vector on the Java command line, and then it will be used automatically
18:25
in Lucene. Elasticsearch already implemented that; the next version of Elasticsearch will enable that module for you if you are running with the correct Java version, and I hope Solr, when it upgrades Lucene, will do the same. The important thing is: the right one is not opt-in, the left one is opt-in.
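For example, a Lucene-based application might be started like this; the classpath and main class are placeholders, not real names:

```shell
# Opt in to the incubating Vector API when launching a Lucene-based
# application on Java 20/21 (classpath and main class are placeholders):
java --add-modules jdk.incubator.vector \
     -cp "lib/*" org.example.MySearchApp
```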
18:43
You have to explicitly say: I want to have that. So now it's one minute... Thank you for this really packed talk. Yeah, we have time for one or two questions. Does anybody have a question on this great topic?
19:06
Then... one, two, three, that were so many questions. I've got one question. Thank you for the nice talk. I was just wondering, at the point where you were explaining the dot product:
19:24
you showed that you divided the vector into four. Is that because of one byte? You were talking about the dot product between float elements, so is that the reason there were four lanes?
19:41
No, the four lanes are basically because of all current CPUs using that. You could also implement more lanes, but that is just more work, and the code gets a bit bigger. The benchmarks we were doing just said, okay, it's enough to unroll that loop into four parts, and then we are just dividing it.
20:00
So basically we're trying to divide it into four parts, because that's what can be done in parallel, and the rest we are just splitting based on the length of the processing unit. If it's 256 bits, that's a constant you can query from the JVM, the supported length of those vectors, and then you can use that in your for loop.
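That supported length can be queried from the Vector API like this; requires the incubator module on the command line, and the class name is illustrative:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Query the preferred SIMD width from the JVM (run with
// --add-modules jdk.incubator.vector on Java 20/21):
public class LaneInfo {
    public static void main(String[] args) {
        VectorSpecies<Float> species = FloatVector.SPECIES_PREFERRED;
        System.out.println(species.vectorBitSize() + " bits, "
                + species.length() + " float lanes");
    }
}
```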
20:23
So basically we're splitting first into four, and then we are handling the rest by splitting in the loop, using the number of lanes to calculate the dot product for the bit size. And at the very end, if you have something like a
20:40
dimension count that is not really a power of two, we are just calculating the rest in the conventional way, doing it scalar. So if you have something like 100 dimensions, it's actually slower than 128 dimensions, because at the end, I think,
21:01
on current CPUs, you still need to do additional operations at the end just to hit exactly those limits. Thank you, that explains it. But you can also talk about that a little bit later. All right, great. Thank you very much. Thank you very much, audience.