
Accurate polygon search in Lucene Spatial (with performance benefits to boot!)

Transcript
Good morning. When I think about all the situations in which we analyze spatial data, we all have very different use cases, but very often accuracy is important: whether we're tracking forest fires, setting up a landing zone in the front yard for drone delivery, or even just asking what time zone I'm in, getting that wrong has pretty bad implications. So today I'll talk about a case where a particular open-source tool has traditionally been inherently inexact, how we fixed it, and how we contributed the fix back so that you can rest assured of accurate results. It turns out we also got performance benefits.

So imagine we have a database of units of land stored along with their shapes on the Earth; in this case it's a bunch of farm fields from across the western states. I'll call these units of land "documents": each has some attribute data and a shape. Now I issue a query. We zoom in to a few of them, and I issue this query, the white circle, asking: show me all the documents that intersect my region. The correct matches should be the darker shapes in the background here. You would expect, and we assume, that we always get the right things back, but we don't always.

For example, in this situation it's possible, through the internals of a spatial index and how it works, that we might get this red shape back instead of, or in addition to, the correct matches. That's really the problem I'll talk about today. I want to note up front that this is only relevant if you query against polygons. If your documents are points, like bus stops, it doesn't affect you; but if you're dealing with census blocks, fire-recovery areas, building footprints, or anything from a government that likes to split land into parcels, which are polygons, this applies to you as well.

So today I'll talk about this problem of false matches and why we care. In order to explain why it happens, we'll dig a little into the guts of an index. Then I'll talk about two solutions we've implemented, and benchmark them, to show that fixing these false matches actually doesn't cost you anything. Finally, I'll talk about the current status of this work in the free and open-source software community. The short version: if you store polygon documents in Lucene, you're affected. So who cares?
At The Climate Corporation we use a huge variety of polygon datasets. In this illustration the thick black lines show the outlines of fields on a grower's farm, and we might want a query like: show me all of the soil types under this handful of farm fields; the soil types are the blue-shaded regions. We use these datasets for all sorts of things: we offer insurance in case of bad weather, and also decision analytics to help farmers answer questions like, should I plant a crop this year, when will the ground be workable, how much nitrogen is in my field, when should I apply fertilizer? So our example polygons include the farm fields and soils shown here, and also counties and time zones.

I showed this picture earlier, but now imagine we have about 30 million farm fields across the Midwest. The very first thing a new user does when they arrive at our website is pick which fields are theirs. The browser issues a polygon query, essentially the viewport of the browser window, to a database, and we show the user all the fields in their vicinity so they can click on the ones that are actually theirs. That's a spatial query. Or: show me all the hail from the last 24 hours over Kansas. In that case the documents indexed in the database are the hail shapes, and the query might be the state of Kansas; or we might ask for all hail over particular farm fields and use that to send the grower a text message saying, you've got hail.

So accuracy is deeply important to us, because errors back in the data layer bubble up and compound through our models into the insights and recommendations we provide farmers. At The Climate Corporation we noticed in Elasticsearch that these false matches were happening, so we had to find a solution. Now, not everyone has such a high accuracy requirement. If I'm preparing legal documents, yes. If I'm targeting a missile, a mistake is a problem. If we send a farmer a text message saying you've got hail on a field that actually doesn't, that's a problem. But if I'm in Portland looking for a dinner spot and my phone says it's two blocks away when it's really three, that's OK. It turns out, though, that even if you don't have high accuracy requirements, there can be performance benefits from the solution I'll discuss. So let me take a step back.
Let's talk about what indexes are, and use that to describe what happens in a spatial index. An index's terms enable us to search efficiently. Say I've got textual documents, the works of Shakespeare, and I'm looking for all documents about some topic. I find that term in the list of terms on the left; I can do that lookup efficiently because the terms are sorted. The index tells me that the document at the bottom, in blue, contains that word; to verify which documents are really about my topic, I can then inspect them. What's important to note is that an index is very often approximate: if you look closely, words like "to" and "and" are not indexed, and that's typically done on purpose. It's like the index in the back of a book: it helps you find what's relevant, but the index alone can't tell the whole story.

Lucene is a free, open-source software package in Java that implements the kind of index I just showed, although it's far more fully featured. To distribute that feature set for scalability there are a couple of other projects, Solr and Elasticsearch, which basically expose Lucene in a distributed environment. And there's a module in Lucene called Lucene Spatial, which lets me index not just text but also polygons, like the ones I've been talking about. At The Climate Corporation we use Elasticsearch pretty heavily.
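The term-to-document lookup just described can be sketched as a toy inverted index. This is an illustrative model only; the class, its stop-word list, and the tokenization are my own simplifications, not Lucene's:

```java
import java.util.*;

/** Toy inverted index: a sorted term dictionary mapping terms to document IDs. */
public class ToyInvertedIndex {
    // TreeMap keeps terms sorted, analogous to Lucene's term dictionary.
    private final TreeMap<String, SortedSet<Integer>> postings = new TreeMap<>();
    // Common words are deliberately not indexed, just as the talk notes.
    private static final Set<String> STOP_WORDS = Set.of("to", "and", "the", "of");

    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Documents containing the term; empty if the term was never indexed. */
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }
}
```

Because stop words are skipped at indexing time, a search for "and" finds nothing even though the word occurs in the text, which is the "index is approximate on purpose" point.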
The way Lucene Spatial lets you index polygons is a construct called a tree-based spatial index. It takes the entirety of the Earth, or all of your local area, and splits it into a coarse grid, then iteratively repeats the process, splitting into finer and finer grid cells down to some desired precision. This is extremely analogous to map tiles. The example I'm showing here is a quadtree; other tree-based spatial indexes, like the R-trees that were just mentioned, or k-d trees, work on the same principle. But it turns out that any such tree has the possibility of producing false matches, because it approximates everything: it represents the entire world with rectangles.

To illustrate, here's an example. Imagine the white grid here is some fixed depth of the hierarchical tree; that's an oversimplification, but it generalizes. If this is the grid of our spatial index and I want to index this shape in red, what I do is look for all the cells that intersect the shape. If we assign a unique ID to every single cell in the grid, those cells become the terms of the index that point to the document. So we can reuse the general structure of an index, terms pointing to documents, for shapes as well.

That's great, but there's a problem, because you'll notice the cells don't exactly match up with the shape. Say this blue circle is a point or small-circle query. We consult the index to find which grid cell intersects the query; that cell's ID points to the red document, so the index tells me it's a match, which is not true: they don't actually intersect. That's how false matches happen.
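Here is a minimal sketch of the cell-ID indexing just described, using a single fixed-depth grid over a unit square. The class, grid size, and coordinates are made up for illustration; it reproduces exactly the failure mode from the slide, where a query point lands in a cell the red shape touches, so the index reports a match even though the point is outside the shape:

```java
import java.util.*;

/** One level of a grid-based spatial index over the unit square, cellsPerSide x cellsPerSide cells. */
public class GridIndex {
    private final int cellsPerSide;
    private final Map<Integer, Set<String>> cellToDocs = new HashMap<>();
    private final Map<String, double[]> docBoxes = new HashMap<>(); // docId -> {minX,minY,maxX,maxY}

    public GridIndex(int cellsPerSide) { this.cellsPerSide = cellsPerSide; }

    private int cellId(double x, double y) {
        int cx = Math.min((int) (x * cellsPerSide), cellsPerSide - 1);
        int cy = Math.min((int) (y * cellsPerSide), cellsPerSide - 1);
        return cy * cellsPerSide + cx; // unique ID per grid cell
    }

    /** Index a box document under every cell it intersects (the "terms" of the index). */
    public void index(String docId, double minX, double minY, double maxX, double maxY) {
        docBoxes.put(docId, new double[]{minX, minY, maxX, maxY});
        for (int cx = (int) (minX * cellsPerSide); cx <= Math.min((int) (maxX * cellsPerSide), cellsPerSide - 1); cx++)
            for (int cy = (int) (minY * cellsPerSide); cy <= Math.min((int) (maxY * cellsPerSide), cellsPerSide - 1); cy++)
                cellToDocs.computeIfAbsent(cy * cellsPerSide + cx, c -> new HashSet<>()).add(docId);
    }

    /** Index-only point query: may return false positives. */
    public Set<String> candidates(double x, double y) {
        return cellToDocs.getOrDefault(cellId(x, y), Collections.emptySet());
    }

    /** Exact containment check against the stored geometry. */
    public boolean trulyContains(String docId, double x, double y) {
        double[] b = docBoxes.get(docId);
        return x >= b[0] && x <= b[2] && y >= b[1] && y <= b[3];
    }
}
```

With a 4x4 grid, a document covering (0.30, 0.30)-(0.45, 0.45) occupies the same cell as the query point (0.49, 0.49), so the index-only query reports it as a match while the exact check correctly rejects it.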
So now we can talk about a couple of solutions. Remember I said earlier that, at least at my company, we consider accuracy first and performance second. Our first solution was to put a wholly separate server between the client, whether that's a researcher or the front-end website, and Lucene. This server verifies the matches that come back from the index. The client issues this query, the green circle inside the white circle, and sends it to the verification server, which hangs onto it for later but also forwards the query on to Lucene or Elasticsearch. That consults the tree index, which finds, say, four candidate matches at the bottom here, most of which are probably right, but possibly with extras that don't actually intersect. The candidates come back to the verification server, which iteratively does a brute-force, exact intersection comparison with each candidate: do you really intersect the query? That's a computationally intensive operation, but at the end we send back to the client only the true matches. The thing to emphasize here is that there are two servers.

This is totally accurate, it works great for us, and it really made sense at the time we implemented it, maybe two years ago, because the Lucene internals are kind of impenetrable until you really understand the code, whereas this solution is simple and straightforward; we can get a brand-new hire up to speed on it quickly. There are, however, some edge cases, kind of weird and obscure, which I'll talk about if we have time. Perhaps more important is the latency, both from transferring the false matches back through the network and from parsing: we have to parse the candidates on the verification server to do the brute-force verification, then re-serialize and send the real matches back to the client, which parses them again. That's expensive, because these are GeoJSON, a text format; you're taking a bunch of floating-point numbers and serializing them as ASCII strings, and it ends up being quite big, especially when your shapes are not nice squares but farm fields, hail swaths, and soils, whose boundaries carry a lot of vertices.
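The verification step boils down to an exact intersection test applied to each candidate the index returned. Here is a sketch using rectangle documents and a circular query; real shapes would go through a geometry library such as JTS, and all names here are illustrative:

```java
import java.util.*;
import java.util.stream.*;

/** Brute-force post-filter: keep only candidates that truly intersect a circular query. */
public class PostFilter {
    /** Exact circle-vs-rectangle intersection via the closest point on the rectangle. */
    static boolean intersects(double cx, double cy, double r,
                              double minX, double minY, double maxX, double maxY) {
        double nx = Math.max(minX, Math.min(cx, maxX)); // closest x on the rectangle to the center
        double ny = Math.max(minY, Math.min(cy, maxY)); // closest y
        double dx = cx - nx, dy = cy - ny;
        return dx * dx + dy * dy <= r * r;              // within the radius?
    }

    /** candidates: docId -> {minX,minY,maxX,maxY}. Returns only the verified matches, sorted. */
    static List<String> verify(Map<String, double[]> candidates, double cx, double cy, double r) {
        return candidates.entrySet().stream()
                .filter(e -> intersects(cx, cy, r, e.getValue()[0], e.getValue()[1],
                                        e.getValue()[2], e.getValue()[3]))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The per-candidate cost is what makes this "computationally intensive": it scales with the number of candidates and, for real polygons, with their vertex counts.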
In distributed systems, when it's expensive to move data around like this, you move the code to the data, a theme I've been hearing throughout this conference as well. So our second solution is exactly the same as before, except we've moved it all into Lucene Spatial. The client issues the query directly to Lucene; Lucene consults the tree index and gets back the candidate documents, some of which may be wrong; then it does the brute-force post-filtering right there in Lucene, next to the index, and sends back only the verified matches. So we have the exact same accuracy as before, but since we don't have an extra server in between, we get a performance gain.
Let me talk a little bit about the Lucene internals of how this is implemented. We use two indexing strategies in conjunction with each other. The first is what's always been there: RecursivePrefixTreeStrategy, the one illustrated with tiles, the spatial grid. Then we add a second indexing strategy, so when you index a document, and when you query, you touch both strategies. The second one is called SerializedDVStrategy, because it serializes the documents' geometries themselves, not just the grid cells but the actual shape of the polygon or multipolygon, into the index itself. We use a very efficient binary serialization, which we benchmarked, at least in terms of deserialization, to be seventy, 7-0, times faster than GeoJSON, and it's also quite compact.
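The size half of that claim is easy to see in miniature: a coordinate stored as an 8-byte IEEE-754 double is more compact than the same number printed as decimal text, which is roughly what GeoJSON stores, and it needs no text parsing on read. This toy comparison is mine, not the talk's benchmark, which measured deserialization speed:

```java
import java.nio.ByteBuffer;

/** Compare binary vs. ASCII-decimal encodings of a coordinate list. */
public class CoordEncoding {
    /** Fixed 8 bytes per double, no parsing required to read back. */
    static byte[] binary(double[] coords) {
        ByteBuffer buf = ByteBuffer.allocate(8 * coords.length);
        for (double c : coords) buf.putDouble(c);
        return buf.array();
    }

    /** Decimal text with separators, roughly what a GeoJSON coordinate array stores. */
    static byte[] ascii(double[] coords) {
        StringBuilder sb = new StringBuilder();
        for (double c : coords) sb.append(c).append(',');
        return sb.toString().getBytes();
    }
}
```

For high-precision coordinates the text form is roughly twice the size per number, and the gap in decode cost is much larger, since the binary form is read back without any string-to-float parsing.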
If you write Java, you can follow along for the next couple of slides; I really want to show how easy this is to implement if you're using Lucene directly. This is essentially all the code you'd have in your application. You create a SpatialArgs object, which takes, in this case, the point I'm searching for and the geometric operation I want to use; in other words, I'm asking: show me documents that intersect this point. To consult the recursive prefix-tree index, I've got some field called "geometry": I make a tree strategy and create a query using it. Then, for the query to use the new serialized doc-values strategy as well, it's only this much more: we instantiate the other strategy and use a combined, filtered query that essentially chains the two, consulting the tree query first and the verification strategy second. This very last line is very important: QUERY_FIRST_FILTER_STRATEGY. That's Lucene-speak for: always run the tree query first, and only then apply the verification filter. The reason is that, as I mentioned, verification is a rather computationally intensive process, at least relative to an index lookup, and if we had mistakenly run that strategy first, we would essentially be brute-force matching every single document in the index against the query.
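That ordering point can be modeled outside Lucene. This toy is not the Lucene API; the counter is added purely to show the work saved. It runs a cheap candidate predicate first, standing in for the tree query, and an expensive exact predicate only on the survivors, standing in for the serialized-geometry verification:

```java
import java.util.*;
import java.util.function.Predicate;

/** Two-phase search: a cheap candidate filter first, an expensive exact predicate second. */
public class QueryFirstFilter {
    static int expensiveChecks = 0; // counts invocations of the exact predicate

    static List<String> search(List<String> allDocs,
                               Predicate<String> cheapCandidate,  // tree-query stand-in
                               Predicate<String> exactMatch) {    // verification stand-in
        List<String> out = new ArrayList<>();
        for (String doc : allDocs) {
            if (!cheapCandidate.test(doc)) continue; // most documents rejected here, cheaply
            expensiveChecks++;
            if (exactMatch.test(doc)) out.add(doc);  // exact check runs only on candidates
        }
        return out;
    }
}
```

Reversing the two phases would run the expensive predicate on every document, which is the mistake the QUERY_FIRST_FILTER_STRATEGY flag guards against.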
With that done, I wanted to test the performance of this new strategy, to make sure the additional server-side work I've been talking about is acceptable. So I set up 20 different Lucene indexes: ten use only the prefix-tree strategy, in other words the old-fashioned way, and ten use both strategies combined; within each set of ten, each index is built at a different tree level, the depth, which is the precision the grid can reach. For each of these I measured how slow a query is, for a single point query. I ran this from a different server, with both machines inside the state of Virginia, effectively talking to each other at the speed of light; I did that because, as I mentioned, one of the causes of latency was the network. I also measured index size.

To illustrate the setup, and this may be hard to see: there are 14,000 documents, each a 4-kilometer-by-4-kilometer square; they don't overlap each other, and together they cover the state of Oregon. Somewhere in there is a single orange point, which is my query. Since the point query overlaps exactly one document, the correct response contains exactly one document; this is kind of like a geocoding lookup. Also, just to illustrate, this is the prefix-tree grid that I used. The grid at level 1 is actually much coarser than this image, and you can see it gets finer and finer as you increase the resolution of the index. At level 5, a single cell is about the same size as a document, though they're not aligned, so a single grid cell might still intersect three or four documents; go smaller still and you eventually have many cells within a single document. So, the results.
For latency, the pink line here is the tree query alone, and you can see that for the first three levels, a very coarse grid, there's a huge latency on the client side. This is almost exactly in line with the number of false positives we get back: the cost of pulling those shapes off disk on the server and sending them to the client, not even including parsing, is quite significant. The blue line uses both strategies in conjunction: it does the filtering on the server, so there are far fewer, in fact no, false positives, and we get far faster queries. You'll notice, though, that at tree levels 1 through 3 the blue line is slower than at higher tree levels, because it's brute-force matching every one of those false candidates; look back here: at that depth, probably 80 percent of my documents are being brute-force verified.

Also, the latency times on the left are in milliseconds for 100 consecutive, non-parallel queries, so it's not that a single query takes that whole time. Looking at this diagram you might think: well, I can just index everything at a deep tree level and forget about the serialized strategy. But in fact, all the way up to level 7 I'm still getting false positives back from the tree-only index. Sure, there are a lot fewer, maybe only five false positives, but they're still inaccurate results, which the second strategy gets rid of. So one thing to note is that reducing the false matches with this serialized strategy costs you essentially nothing. And while the tree alone gets close to exact around depth 7, at that depth the index size on disk starts to explode, because for a single document I might be indexing 30 cells; that's a huge blowup. So in fact what we like to do is choose a tree level like 3 or 4 and use the SerializedDVStrategy: we get good low latency and a small index. That's another important conclusion: we can keep a small index on disk and still get fully accurate results; in other words, more accurate results and faster responses with a smaller index.

This also shows one other thing. A concern we had was that, since we're now serializing every single document's geometry right next to the index, the index would grow from that alone. That size obviously depends on the size of the documents you have; in my example it's about one megabyte, thanks to the efficient binary encoding, and it's constant across all tree levels. So we come out ahead.
In the free and open-source software world, we have contributed this back: it's released in Lucene Spatial as of version 4.7, and I think we're on 4.10 now. There are open tickets in Elasticsearch and Solr; the solution is actually quite easy to expose there, and it's not done yet, but if you're interested in helping I'm happy to discuss. I imagine this is something we'll do; the question is when. So I'm happy to talk afterwards.

To wrap up: spatial indexes are typically approximate; other databases may have this issue too, and they may or may not address it. We started out achieving accuracy by brute-force verifying all the candidates, and then we also increased performance in our Lucene solution by moving the computation to the data. This is easy to do and costs you nothing. Lastly, I want to give my thanks to David Smiley, who did most of the Lucene-side work, and to my colleagues who helped produce some of the diagrams and slides. Thank you so much for coming out today; I hope this is useful to your projects. Questions?
[Audience] A broader question: obviously there are other functions and spatial operations that already exist in, say, a PostGIS database, topological comparisons and so on. What's next on the menu of robust spatial operations in Elasticsearch, on your company's wish list? Joins?

[Speaker] I don't think joins are likely; we have evaluated this. Elasticsearch provides pretty much all of the functions we need, and we don't really care about topology, to take your example. But yes, joins would be nice, because we could join across datasets, which we can kind of hack in Elasticsearch today; it's not as efficient and it's a little work, but it's OK. The reason we don't use PostGIS, which could offer the exact same features plus more, is scalability. We have voluminous datasets, and we've tried pushing it: you can scale up or you can scale out, and at some point you can't keep buying a bigger machine, but if we scaled out elsewhere we'd lose the one thing PostGIS is master of.

[Audience] How did you make the decision between Solr and Elasticsearch?

[Speaker] We did evaluate them at one point; honestly I don't remember the details, and we've been on Elasticsearch for over two years now, which probably implies there wasn't one great reason; we just had to pick one.

[Audience] Do you have any observations on the kind of spatial data you're indexing and the size of the index itself, say with coastline data versus simple shapes?

[Speaker] To be clear, we've got a lot of documents; the fields I showed you, that's 30 million of them, and that index is quite large. And it's only one of many: the hail, the soils. Those change over time as we get updates. One thing that does complicate some data: even if the dataset itself isn't huge, or if it is, we get really weirdly shaped records. Things in nature, like rivers: a river isn't just a line; the soil under a river might be a really narrow but squiggly polygon, and a single shape might have hundreds of thousands of vertices. We can simplify, at some loss of resolution. It's not really a problem for the index itself, but it's a problem for anything we do with the results: a really long shape spans a lot of grid cells, and if I query it I'm only concerned with the part under my query, but I'm going to get back all the vertices. So we've tried things like splitting, partitioning the shapes into multiple documents, with some way to tell they're really the same thing.

[Audience] How did you tackle the problem of updates to datasets and keeping your indexes current?

[Speaker] That's a great question. Most of our updates, at least in what we do in Elasticsearch, are batch updates: it's data we get from governments or universities, once a month, once a year, sometimes once a day, but not a lot of online writes.

[Audience] The Java code that you showed: in Elasticsearch, would that be a separate piece of code sitting outside and calling things?

[Speaker] This Java code is not implemented in Elasticsearch yet, but you would write something that looks a lot like it, eventually in Elasticsearch core itself. This is code I wrote just for the benchmarking, where for simplicity I wasn't going through Elasticsearch but calling Lucene directly; it's what you'd write as a client to Lucene. Once we've implemented this in Elasticsearch, if you use Elasticsearch you'll never even have to touch this; you'd just set some little flag, like "fuzzy match: false" or something, in your query envelope.

[Audience] Why would it be turned off by default?

[Speaker] Who knows; I would guess backwards compatibility, because turning it on would make an upgrade require reindexing everything, since you have to be storing the serialized geometries. I'd imagine that's the most logical reason it would be off by default; also, not everyone cares about it. OK, thanks so much for coming.

Metadata

Formal metadata

Title Accurate polygon search in Lucene Spatial (with performance benefits to boot!)
Series title FOSS4G 2014 Portland
Authors Gerard, Jeffrey
Smiley,
License CC Attribution 3.0 Germany:
You may use, modify, and reproduce the work or its content in unchanged or modified form for any legal purpose, and distribute and make it publicly available, provided you credit the author/rights holder in the manner they have specified.
DOI 10.5446/31734
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Publication year 2014
Language English
Producer FOSS4G
Open Source Geospatial Foundation (OSGeo)
Production year 2014
Production location Portland, Oregon, United States of America

Content metadata

Subject area Computer science
Abstract Lucene, and the NoSQL stores that leverage it, support storage and searching of polygonal records. However, the spatial index implementation traditionally has returned false matches to spatial queries. We have contributed a new spatial indexing strategy to Lucene Spatial that returns fully accurate results (i.e. exact matches only). Better still, this new spatial search strategy often enables keeping a smaller index and faster retrieval of results. I will illustrate why false matches happen — this requires a high-level walkthrough of spatial index trees — and real-world cases where it makes a difference. Our initial workaround was to query Elasticsearch through a separate server layer that post-filters Elasticsearch results against the query shape, removing the false matches. We've now built a similar approach into Lucene Spatial itself. By virtue of living inside, this new solution can take advantage of numerous efficiencies: (1) it filters away false matches before fetching their document contents; (2) it uses a binary serialization that is far faster than the GeoJSON we used before; (3) it optimizes the tradeoff between work done in the index tree vs. post-filtering, often resulting in a smaller index and faster querying. I will provide benchmark numbers. I'll illustrate how developers and database administrators can use this improvement in their own databases (it's easy!).
Keywords Database
Search
NoSQL
Vector
Indexing
Geohash
Benchmark
Lucene
Elasticsearch
Solr
