
Accurate polygon search in Lucene Spatial (with performance benefits to boot!)

Transcript
Good morning. When I think about all the situations in which we analyze spatial data, we all have very different use cases, but very often accuracy is important: whether we're tracking forest fires, setting up a landing zone in the front yard for drone delivery, or even just asking what time zone I'm in, getting that wrong has pretty bad implications. So today I'll talk about a case where a particular open-source tool has traditionally been inherently inexact, how we fixed it, and how we contributed the fix back so that you can rest assured of accurate results. It turns out we also got performance benefits.

So imagine we have a database of units of land stored along with their shapes on the Earth; in this case it's a bunch of farm fields from across the western states. I'll call these units of land "documents": each has some attribute data and a shape. Now I issue a query. We zoom in to a few of them, and I issue this query, the white circle, asking: show me all the documents that intersect my region. The correct matches should be the darker shapes in the background here. You would expect, and we assume, that we always get the right things back, but we don't always.

For example, in this situation it's possible, through the internals of a spatial index and how it works, that we might get this red shape back instead of, or in addition to, the correct matches. That's really the problem I'll talk about today. I want to note up front that this is only relevant if you query against polygons. If your documents are points, like bus stops, it doesn't affect you; but if you're dealing with census blocks, fire-recovery areas, building footprints, or anything from a government that likes to split land into parcels, which are polygons, this applies to you as well.

So today I'll talk about this problem of false matches and why we care. In order to explain why it happens, we'll dig a little into the guts of an index. Then I'll talk about two solutions we've implemented, and benchmark them, to show that fixing these false matches actually doesn't cost you anything. Finally, I'll talk about the current status of this work in the free and open-source software community. The short version: if you store polygon documents in Lucene, you're affected. So who cares?
At The Climate Corporation we use a huge variety of polygon datasets. In this illustration the thick black lines show the outlines of fields on a grower's farm, and we might want a query like: show me all of the soil types under this handful of farm fields; the soil types are the blue-shaded regions. We use these datasets for all sorts of things: we offer insurance in case of bad weather, and also decision analytics to help farmers answer questions like, should I plant a crop this year, when will the ground be workable, how much nitrogen is in my field, when should I apply fertilizer? So our example polygons include the farm fields and soils shown here, and also counties and time zones.

I showed this picture earlier, but now imagine we have about 30 million farm fields across the Midwest. The very first thing a new user does when they arrive at our website is pick which fields are theirs. The browser issues a polygon query, essentially the viewport of the browser window, to a database, and we show the user all the fields in their vicinity so they can click on the ones that are actually theirs. That's a spatial query. Or: show me all the hail from the last 24 hours over Kansas. In that case the documents indexed in the database are the hail shapes, and the query might be the state of Kansas; or we might ask for all hail over particular farm fields and use that to send the grower a text message saying, you've got hail.

So accuracy is deeply important to us, because errors back in the data layer bubble up and compound through our models into the insights and recommendations we provide farmers. At The Climate Corporation we noticed in Elasticsearch that these false matches were happening, so we had to find a solution. Now, not everyone has such a high accuracy requirement. If I'm preparing legal documents, yes. If I'm targeting a missile, a mistake is a problem. If we send a farmer a text message saying you've got hail on a field that actually doesn't, that's a problem. But if I'm in Portland looking for a dinner spot and my phone says it's two blocks away when it's really three, that's OK. It turns out, though, that even if you don't have high accuracy requirements, there can be performance benefits from the solution I'll discuss. So let me take a step back.
Let's talk about what indexes are, and use that to describe what happens in a spatial index. An index's terms enable us to search efficiently. Say I've got textual documents, the works of Shakespeare, and I'm looking for all documents about some topic. I find that term in the list of terms on the left; I can do that lookup efficiently because the terms are sorted. The index tells me that the document at the bottom, in blue, contains that word; to verify which documents are really about my topic, I can then inspect them. What's important to note is that an index is very often approximate: if you look closely, words like "to" and "and" are not indexed, and that's typically done on purpose. It's like the index in the back of a book: it helps you find what's relevant, but the index alone can't tell the whole story.

Lucene is a free, open-source software package in Java that implements the kind of index I just showed, although it's far more fully featured. To distribute that feature set for scalability there are a couple of other projects, Solr and Elasticsearch, which basically expose Lucene in a distributed environment. And there's a module in Lucene called Lucene Spatial, which lets me index not just text but also polygons, like the ones I've been talking about. At The Climate Corporation we use Elasticsearch pretty heavily.
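The term-to-document lookup just described can be sketched as a toy inverted index. This is an illustrative model only; the class, its stop-word list, and the tokenization are my own simplifications, not Lucene's:

```java
import java.util.*;

/** Toy inverted index: a sorted term dictionary mapping terms to document IDs. */
public class ToyInvertedIndex {
    // TreeMap keeps terms sorted, analogous to Lucene's term dictionary.
    private final TreeMap<String, SortedSet<Integer>> postings = new TreeMap<>();
    // Common words are deliberately not indexed, just as the talk notes.
    private static final Set<String> STOP_WORDS = Set.of("to", "and", "the", "of");

    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Documents containing the term; empty if the term was never indexed. */
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }
}
```

Because stop words are skipped at indexing time, a search for "and" finds nothing even though the word occurs in the text, which is the "index is approximate on purpose" point.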
The way Lucene Spatial lets you index polygons is a construct called a tree-based spatial index. It takes the entirety of the Earth, or all of your local area, and splits it into a coarse grid, then iteratively repeats the process, splitting into finer and finer grid cells down to some desired precision. This is extremely analogous to map tiles. The example I'm showing here is a quadtree; other tree-based spatial indexes, like the R-trees that were just mentioned, or k-d trees, work on the same principle. But it turns out that any such tree has the possibility of producing false matches, because it approximates everything: it represents the entire world with rectangles.

To illustrate, here's an example. Imagine the white grid here is some fixed depth of the hierarchical tree; that's an oversimplification, but it generalizes. If this is the grid of our spatial index and I want to index this shape in red, what I do is look for all the cells that intersect the shape. If we assign a unique ID to every single cell in the grid, those cells become the terms of the index that point to the document. So we can reuse the general structure of an index, terms pointing to documents, for shapes as well.

That's great, but there's a problem, because you'll notice the cells don't exactly match up with the shape. Say this blue circle is a point or small-circle query. We consult the index to find which grid cell intersects the query; that cell's ID points to the red document, so the index tells me it's a match, which is not true: they don't actually intersect. That's how false matches happen.
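Here is a minimal sketch of the cell-ID indexing just described, using a single fixed-depth grid over a unit square. The class, grid size, and coordinates are made up for illustration; it reproduces exactly the failure mode from the slide, where a query point lands in a cell the red shape touches, so the index reports a match even though the point is outside the shape:

```java
import java.util.*;

/** One level of a grid-based spatial index over the unit square, cellsPerSide x cellsPerSide cells. */
public class GridIndex {
    private final int cellsPerSide;
    private final Map<Integer, Set<String>> cellToDocs = new HashMap<>();
    private final Map<String, double[]> docBoxes = new HashMap<>(); // docId -> {minX,minY,maxX,maxY}

    public GridIndex(int cellsPerSide) { this.cellsPerSide = cellsPerSide; }

    private int cellId(double x, double y) {
        int cx = Math.min((int) (x * cellsPerSide), cellsPerSide - 1);
        int cy = Math.min((int) (y * cellsPerSide), cellsPerSide - 1);
        return cy * cellsPerSide + cx; // unique ID per grid cell
    }

    /** Index a box document under every cell it intersects (the "terms" of the index). */
    public void index(String docId, double minX, double minY, double maxX, double maxY) {
        docBoxes.put(docId, new double[]{minX, minY, maxX, maxY});
        for (int cx = (int) (minX * cellsPerSide); cx <= Math.min((int) (maxX * cellsPerSide), cellsPerSide - 1); cx++)
            for (int cy = (int) (minY * cellsPerSide); cy <= Math.min((int) (maxY * cellsPerSide), cellsPerSide - 1); cy++)
                cellToDocs.computeIfAbsent(cy * cellsPerSide + cx, c -> new HashSet<>()).add(docId);
    }

    /** Index-only point query: may return false positives. */
    public Set<String> candidates(double x, double y) {
        return cellToDocs.getOrDefault(cellId(x, y), Collections.emptySet());
    }

    /** Exact containment check against the stored geometry. */
    public boolean trulyContains(String docId, double x, double y) {
        double[] b = docBoxes.get(docId);
        return x >= b[0] && x <= b[2] && y >= b[1] && y <= b[3];
    }
}
```

With a 4x4 grid, a document covering (0.30, 0.30)-(0.45, 0.45) occupies the same cell as the query point (0.49, 0.49), so the index-only query reports it as a match while the exact check correctly rejects it.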
So now we can talk about a couple of solutions. Remember I said earlier that, at least at my company, we consider accuracy first and performance second. Our first solution was to put a wholly separate server between the client, whether that's a researcher or the front-end website, and Lucene. This server verifies the matches that come back from the index. The client issues this query, the green circle inside the white circle, and sends it to the verification server, which hangs onto it for later but also forwards the query on to Lucene or Elasticsearch. That consults the tree index, which finds, say, four candidate matches at the bottom here, most of which are probably right, but possibly with extras that don't actually intersect. The candidates come back to the verification server, which iteratively does a brute-force, exact intersection comparison with each candidate: do you really intersect the query? That's a computationally intensive operation, but at the end we send back to the client only the true matches. The thing to emphasize here is that there are two servers.

This is totally accurate, it works great for us, and it really made sense at the time we implemented it, maybe two years ago, because the Lucene internals are kind of impenetrable until you really understand the code, whereas this solution is simple and straightforward; we can get a brand-new hire up to speed on it quickly. There are, however, some edge cases, kind of weird and obscure, which I'll talk about if we have time. Perhaps more important is the latency, both from transferring the false matches back through the network and from parsing: we have to parse the candidates on the verification server to do the brute-force verification, then re-serialize and send the real matches back to the client, which parses them again. That's expensive, because these are GeoJSON, a text format; you're taking a bunch of floating-point numbers and serializing them as ASCII strings, and it ends up being quite big, especially when your shapes are not nice squares but farm fields, hail swaths, and soils, whose boundaries carry a lot of vertices.
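The verification step boils down to an exact intersection test applied to each candidate the index returned. Here is a sketch using rectangle documents and a circular query; real shapes would go through a geometry library such as JTS, and all names here are illustrative:

```java
import java.util.*;
import java.util.stream.*;

/** Brute-force post-filter: keep only candidates that truly intersect a circular query. */
public class PostFilter {
    /** Exact circle-vs-rectangle intersection via the closest point on the rectangle. */
    static boolean intersects(double cx, double cy, double r,
                              double minX, double minY, double maxX, double maxY) {
        double nx = Math.max(minX, Math.min(cx, maxX)); // closest x on the rectangle to the center
        double ny = Math.max(minY, Math.min(cy, maxY)); // closest y
        double dx = cx - nx, dy = cy - ny;
        return dx * dx + dy * dy <= r * r;              // within the radius?
    }

    /** candidates: docId -> {minX,minY,maxX,maxY}. Returns only the verified matches, sorted. */
    static List<String> verify(Map<String, double[]> candidates, double cx, double cy, double r) {
        return candidates.entrySet().stream()
                .filter(e -> intersects(cx, cy, r, e.getValue()[0], e.getValue()[1],
                                        e.getValue()[2], e.getValue()[3]))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The per-candidate cost is what makes this "computationally intensive": it scales with the number of candidates and, for real polygons, with their vertex counts.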
In distributed systems, when it's expensive to move data around like this, you move the code to the data, a theme I've been hearing throughout this conference as well. So our second solution is exactly the same as before, except we've moved it all into Lucene Spatial. The client issues the query directly to Lucene; Lucene consults the tree index and gets back the candidate documents, some of which may be wrong; then it does the brute-force post-filtering right there in Lucene, next to the index, and sends back only the verified matches. So we have the exact same accuracy as before, but since we don't have an extra server in between, we get a performance gain.
Let me talk a little bit about the Lucene internals of how this is implemented. We use two indexing strategies in conjunction with each other. The first is what's always been there: RecursivePrefixTreeStrategy, the one illustrated with tiles, the spatial grid. Then we add a second indexing strategy, so when you index a document, and when you query, you touch both strategies. The second one is called SerializedDVStrategy, because it serializes the documents' geometries themselves, not just the grid cells but the actual shape of the polygon or multipolygon, into the index itself. We use a very efficient binary serialization, which we benchmarked, at least in terms of deserialization, to be seventy, 7-0, times faster than GeoJSON, and it's also quite compact.
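The size half of that claim is easy to see in miniature: a coordinate stored as an 8-byte IEEE-754 double is more compact than the same number printed as decimal text, which is roughly what GeoJSON stores, and it needs no text parsing on read. This toy comparison is mine, not the talk's benchmark, which measured deserialization speed:

```java
import java.nio.ByteBuffer;

/** Compare binary vs. ASCII-decimal encodings of a coordinate list. */
public class CoordEncoding {
    /** Fixed 8 bytes per double, no parsing required to read back. */
    static byte[] binary(double[] coords) {
        ByteBuffer buf = ByteBuffer.allocate(8 * coords.length);
        for (double c : coords) buf.putDouble(c);
        return buf.array();
    }

    /** Decimal text with separators, roughly what a GeoJSON coordinate array stores. */
    static byte[] ascii(double[] coords) {
        StringBuilder sb = new StringBuilder();
        for (double c : coords) sb.append(c).append(',');
        return sb.toString().getBytes();
    }
}
```

For high-precision coordinates the text form is roughly twice the size per number, and the gap in decode cost is much larger, since the binary form is read back without any string-to-float parsing.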
If you write Java, you can follow along for the next couple of slides; I really want to show how easy this is to implement if you're using Lucene directly. This is essentially all the code you'd have in your application. You create a SpatialArgs object, which takes, in this case, the point I'm searching for and the geometric operation I want to use; in other words, I'm asking: show me documents that intersect this point. To consult the recursive prefix-tree index, I've got some field called "geometry": I make a tree strategy and create a query using it. Then, for the query to use the new serialized doc-values strategy as well, it's only this much more: we instantiate the other strategy and use a combined, filtered query that essentially chains the two, consulting the tree query first and the verification strategy second. This very last line is very important: QUERY_FIRST_FILTER_STRATEGY. That's Lucene-speak for: always run the tree query first, and only then apply the verification filter. The reason is that, as I mentioned, verification is a rather computationally intensive process, at least relative to an index lookup, and if we had mistakenly run that strategy first, we would essentially be brute-force matching every single document in the index against the query.
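That ordering point can be modeled outside Lucene. This toy is not the Lucene API; the counter is added purely to show the work saved. It runs a cheap candidate predicate first, standing in for the tree query, and an expensive exact predicate only on the survivors, standing in for the serialized-geometry verification:

```java
import java.util.*;
import java.util.function.Predicate;

/** Two-phase search: a cheap candidate filter first, an expensive exact predicate second. */
public class QueryFirstFilter {
    static int expensiveChecks = 0; // counts invocations of the exact predicate

    static List<String> search(List<String> allDocs,
                               Predicate<String> cheapCandidate,  // tree-query stand-in
                               Predicate<String> exactMatch) {    // verification stand-in
        List<String> out = new ArrayList<>();
        for (String doc : allDocs) {
            if (!cheapCandidate.test(doc)) continue; // most documents rejected here, cheaply
            expensiveChecks++;
            if (exactMatch.test(doc)) out.add(doc);  // exact check runs only on candidates
        }
        return out;
    }
}
```

Reversing the two phases would run the expensive predicate on every document, which is the mistake the QUERY_FIRST_FILTER_STRATEGY flag guards against.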
With that done, I wanted to test the performance of this new strategy, to make sure the additional server-side work I've been talking about is acceptable. So I set up 20 different Lucene indexes: ten use only the prefix-tree strategy, in other words the old-fashioned way, and ten use both strategies combined; within each set of ten, each index is built at a different tree level, the depth, which is the precision the grid can reach. For each of these I measured how slow a query is, for a single point query. I ran this from a different server, with both machines inside the state of Virginia, effectively talking to each other at the speed of light; I did that because, as I mentioned, one of the causes of latency was the network. I also measured index size.

To illustrate the setup, and this may be hard to see: there are 14,000 documents, each a 4-kilometer-by-4-kilometer square; they don't overlap each other, and together they cover the state of Oregon. Somewhere in there is a single orange point, which is my query. Since the point query overlaps exactly one document, the correct response contains exactly one document; this is kind of like a geocoding lookup. Also, just to illustrate, this is the prefix-tree grid that I used. The grid at level 1 is actually much coarser than this image, and you can see it gets finer and finer as you increase the resolution of the index. At level 5, a single cell is about the same size as a document, though they're not aligned, so a single grid cell might still intersect three or four documents; go smaller still and you eventually have many cells within a single document. So, the results.
For latency, the pink line here is the tree query alone, and you can see that for the first three levels, a very coarse grid, there's a huge latency on the client side. This is almost exactly in line with the number of false positives we get back: the cost of pulling those shapes off disk on the server and sending them to the client, not even including parsing, is quite significant. The blue line uses both strategies in conjunction: it does the filtering on the server, so there are far fewer, in fact no, false positives, and we get far faster queries. You'll notice, though, that at tree levels 1 through 3 the blue line is slower than at higher tree levels, because it's brute-force matching every one of those false candidates; look back here: at that depth, probably 80 percent of my documents are being brute-force verified.

Also, the latency times on the left are in milliseconds for 100 consecutive, non-parallel queries, so it's not that a single query takes that whole time. Looking at this diagram you might think: well, I can just index everything at a deep tree level and forget about the serialized strategy. But in fact, all the way up to level 7 I'm still getting false positives back from the tree-only index. Sure, there are a lot fewer, maybe only five false positives, but they're still inaccurate results, which the second strategy gets rid of. So one thing to note is that reducing the false matches with this serialized strategy costs you essentially nothing. And while the tree alone gets close to exact around depth 7, at that depth the index size on disk starts to explode, because for a single document I might be indexing 30 cells; that's a huge blowup. So in fact what we like to do is choose a tree level like 3 or 4 and use the SerializedDVStrategy: we get good low latency and a small index. That's another important conclusion: we can keep a small index on disk and still get fully accurate results; in other words, more accurate results and faster responses with a smaller index.

This also shows one other thing. A concern we had was that, since we're now serializing every single document's geometry right next to the index, the index would grow from that alone. That size obviously depends on the size of the documents you have; in my example it's about one megabyte, thanks to the efficient binary encoding, and it's constant across all tree levels. So we come out ahead.
In the free and open-source software world, we have contributed this back: it's released in Lucene Spatial as of version 4.7, and I think we're on 4.10 now. There are open tickets in Elasticsearch and Solr; the solution is actually quite easy to expose there, and it's not done yet, but if you're interested in helping I'm happy to discuss. I imagine this is something we'll do; the question is when. So I'm happy to talk afterwards.

To wrap up: spatial indexes are typically approximate; other databases may have this issue too, and they may or may not address it. We started out achieving accuracy by brute-force verifying all the candidates, and then we also increased performance in our Lucene solution by moving the computation to the data. This is easy to do and costs you nothing. Lastly, I want to give my thanks to David Smiley, who did most of the Lucene-side work, and to my colleagues who helped produce some of the diagrams and slides. Thank you so much for coming out today; I hope this is useful to your projects. Questions?
[Audience] A broader question: obviously there are other functions and spatial operations that already exist in, say, a PostGIS database, topological comparisons and so on. What's next on the menu of robust spatial operations in Elasticsearch, on your company's wish list? Joins?

[Speaker] I don't think joins are likely; we have evaluated this. Elasticsearch provides pretty much all of the functions we need, and we don't really care about topology, to take your example. But yes, joins would be nice, because we could join across datasets, which we can kind of hack in Elasticsearch today; it's not as efficient and it's a little work, but it's OK. The reason we don't use PostGIS, which could offer the exact same features plus more, is scalability. We have voluminous datasets, and we've tried pushing it: you can scale up or you can scale out, and at some point you can't keep buying a bigger machine, but if we scaled out elsewhere we'd lose the one thing PostGIS is master of.

[Audience] How did you make the decision between Solr and Elasticsearch?

[Speaker] We did evaluate them at one point; honestly I don't remember the details, and we've been on Elasticsearch for over two years now, which probably implies there wasn't one great reason; we just had to pick one.

[Audience] Do you have any observations on the kind of spatial data you're indexing and the size of the index itself, say with coastline data versus simple shapes?

[Speaker] To be clear, we've got a lot of documents; the fields I showed you, that's 30 million of them, and that index is quite large. And it's only one of many: the hail, the soils. Those change over time as we get updates. One thing that does complicate some data: even if the dataset itself isn't huge, or if it is, we get really weirdly shaped records. Things in nature, like rivers: a river isn't just a line; the soil under a river might be a really narrow but squiggly polygon, and a single shape might have hundreds of thousands of vertices. We can simplify, at some loss of resolution. It's not really a problem for the index itself, but it's a problem for anything we do with the results: a really long shape spans a lot of grid cells, and if I query it I'm only concerned with the part under my query, but I'm going to get back all the vertices. So we've tried things like splitting, partitioning the shapes into multiple documents, with some way to tell they're really the same thing.

[Audience] How did you tackle the problem of updates to datasets and keeping your indexes current?

[Speaker] That's a great question. Most of our updates, at least in what we do in Elasticsearch, are batch updates: it's data we get from governments or universities, once a month, once a year, sometimes once a day, but not a lot of online writes.

[Audience] The Java code that you showed: in Elasticsearch, would that be a separate piece of code sitting outside and calling things?

[Speaker] This Java code is not implemented in Elasticsearch yet, but you would write something that looks a lot like it, eventually in Elasticsearch core itself. This is code I wrote just for the benchmarking, where for simplicity I wasn't going through Elasticsearch but calling Lucene directly; it's what you'd write as a client to Lucene. Once we've implemented this in Elasticsearch, if you use Elasticsearch you'll never even have to touch this; you'd just set some little flag, like "fuzzy match: false" or something, in your query envelope.

[Audience] Why would it be turned off by default?

[Speaker] Who knows; I would guess backwards compatibility, because turning it on would make an upgrade require reindexing everything, since you have to be storing the serialized geometries. I'd imagine that's the most logical reason it would be off by default; also, not everyone cares about it. OK, thanks so much for coming.

Metadata

Formal metadata

Title Accurate polygon search in Lucene Spatial (with performance benefits to boot!)
Series title FOSS4G 2014 Portland
Authors Gerard, Jeffrey
Smiley,
License CC Attribution 3.0 Germany:
You may use, modify, and reproduce the work or its content in unchanged or modified form for any legal purpose, and distribute and make it publicly available, provided you credit the author/rights holder in the manner they have specified.
DOI 10.5446/31734
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Publication year 2014
Language English
Producer FOSS4G
Open Source Geospatial Foundation (OSGeo)
Production year 2014
Production location Portland, Oregon, United States of America

Content metadata

Subject area Computer science
Abstract Lucene, and the NoSQL stores that leverage it, support storage and searching of polygonal records. However, the spatial index implementation traditionally has returned false matches to spatial queries. We have contributed a new spatial indexing strategy to Lucene Spatial that returns fully accurate results (i.e. exact matches only). Better still, this new spatial search strategy often enables keeping a smaller index and faster retrieval of results. I will illustrate why false matches happen — this requires a high-level walkthrough of spatial index trees — and real-world cases where it makes a difference. Our initial workaround was to query Elasticsearch through a separate server layer that post-filters Elasticsearch results against the query shape, removing the false matches. We've now built a similar approach into Lucene Spatial itself. By virtue of living inside, this new solution can take advantage of numerous efficiencies: (1) it filters away false matches before fetching their document contents; (2) it uses a binary serialization that is far faster than the GeoJSON we used before; (3) it optimizes the tradeoff between work done in the index tree vs. post-filtering, often resulting in a smaller index and faster querying. I will provide benchmark numbers. I'll illustrate how developers and database administrators can use this improvement in their own databases (it's easy!).
Keywords Database
Search
NoSQL
Vector
Indexing
Geohash
Benchmark
Lucene
Elasticsearch
Solr
