AV-Portal 3.23.2 (82e6d442014116effb30fa56eb6dcabdede8ee7f)

The best of both worlds: combining geometry and key-value stores using PostGIS and HStore

Video in TIB AV-Portal: The best of both worlds: combining geometry and key-value stores using PostGIS and HStore

Formal Metadata

The best of both worlds: combining geometry and key-value stores using PostGIS and HStore
Title of Series
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Open Source Geospatial Foundation (OSGeo)
Production Year
Production Place
Portland, Oregon, United States of America

Content Metadata

Subject Area
The "national geospatial foundation" of Norway consists of multiple semi-structured and semi-standardized thematic data sets made available in a variety of formats. Storing, extracting and performing lightweight analyses across the different data sets adds value and usefulness to the data sets, which is a prime motivation for releasing the data freely to the public. Earlier approaches have stored the different data sets in a traditional relational manner resulting in hundreds of Postgresql/PostGIS tables Ð some with dozens of attributes. Updating and querying the data sets becomes unnecessary complicated and often a tedious, manual task. In an effort to deal with these issues, we have looked at other ways of storing and querying the data. A schemaless storage mechanism, like NoSQL-databases, fits perfectly to the task. However, NoSQL-database implementations have major drawbacks related to geometry handling when compared with PostGIS. We wanted the geometry handling of PostGIS combined with the schemaless storage mechanisms of a NoSQL database. Postgresql fits this combination perfectly with PostGIS' handling of geometry and HStores handling of key-value stores. HStore is an extension that implements a binary data type in Postgresql that allow storing an arbitrary number of key-value pairs. In contrast to the JSON data type, HStore enable indexing on the key-value stores. Combining PostGIS geometry with HStore's key-value storage for non-geometry attributes was a perfect match for storing the highly varying data sets. The flexibility gain is tremendous and a huge success allowing our data developers to find new ways of combining and making value of the data sets. Future work on the JSONB data type will combine the benefits of both the HStore and the JSON data type, enabling solutions that are even more advanced as well as bridging the gap between NoSQL-databases and relational spatial databases.This talk will present our success in combining geometry and key-value stores in Postgresql by using PostGIS and HStore Ð which lead to a neatly structured geospatial data collection with excellent performance for extractions, both in materialized views, but also running real-time extractions and lightweight analyses used in production decision-making.
Keywords postgresql postgis hstore key-value stores nosql geometry analysis data storage custom extraction
Word Computer animation Key (cryptography) Personal digital assistant Bit Instance (computer science) Resultant
Computer animation Arithmetic progression Metadata
Category of being Word Computer animation Calculus of variations Regulator gene Personal digital assistant Customer relationship management Set (mathematics) Client (computing) Instance (computer science)
Web crawler Email Service (economics) Computer animation Computer file Different (Kate Ryan album) State of matter Mathematical analysis Database Field (computer science) Resultant
Computer file Calculus of variations State of matter Source code Electronic mailing list Variance Field (computer science) Subset Subject indexing Roundness (object) Computer animation Different (Kate Ryan album) Synchronization Network topology Table (information) Task (computing)
Scaling (geometry) Computer animation Database
Subject indexing Computer animation Modal logic Operator (mathematics) Spacetime
Subject indexing Pulse (signal processing) Computer animation Data storage device Data storage device P-value Data type Incidence structure
Subject indexing Mathematics Computer animation Data storage device Binary code Sound effect Database Funktionalanalysis Table (information) Data type Field (computer science)
View (database) Execution unit Mathematical analysis Database Computational geometry Mereology Metadata Subject indexing Computer animation Data storage device Object (grammar) Table (information) Geometry
Area Group action Demo (music) Mathematical analysis Electronic mailing list Sampling (statistics) Client (computing) Funktionalanalysis Distance Metadata Revision control Subject indexing Computer animation Data storage device String (computer science) Network topology Buffer solution Cuboid Software testing Object (grammar) Resultant Row (database) Geometry
Subject indexing Computer animation Data storage device State of matter Binary code Right angle Table (information) Data type God
Boss Corporation Goodness of fit Computer animation Software developer
the so what the muon talk about this the dataset that there have worked itself and the National word geospatial data set in Norway it's a defined dataset the we talk about the use case and the customer requirements the and then it a bit about the current solution that there have been some other solutions that there may be possible and also a bit on 2 key value types that that instance very nice said in the introduction and also our experiences so far in the future results so 1st the Dr.
datasets that its colony in Norway it said at national spatial datasets that is defined by the government it comprises a lot of the metadata and it it this all mean progress for being final finally decided in January I think and there it defines the different datasets
and they also set some requirements on how they should be open to everyone to use so a lot of them will be freely open and freely available a however it don't set and requirements on how it will be available in availability requirements that today the they should be available and in so some some
examples on what kind of data sets so these are here some so that we have the year the traffic there and the grounded bedrock geology so a of variation here and we also have in corals uh outside the more and also the avalanche risk like you small and gravel so these are very the
complex datasets and they comprise a lot of of different there they don't own there's as well thank so the use case for these datasets are of course also very varied uh we have responded shouldn't management and then also word for it for instance insurance companies which are some of our clients we have real estate if you you are going to sell a property you most likely will I want to know what grounded this on and the end and this will also be a requirement for for people selling property in Norway don't have to look got in these datasets to find out different their properties of of the property and also building regulations and another more governmental stuff the most important thing is
that you can also it as it is open access to to a lot of these datasets you will also that provide a lot of stuff for creative use such as reindeer reindeer grazing fields which also will find inside these datasets it so our
current solution to harvest them and stored the state of then we use a lot of different harvesting collecting techniques many of them are mantle as you get these data on on e-mail or only some of the yeah 1 very manual that way of something data you also find that you have a REST services or a similar and also access to the to the databases themselves so what we do this with the combined compile it into a crawl the format which is our proprietor formant and also Interpol's GIS database then and then do a lot of analysis for our customers and we basically we give them what they they want to and the so we the result is also also would often the database often a comma separated files was in the you to have them on the common form of the so our current
solution this is that now the knocked of different datasets has a mentioned and a lot of them are not standardized so you won't find a standard schema to use of the thing like that we also have variations across the and the different municipalities and states to have around 20 estates in you let different schemas on the same datasets to have a very bad source of data for the traditional approach to this and what it and up using in the 1st round of it was to have 1 table for each data file or data source and we ended up adopted 9 and
in the 34 of 4 tables they're all of them with there a lot of fields and had somebody 50 and I think and of course this is very problematic to criterion is Winston this here I guess will say that you want to get an index support of a and the whole tree need to do history all of them because we want to have and a subset of the data the that and this becomes very slow and there and also a very manual task or updating because you have releases on the datasets size the year later release and nothing more you need to updated manually To this end up being in a lot of lists and a lot of the of the manual work to you have to get updates in 1 if the schema on the data sources changing which is because it's not to standardized uh we have to do some work work-arounds and so on acts on the variance of the date to use a lot over of manpower to keep this data and in sync with the with the data sources so then they
looked for the this is talk about some Noah scale solutions it follows are
there other people that had tried to store a lot of unstructured data that within a database of which can also be current fast we have of course the j some uh and geo-data pretty at basically it's a solves the problem you can store everything in for less that you have no schema the this of documents in now Mongo DB encouched
Debian co space and a lot of all the solutions that then we would be very fast to new you can store uh everything in its the problem with this
is that it's not a very good that the spatial indexing and you will find the spatial operators that you haven't to uh which was a necessity for us because we want to use spatial analysis on so the Dow the problem is that we you won't have to and very much you know SQL databases yet at so my opinion uh compared to post-Christian us and that service and be a
recap of in some of the the incidence structure the because you have p-values
key-value stores in the apostles go which shall much store Jason Jason data type which was introduced in 9 to I think the as Vincent also and also talked about you don't have index support on the J. song the the falling down here who but you have a spatial index of course we know that the pulse chase you have have a very good in their flexible the spatial indexing so the problem became that the you can actually store but you won't be able to create fast enough when you get the large datasets which was the huge
problem for us because right wanted to have passed then I lived on some other
key values the store so some other methods of doing it and 1 of them is say each stored which is also used by OpenStreetMap and so on from databases a distort tags and it's the essentially the same problem that we have that allow for a lot of changing fields and a lot of the changing values and a lot to store them in a flexible way so this is their H store it uh the syntax it for story and you also have a lot of functions as spins and showed its basically it's the same as Jason King priors for for just about anything but the bigger the big changes that you have index support because both the GS gist androgen and that's and maybe the VOT index will also uh possibly table and so stored in binary so it's a binary data type and uh which makes it did extremely fast it so that's so effects are a problem very well we can store just about anything the and attributes we can also store the a geometry alongside
with it of conceit but is basically how due to have a uh that tributes unit contained within each and we have the geometry I selected in the geometry following and they also stores the metadata arsenate store which also provides a very flexible way of storing metal like those on each object so we can actually but they've only parts of the dataset and we also have an idea more for off from work actually so this gives us so the solution that a more less wanted the In fact should just about anything but it into 1 table in post yes and we can forget them and out but do whatever we want we can also you there a lot of spatial and with analysis within the database which gives us a much more powerful tool so yeah no
J. Vincent also mentioned that materialized views we use that a lot on Due to prefetch data more last week can refresher Don when we want to and then and you can also ask as mentioned also mention you can have a this special indexes on the material news that gives is really a flexible solution and we want to have a huge computations on the dataset and only what that index on only that dataset for
experiences so far with this this action uh are only good experiences that the you have of test it to on just my a Mac gear and it runs really fast in compared to to all the solutions that have so we have put in almost all of them do Dr. dataset uh riches I not test today date that's that around 2 billion rows uh 3 gigabytes of data we can almost instantly uh within at tributes and spatial so we can use both indexes and it will still simplifying their due and of course the performance uh real decrease if you do spatial indexing in new criteria for a lot of of area a few large bounding box through the decrease but that's the moralist general problem that uh the just index the test their client you uh on which we use for demos but you can see that too it's very easy to crop uh for for anything with in that area and you get you can get the that and that each store you can and get a passage J. Selma was a regular role in the result you can also Crary's lists which you have a lot of problems with before prayer for non touching the objects you can to buffer analysis of buffer the prayers so distant from distance 1st uh release the yeah yeah and that's what we do here a beginning unless the restoring metadata alongside we can ask stored different versions of the data we can query for 2012 data and 2 thousand and 13 data and within data stored within the key value yes
so that he is also a lot of functions it to do to simplify the way of the of the tree and here we can see a sample function that's or less what you last for it so that you can get the uh data to J some you can get it as adjacent string and you can the Caribbean also the trip so this here combines and that each store index so they stood data and the geometry and so this is gives us a very flexible solution that's also Vincent
there and talk about we have if you should thoughts we're using a pulsed 9 3 but then 9 4 we have the Jason B. which is a binary data types of which gives more or less the same thing as state store but made indexes the support and J. and this will of course be a lot easier to do than using H store how it this is hassles index we have a very limited support the tools but the Jason its support of mourners and the right so a little probably switch to that to store everything and Jason the 1 9 4 if released so this is more or less the things that God going through then we realize that we
like some suggestions and also critique if you have an experience this is fairly new way frosted you to do this to store started in 1 table or a few large proposals to if you have any and the greedy and suggestions I'll appreciate the and really like to thank
goal all of the developers on boss crescent moon oppose genesis well and also that the developers on H star and Jason B. because this really gives us a lot of opportunities to to do more well good stuff good with the data and so thank you