Räumliche Indizes in PostGIS – Welcher ist der richtige? (Spatial indexes in PostGIS: which one is the right one?)


Formal Metadata

Title
Räumliche Indizes in PostGIS – Welcher ist der richtige? (Spatial indexes in PostGIS: which one is the right one?)
Alternative Title
PostGIS Indizes - Welcher ist der richtige? (PostGIS indexes: which one is the right one?)
Author
Kunde, Felix
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Publisher
FOSSGIS e.V.
Release Date
2019
Language
German

Content Metadata

Abstract
GiST, SP-GiST, BRIN or BTREE: PostgreSQL offers various index types, and these are now also supported for PostGIS geometry columns. The talk briefly presents the main differences. Using simple regression tests with synthetic geodata, it then shows when each type is best used.

Index
Good, let's start. In the first talk, on spatial indexing, Felix Kunde will tell us which index is probably the right one. Surely some of you are PostGIS users, and I'm curious. Thanks again.
Yes, thanks. I hope that many of you have already worked with PostGIS and have data in a database. When things are slow, you sometimes fix it with an index. Why do you actually do that? Usually, whenever you have a query that is pretty slow, you would like to have an index. Perhaps, once you have worked with databases a bit longer, you simply create a spatial index on every geometry column as a precaution, just to make sure every query is fast. If you do that, it looks roughly like this: most people have probably done it with this command, using the GiST access method.
In the last two years I have given PostGIS talks more often, with this kind of introduction to what was added in recent versions. Other indexes were among those additions: for example the BRIN index, added in PostGIS 2.3, and since last year also the SP-GiST index. In my own talks I was sometimes asked: by the way, which index should you use? Actually, I didn't know at all. So I simply submitted this talk because the question was still unanswered, and it was accepted. Then I had to sit down, write some experiments and look at the results, and that is what I'm presenting today.
First, so that we all have the same understanding of what the different methods can do, here is a brief explanation of what is behind each of them. When I say GiST index, that is not a single implementation. It is a framework on which different index methods can be built. It is meant for multidimensional data, things you cannot simply sort. One example is the range types PostgreSQL offers: if you want to query overlapping time ranges, you cannot simply sort those intervals. You actually have a similar problem with geometries. You'll see later that there is a little trick, but per se there is no sorting. The PostgreSQL developers recognized that too, which is why the GiST framework was created and an R-tree implemented on top of it. The R-tree is a pretty widespread concept that exists not only in databases; it is also implemented in spatial JavaScript libraries. The idea of the R-tree is always that you store only the enclosing bounding box of a geometry in the index. Your table may contain, for example, a huge complex polygon that simply does not fit on an index page. It turned out pretty quickly that with this very rough approximation you still get spatial queries quite right.
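The bounding-box idea above can be illustrated with a minimal Python sketch; this is not PostGIS code. `bbox` reduces a polygon to its enclosing box, which is all an index entry would hold. A box test then serves as the cheap index filter, and an exact ray-casting point-in-polygon test stands in for the recheck against the table data. All names are hypothetical, and polygons are assumed to be simple vertex lists without holes.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(vertices: List[Point]) -> Box:
    """Enclosing bounding box of a geometry given by its vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def box_contains(box: Box, p: Point) -> bool:
    """Cheap filter step: is the point inside the closed box?"""
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Exact refine step via ray casting; boundary points are not treated specially."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray running from p towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_and_refine(p: Point, polygons: List[List[Point]]) -> Tuple[List[int], List[int]]:
    """Return (bbox candidates, exact matches) as lists of polygon positions."""
    boxes = [bbox(poly) for poly in polygons]  # all an index entry would keep
    candidates = [i for i, b in enumerate(boxes) if box_contains(b, p)]
    matches = [i for i in candidates if point_in_polygon(p, polygons[i])]
    return candidates, matches
```

Take an L-shaped polygon. A point in its notch passes the box filter but fails the exact check, and that is exactly the case the recheck exists for.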
Picture how it is built. Whether you have points, polygons or lines, they all simply become enclosing bounding boxes, and these are all stored at the lowest level of the index. That also means that whenever an exact check is needed, the data has to be read from the table. You can't get around that, at least not with GiST. The idea of the R-tree is then simply to group certain regions under further enclosing rectangles until you arrive at the top, at the root.
Now suppose we run a spatial query with some point. The index checks which box in the root the point overlaps with, then goes down to the next node, until it lands at a leaf. You can already see here that areas where boxes overlap are rather unfavorable. So the algorithms that keep the tree balanced need to make sure the overlap within the index stays relatively low, which of course also depends heavily on the data. There is a great tool you can use to look at the structure of GiST indexes; it comes from the authors of the GiST framework.
Here I simply loaded 10,000 points, and you can see the structure of that index. I showed it with three levels before, but usually it has very few: at 10,000 points there are just two levels. If I had only 1,000 points, there would be just this one level, so everything would be searched anyway and you wouldn't really gain anything from an index. Here we have two levels. That means if I intersect somewhere in this area, only the points in these blocks are searched. That is actually quite interesting to take a look at.
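The descent from the root to the leaves can be illustrated with a small hand-built two-level tree. This Python sketch only shows the search. Inner nodes hold the union of their children's boxes, and the search descends only where boxes overlap the query. Insertion and the split and balancing algorithms that keep overlap low are deliberately left out. `Node`, `leaf`, `inner` and `search` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Box, b: Box) -> bool:
    """True if two closed boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes: List[Box]) -> Box:
    """Smallest box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


@dataclass
class Node:
    box: Box
    children: List["Node"] = field(default_factory=list)  # empty for leaf entries
    item: str = ""  # row identifier, only used by leaf entries


def leaf(item: str, box: Box) -> Node:
    return Node(box=box, item=item)


def inner(*children: Node) -> Node:
    # An inner node's box is the union of its children's boxes.
    return Node(box=union([c.box for c in children]), children=list(children))


def search(node: Node, query: Box, visited: List[Node]) -> List[str]:
    """Collect items whose boxes overlap `query`, descending only into overlapping subtrees."""
    visited.append(node)
    if not node.children:
        return [node.item] if overlaps(node.box, query) else []
    hits: List[str] = []
    for child in node.children:
        if overlaps(child.box, query):
            hits.extend(search(child, query, visited))
    return hits
```

In a tree with two well-separated clusters, a query near one cluster visits only the root, that cluster's inner node and its leaves. The other subtree is never touched.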
Here are the test results. I ran a few queries; each query was executed 100 times and the mean taken. A few things I found interesting. At least on my laptop, even a query on 10 million points without an index took under one second. So you might think: okay, if I only have 100,000 points, maybe I don't necessarily need an index yet. With the index the query takes 18 milliseconds, so of course the difference is enormous. It always depends on how often you query the database, but with small data the gain may not amount to much. The documentation also always recommends running VACUUM on the table once you have created such an index. I think that is because there is a lot of garbage that needs to be cleaned up.
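The measurement method described above (every query executed 100 times, mean taken) could look like this in Python. `run_query` is a hypothetical callable, for example a wrapper around a database driver's execute-and-fetch call. The sketch does not talk to PostgreSQL itself and measures client-side wall-clock time.

```python
import statistics
import time
from typing import Callable


def mean_runtime_ms(run_query: Callable[[], object], repetitions: int = 100) -> float:
    """Run `run_query` repeatedly and return the mean wall-clock duration in milliseconds."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    durations_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()  # e.g. a hypothetical wrapper that executes one spatial query and fetches all rows
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(durations_ms)
```

Averaging over many runs smooths out noise from caching and other processes. If a few runs are outliers, reporting the median as well can help.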
Or at least ANALYZE. I looked at the effect of this with one million points and with the larger datasets as well. Another thing you can do with a GiST index is clustering. The CLUSTER command rewrites the table data stored on disk so that rows that are close together in the index are also close together on disk, which makes sequential reads faster. With correspondingly large data that gave a pretty strong boost. At smaller data volumes, up to 10 million points, it actually didn't bring much. On large datasets the CLUSTER command is also always an expensive operation, so it takes relatively long.
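For reference, here are the statements discussed so far, generated from a small Python helper so they can be reused across test tables. The SQL itself is standard PostgreSQL: `CREATE INDEX ... USING`, `VACUUM ANALYZE`, `CLUSTER ... USING` and `ANALYZE`. With PostGIS installed, `gist`, `spgist` and `brin` work directly on a geometry column. A `btree` on a geometry only helps equality and sorting, not spatial predicates. The function names and the index naming scheme are hypothetical. Identifiers are assumed to be trusted, because they are interpolated without quoting.

```python
from typing import List

INDEX_METHODS = {"btree", "gist", "spgist", "brin"}


def create_index_sql(table: str, column: str, method: str) -> str:
    """CREATE INDEX statement for one of the access methods compared in the talk."""
    if method not in INDEX_METHODS:
        raise ValueError(f"unsupported index method: {method}")
    name = f"{table}_{column}_{method}_idx"  # hypothetical naming scheme
    return f"CREATE INDEX {name} ON {table} USING {method} ({column});"


def maintenance_statements(table: str, index: str, cluster: bool = False) -> List[str]:
    """Follow-up statements after building an index; identifiers are assumed trusted."""
    if cluster:
        # Rewrites the table in index order: expensive on big tables, and only possible
        # for index types that support clustering (e.g. B-tree and GiST, not BRIN).
        return [f"CLUSTER {table} USING {index};", f"ANALYZE {table};"]
    # Reclaim dead rows and refresh the planner statistics.
    return [f"VACUUM ANALYZE {table};"]
```

CLUSTER rewrites the whole table anyway, so the sketch skips the separate VACUUM in that branch. It only refreshes the statistics afterwards.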
Okay, now the conclusion on GiST: it can get quite large. My biggest example had a billion points, really only the points plus one other column, and that was 60 gigabytes on my computer. The GiST index was 50 gigabytes, which is an enormous overhead. Also, if you have already created the GiST index and then do a lot of writing, it slows the writes down correspondingly. With my points, inserts were 4 to 16 times slower; with lines it was not as significant. Keep that in mind: if you do huge imports, maybe drop the index beforehand and create it again afterwards. Still, GiST was actually the fastest in most of my tests, and also pretty reliable. The more data I used, the bigger the speedup from the index became. The BRIN index was actually the most exciting one for me, because I had never worked with it before. BRIN stands for block range index, and you can see here what exactly that means.
It simply goes through the table from beginning to end in blocks of a fixed, defined size. For every block it stores the start and end value, that is, the limits of the values in that block. You just go through the whole table until every block has its limits, and that is very fast.
As you have probably already noticed, this only makes sense if the data is somehow sorted, that is, if there is a natural order in the data. For example, if you store a sensor measurement series, the timestamp keeps growing, which is already a natural order. Alphabetical order probably occurs rarely. Now the question is how that works with geometries. The BRIN index, as it is implemented, just stores the enclosing bounding box for every block range, and if the data is not sorted, that brings pretty little. Here I have 500 points, simply connected with a line in the order they are listed in the table, the first with the second and so on. You can see it's a huge chaos, so the index would bring nothing.
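The effect of sort order on a block range index can be simulated in a few lines of Python. This is a simplification, not PostgreSQL's BRIN. Here ranges are counted in rows, whereas real BRIN summarizes ranges of heap pages, controlled by the `pages_per_range` storage parameter. Each summary is the bounding box of the points in that range, and a query only has to scan ranges whose summary box overlaps the query box. Function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def build_summaries(rows: List[Point], rows_per_range: int) -> List[Box]:
    """One bounding box per consecutive range of rows, in physical (list) order."""
    summaries = []
    for start in range(0, len(rows), rows_per_range):
        chunk = rows[start:start + rows_per_range]
        xs = [x for x, _ in chunk]
        ys = [y for _, y in chunk]
        summaries.append((min(xs), min(ys), max(xs), max(ys)))
    return summaries


def ranges_to_scan(summaries: List[Box], query: Box) -> List[int]:
    """Indices of row ranges whose summary box overlaps the query box; all others are skipped."""
    qxmin, qymin, qxmax, qymax = query
    return [i for i, (xmin, ymin, xmax, ymax) in enumerate(summaries)
            if xmin <= qxmax and qxmin <= xmax and ymin <= qymax and qymin <= ymax]
```

Suppose a 4x4 grid is stored in a scrambled order. Every range then covers the whole extent, so a small query must scan all four ranges. After sorting the rows, only one range qualifies.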
But what you can do, for example, is compute a geohash on the geometry column. Geohash divides the whole world into cells and divides those again and again, and the sub-quadrants always carry their parent's prefix. So you can quite easily tell that if the leading characters are identical, the things are also close together. But there are also jumps in this concept, which is called a Morton curve or Z-curve; you can see it here. It looks a bit strange with only a few points, but you can recognize the Z-curve pattern. And actually, the nice thing is that you no longer necessarily have to work with this. Since version 2.4, I think, if you simply sort by the geometry, PostGIS uses such a structure in the background, which is pretty cool.
So you can take your unstructured table and copy it with an ORDER BY, so that all geometries are sorted. Then you have a sorted structure on which you can create a BRIN index. I did that, but unfortunately the index was then not used, and I don't know exactly why. Before every intersect query, I always had to disallow sequential scans. That forced the query planner to use the index I had created, and then the query times were correspondingly good.
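The Z-curve (Morton order) mentioned above comes from interleaving the bits of the x and y cell coordinates. A geohash is built on the same kind of bit interleaving of longitude and latitude, written in base 32, which is why a shared prefix means proximity. Below is a minimal Python sketch. It assumes WGS84 lon/lat input and a hypothetical 16-bit grid per axis. PostGIS's own sort order for `ORDER BY geom` may be implemented differently.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: bit i of x goes to position 2*i, bit i of y to 2*i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_key(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize lon/lat in degrees onto a 2**bits grid per axis, then interleave."""
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)
    y = int((lat + 90.0) / 180.0 * scale)
    return interleave_bits(x, y, bits)
```

Sorting rows with `rows.sort(key=lambda p: morton_key(*p))` before loading them gives the kind of ordered table on which a BRIN index pays off. On a 2x2 grid the keys visit the cells in the characteristic Z shape: lower left, lower right, upper left, upper right.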
Now, the real killer feature of BRIN indexes. They are always described as definitely slower than a GiST index, and I can confirm that in my tests. But look at the size comparison. At a billion points, for example, the GiST index is about 50 GB, as I said, while the BRIN index, which only stores block limits, is 3.6 MB. That speaks for itself. The creation times show a similar gap. Creating the BRIN index on a billion points took a bit over one minute, about one and a half minutes, which I don't find bad at all. The GiST index took much longer, maybe six hours. Then look at the speed comparison: BRIN might be twenty times slower in the queries, but on very large datasets the difference is actually pretty negligible. So whenever you have huge amounts of data and would like an index for querying your geometries, just create a BRIN index and see whether it already brings something. It is super fast to create and hardly needs any memory. The catch is that you first have to sort the data, but that is not as time-consuming as creating a GiST index. So much for the conclusion on BRIN.
Then the last index, SP-GiST, the most recent to be added. As the name already says, this is again a framework, like GiST, so different unbalanced trees can be built with it. The special thing about SP-GiST is that where the data is more strongly clustered, the index also branches more deeply, as you will see soon. Two other special features of SP-GiST are that there must be no overlap in the data, and the prefix concept, which I explain on the next slides.
The concept in PostGIS is again based on quadtrees, and you can see the quadrants here. Quadtrees split space very uniformly and are quite well known from computer graphics. What you can nicely see here is the difference in depth. In a B-tree or an R-tree, the index and leaf nodes always have the same depth, whereas SP-GiST can grow deeper where there is more data. Because there is no overlap, you also know directly, when new data is inserted, which quadrant it falls into. Similar to a trie, you can then just use the values stored in the index nodes. If you are looking for a particular value, you follow it down the matching branch. The index can be split accordingly, because each node's value is composed from the prefixes of its parents. So that is how it works.
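Here is a conceptual Python sketch of the space-partitioning idea behind the quadtree. Each cell splits into four disjoint quadrants, so every point belongs to exactly one child, and the tree grows deeper only where points are dense. This is not PostGIS's SP-GiST operator class. Splitting at cell midpoints, the bucket `capacity` and the depth limit `MAX_LEVEL` are assumptions of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
MAX_LEVEL = 20  # guard against endless splitting, e.g. many identical points


@dataclass
class QuadNode:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    capacity: int = 4
    level: int = 0
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None  # four disjoint quadrants after a split

    def quadrant(self, p: Point) -> int:
        """0 lower left, 1 lower right, 2 upper left, 3 upper right."""
        cx = (self.xmin + self.xmax) / 2
        cy = (self.ymin + self.ymax) / 2
        return (1 if p[0] >= cx else 0) + (2 if p[1] >= cy else 0)

    def insert(self, p: Point) -> None:
        if self.children is not None:
            # No overlap: the point belongs to exactly one child.
            self.children[self.quadrant(p)].insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.level < MAX_LEVEL:
            self._split()

    def _split(self) -> None:
        cx = (self.xmin + self.xmax) / 2
        cy = (self.ymin + self.ymax) / 2
        lvl = self.level + 1
        self.children = [
            QuadNode(self.xmin, self.ymin, cx, cy, self.capacity, lvl),
            QuadNode(cx, self.ymin, self.xmax, cy, self.capacity, lvl),
            QuadNode(self.xmin, cy, cx, self.ymax, self.capacity, lvl),
            QuadNode(cx, cy, self.xmax, self.ymax, self.capacity, lvl),
        ]
        pending, self.points = self.points, []
        for q in pending:
            self.children[self.quadrant(q)].insert(q)

    def depth(self) -> int:
        """Number of levels below and including this node."""
        if self.children is None:
            return 1
        return 1 + max(child.depth() for child in self.children)

    def count(self) -> int:
        """Total number of points stored below this node."""
        if self.children is None:
            return len(self.points)
        return sum(child.count() for child in self.children)
```

Insert a tight cluster near the origin plus three isolated points. The result is a deep branch in the lower-left quadrant, while the upper-right quadrant stays a single leaf.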
Now it gets complicated, of course. There must be no overlap, but a box is not a point in a two-dimensional world. So the PostGIS developers use a trick: they turn the box into a point in four-dimensional space. One corner of the box gives two coordinates and the other corner gives two more, so the box becomes a single point. I don't know whether you can see it here, but you can actually recognize the points quite well. I also tried to visualize this. Unfortunately, I couldn't get down to the leaf node level, where it gives me an error. But at least I can show the first three levels, and each point you see here is one of the prefixes stored in the index. What is stored is not a point, though, but four tiny boxes, which are used to determine the boundaries of the child quadrants. So it is quite a complex story. Okay, on to the results.
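The 4D trick can be sketched in Python as follows; the function names are hypothetical, and this shows the principle only, not PostGIS's implementation. A box (xmin, ymin, xmax, ymax) is read as one point in 4D, so boxes that overlap in 2D become distinct points. Splitting 4D space around a centroid then gives 2^4 = 16 disjoint children. A bounding-box overlap test turns into a simple range condition on each of the four coordinates. The conversion itself is trivial; what matters is how the coordinates are interpreted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]      # (xmin, ymin, xmax, ymax)
Point4D = Tuple[float, float, float, float]


def box_to_4d(box: Box) -> Point4D:
    """Read the two corners of a 2D box as a single point in 4D space."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax, ymax)


def orthant(p: Point4D, centroid: Point4D) -> int:
    """Child index 0..15: one bit per dimension, set if p lies above the centroid there."""
    index = 0
    for dim in range(4):
        if p[dim] > centroid[dim]:
            index |= 1 << dim
    return index


def overlap_ranges(query: Box) -> List[Tuple[float, float]]:
    """Box overlap with `query` as a closed range per 4D coordinate."""
    qxmin, qymin, qxmax, qymax = query
    inf = float("inf")
    # A box overlaps the query iff xmin <= qxmax, ymin <= qymax, xmax >= qxmin, ymax >= qymin.
    return [(-inf, qxmax), (-inf, qymax), (qxmin, inf), (qymin, inf)]


def matches(p: Point4D, ranges: List[Tuple[float, float]]) -> bool:
    """True if every 4D coordinate lies within its range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(p, ranges))
```

Consider the boxes (0, 0, 2, 2) and (1, 1, 3, 3). They overlap in 2D, yet around the centroid (0.5, 0.5, 2.5, 2.5) they land in different children, 0 and 15, so the 4D partitioning itself has no overlap.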
These are probably the most interesting part. SP-GiST could always be created about twice as fast as GiST. Its size is a bit smaller, but not by much, and the query duration is more or less equal to GiST. With small data it was a bit faster for me, otherwise always about as fast; on smaller datasets GiST is a bit slower. But SP-GiST was not always quite predictable in its results. Sometimes, when I built SP-GiST on a clustered table, the index suddenly became much bigger and the query took much longer. As I said, creation was actually always twice as fast, yet with large amounts of data the queries were always slower. Maybe that is because the implementation is still young. That was my conclusion on SP-GiST: it is a little less predictable in its results, but otherwise it actually has a few advantages.
So, the conclusion, which I also tried to summarize as a diagram: which index should I use now? That is the question I had set myself. If you have static data, or just want to analyze your data once, you could consider the BRIN index. If the data changes constantly and you have many writes, BRIN makes no sense, because you don't want to keep re-sorting the data. Then the question is whether the index fits completely into RAM. I found that when it no longer does, SP-GiST falls behind GiST, and it becomes a little unpredictable how long it takes. The GiST query times are good in any case. There is one more recommendation from the documentation. Because SP-GiST partitions space and has no overlap, it should work better and faster for data whose bounding boxes overlap heavily. I took a dataset that is really badly suited for the GiST index: lines that run right across Dresden, all bundled together, so that practically all of them overlap. I could not measure any gain from the SP-GiST index, so that really still needs further testing. So much for the summary.
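Below is one possible reading of the decision summary above, written as a tiny Python function. It simplifies the talk's rules of thumb, and the speaker stresses that the results need further testing. The parameter names are hypothetical. Treat the result as a starting point for your own benchmarks, not as a rule.

```python
def suggest_index(static_data: bool, index_fits_in_ram: bool) -> str:
    """Rough rule of thumb distilled from the talk's summary; benchmark before relying on it."""
    if static_data:
        # Sort once (e.g. by a Z-order key); then a tiny BRIN index may already be enough.
        return "brin"
    if index_fits_in_ram:
        # SP-GiST built about twice as fast as GiST, with similar query times in these tests.
        return "spgist"
    # GiST stayed the most reliable once the index no longer fit into memory.
    return "gist"
```

The heavy-overlap case is left out on purpose. The documentation's hint favors SP-GiST there, but the talk measured no gain for it.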
Now a few more general recommendations, in case you are completely unsettled and ask: should I use an index at all? I don't have that much data, so what should I actually do? The useful question is when you should actually create an index. You should simply look at when your queries are slow. That can be checked very nicely with the pg_stat_statements extension, which gives you an aggregated view of your slow queries.
Updating tables and indexes also occupies extra memory because the table bloats, so it is always important to clean up regularly with the VACUUM command. There is another tool that is often used for this, and we use it quite often at work. One more thing: in 80 percent of cases it helps to simply tell people to update the statistics on all tables. You may then find that your query problems dissolve into thin air, and maybe you don't need an index at all. The CLUSTER command, as mentioned, brought me something on large datasets, but it always takes a long time and is expensive. Simply creating a new table with a sort order may be easier.
You can also always look at what is actually queried. Perhaps an extra column, such as a status, is always queried, or you only query a particular region within a certain bounding box. Then you can extend the index with a WHERE clause and create it only for that small region. If you only want to analyze data from Dresden but have data from all over Saxony, such partial indexes are a very cool solution.
Then, with INCLUDE you can add further columns to an index, which makes an index-only scan possible. That is something you can use with a B-tree: the values are stored in the index nodes as well. If only columns contained in the index are queried, the data can be read directly from the index. This is being discussed for PostGIS 3.0: finding a format, at least for point data, so that geometries can be read directly from the index. That would be a cool thing indeed.
That's it from my side. Here is the link to the slides, and if you have questions, I'm happy to answer them.
Many thanks to Felix for the great introduction to indexing in PostgreSQL. How to read query plans, that is, execution plans, comes in exactly the next talk, which fits very well. But there are certainly still questions from the audience. Who would like to ask something?
My question is about the CLUSTER command: to what extent does it make a difference whether you use a magnetic disk or an SSD?
With a magnetic hard disk the advantage is probably even greater. There, sequential scans gain more over random access than they do on SSDs. I only tested with SSDs and saw the difference there too, but I can imagine that it has a much bigger effect on magnetic drives.
Can you use several different indexes on one table?
Yes, you can combine them however you want. You can even put several indexes on one column, and you can disable them with a simple command so that they are not taken into account. So you have complete freedom there.
Can a geometry index be combined with an attribute index?
The index implementations here are actually always on the geometry only. I don't know whether covering indexes could do that; they are quite new. There are examples of B-trees created on certain columns with INCLUDE, since PostgreSQL 11. I'm no expert, but I could imagine that this may soon be possible with geometry too. You can already add arbitrary columns to your index with INCLUDE, but they are not actually indexed, only stored in the index as payload. Otherwise you would need an index on several attributes, which of course then has to be queried in that order.
And if I have a geohash: has anyone tried something like that?
I actually tried to use the SP-GiST index on the geohash, because it has the same prefix property as text. It worked, but it did not make the query faster.
We can also discuss that afterwards. If there is no one else... there is one more question, we still have time.
Perhaps not a question about the talk itself. I have time-dependent geodata, with an index on the time column and a separate geodata index. Now I want to query both at once, say all the data in Germany from today. Which index do I use, and is there something that combines the two?
I haven't done any investigation in that direction yet. One idea would be to store the time as an M value in the geometries, if possible. I did that with tracking data: I always stored a timestamp as the fourth dimension, M, on each vertex. You can also index that with GiST, which works not just for 2D but also for 3D and 4D. SP-GiST, by the way, also supports 3D, so you can try that. It is actually an exciting and interesting area. But I know there are also many research papers on the topic that have built their own index methods, because it is probably a tricky problem.