We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Writing better PostGIS queries

00:00

Formale Metadaten

Titel
Writing better PostGIS queries
Serientitel
Anzahl der Teile
188
Autor
Lizenz
CC-Namensnennung 3.0 Deutschland:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache
Produzent
Produktionsjahr2014
ProduktionsortPortland, Oregon, United States of America

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
This presentation will demonstrate ways to take most advantage of spatial indexes, SQL constructs, and PostGIS specific functions. For these exercises we'll be using PostGIS 2.1+ and PostgreSQL 9.3+ . We'll demonstrate common cases people often do inefficiently.This presentation demonstrates the following1) Various SQL constructs including ANTI join, LEFT, RIGHT, EXISTS, LATERAL, CASE clauses, aggregates2) What common table expressions (CTEs) are and when to and when not to use them3) We'll demonstrate these concepts in use in a couple of common spatial query problems - e.g. proximity analysis (both geometry and geography), raster analysis and generation, aggregation of data based on various attributes, other correlation queries.
Schlagwörter
25
74
Vorschaubild
29:15
ZeichenketteTypentheorieElementargeometrieGeradeMaßerweiterungPolygonProjektive EbeneTabelleVerschlingungPunktInformationsspeicherungWeb-SeiteWeb SiteSchlüsselverwaltungSchreiben <Datenverarbeitung>Mapping <Computergraphik>BenutzerbeteiligungElektronisches MarketingSoftwareentwickler
AnalysisDatentypOrdnung <Mathematik>SoftwareTransformation <Mathematik>Automatische IndexierungElementargeometrieForcingFunktionalGeradeMereologieMultiplikationPolygonTabelleZählenFlächeninhaltAbfrageServerAutomatische HandlungsplanungAbstandCASE <Informatik>Strategisches SpielÄußere Algebra eines ModulsQuaderCAN-BusUmwandlungsenthalpieAliasingSoundverarbeitungStatechartMini-DiscURLStandardabweichungMapping <Computergraphik>BenutzerbeteiligungElektronisches MarketingEins
TransportproblemFunktion <Mathematik>Automatische IndexierungElementargeometrieForcingFunktionalGeradeMereologieProjektive EbeneRotationsellipsoidTabelleTermQuick-SortFlächeninhaltAbfrageKonstanteReelle ZahlEinflussgrößeAutomatische HandlungsplanungNichtlinearer OperatorAbstandCASE <Informatik>NormalvektorLuenberger-BeobachterÄußere Algebra eines ModulsAdditionPunktBetrag <Mathematik>Schiefe WahrscheinlichkeitsverteilungQuaderSchnittmengeExpandierender GraphShape <Informatik>Reverse EngineeringAuswahlverfahrenDateiformatArithmetischer AusdruckPuffer <Netzplantechnik>MeterTrennschärfe <Statistik>DifferenteRechenbuchEinfache GenauigkeitKurvenanpassungURLStandardabweichungZwei
DämpfungDatensatzTransformation <Mathematik>Ganze ZahlAutomatische IndexierungFunktionalGarbentheorieLoopPhysikalisches SystemProjektive EbeneVerschlingungFlächeninhaltVersionsverwaltungCASE <Informatik>PunktPixelBitmap-GraphikGrößenordnungUmwandlungsenthalpieAuflösung <Mathematik>MultiplikationsoperatorURLTesselationFigurierte ZahlVorlesung/Konferenz
MAPAutomatische IndexierungElementargeometrieDivergente ReiheFunktionalMereologiePolygonRechenschieberNichtlinearer OperatorAbstandCASE <Informatik>PunktBitmap-GraphikRadiusMixed RealityWrapper <Programmierung>EinsVorlesung/KonferenzBesprechung/Interview
Transkript: Englisch(automatisch erzeugt)
Okay. How many people are new to PostGIS here? Okay. All right. So I'll try to be easy. I'm Regina O'Bay and I'm one of the developers on the PostGIS team and also the Project Steering Committee. And I'm also a co-author of a couple of books with my husband Leo Sue who's over there. And the links here on the
on the first page are links to sites that we manage which have examples and also links to the books that we've written. For this talk I'm going to show a couple of examples of things that I see people commonly do with PostGIS and how they do them wrong and the better way of doing it. So for these examples I'm
going to use some OpenStreetMap data which I pulled with this command. And I'm also going to use the extension called HSTOR. How many people are familiar with HSTOR? Okay. So I don't have to, I'll explain a little bit. It's
basically a key value store that allows you to put random keys and associated values with it in a single column. So it's kind of like the schemaless design type concept. So those slides, so after you load awesome data you end up with four tables that look as presented here. You end up
with a point line polygon roads. And I'm going to focus mostly on the point one but the the lessons here apply to if you're dealing with polygons and line strings and other things. So the first thing that people do when that
they often want to do is they want to change the data type. Like if you want to change it from geometry to geography because you decided you don't want to use like the Web Marketer projection which is commonly used for web mapping. How many people are familiar with Web Marketer? Okay. So I don't need to explain that. So if you had let's say a Web Marketer and you
decided that you wanted to use the geography type because you're going to do mostly like proximity analysis. You'd use a command. You can do it in one step. You just basically do alter. You transform to WGS 84 which has the ID
4326 and that's about it. But people usually drop the column. Well they usually add a new column, update it and then drop it. But you don't really need to do that if you just want to convert an existing data type. I mean an existing column. Sometimes you want to do both web mapping and very fine
tuned proximity analysis. In which case you might want to keep both a geometry column and a geography column. So in that case you'd want to create a new column as shown here. So we add a geography column and then we update it
with the data that we loaded from Web Marketer by transforming. And then we create an index, a spatial index. The spatial index is very important so you don't want to forget that because your queries will be really slow without that. And so if you look at the geography columns table then you should see the
new tables that you created. If you notice here when I brought in the polygon it was just geometry but with the ST multi I kind of forced all the single polygons to multi polygon. And I specified it as polygon so now I get a
clean multi polygon type. So the first thing that people want to do with PostGIS is to find how close things are to a specific location. So like let's say you want to find all the restaurants within one kilometer of a particular location. The standard way that people do this is to use the ST
distance function. And the problem with that is that that doesn't take advantage of spatial indexes. So this is kind of like the brute force way of doing it. In SQL server this actually works and can use a spatial index but not in PostGIS. So the alternative way, the more PostGIS specific
way is to use the STD within function. And so here it is the same thing. So you basically ask for what things are within 1,000 meters and that
are restaurants using this hstore query. And this basically does a bounding box check and expand bounding box check and then it does a short circuit distance. So it doesn't actually have to compute the distance so it works
much faster. And if you just want a count of things you don't really want to just pull the data and then count it. You should just do a count. It's way faster because you don't get the effect of having to pull all the data out of disk. So yeah the basic lesson here is SD distance can't
use a spatial index, STD within can. And this is kind of a textual plan to demonstrate using a geometric network like. Okay in that case you probably
want to use PG routing because that would have a concept of a network. So
this one just does straight line of sight distance but it's really the minimum distance within the geometry. So if it's a polygon then it's the minimum distance within the polygon. So yeah I mean you could do kind of linear referencing and do close this point but yeah you'd be better off with PG routing on that. Okay so one of the common things that people do with
OpenStreetMap data is they query both PostGIS and the hstore kind of like the examples that I already showed. And even if you have an index on both the geometry and the tags column, the hstore column, it
sometimes prefers one column, one spatial index over another. So you get you get the geometry being used but not the tags index. And sometimes it uses both with what's called a bitmap scan strategy. So people usually just create two GIST indexes, one for the geometry and one for the tags. But the
alternative is you can actually create a single index that has both the geometry and the hstore. And this applies to other data types that support the GIST index type. They can also use, like if you have an array you could
have geometry in an array. So here's an example of how you create a compound index. For this I drop the original ones because you don't need the other two if you're going to have a single one that contains both. I mean
there is a downside to this. The downside is this index is fatter than the two separate. So if you just always query them separately you don't necessarily want to do that. Another common thing you might want to do is you don't want to just find the distance but you want to find everything.
You also want to know the actual distance of the things that are within the area. So in that case you would use both STD within and ST distance. So you'd use STD within in the where clause and then you'd use ST
distance to sort. And you can actually use the alias name that you use in the select and the order. You don't have to repeat it. So that's why I have dist here because I define it in the select part. Now some people use
WebMarketer for proximity analysis and that's fraught with some problems. The first problem is if you use WebMarketer to say what's within one kilometer, like what restaurants are within one kilometer, it's not really going to give you what restaurants are within one kilometer. So for example this
is the same query with the WebMarketer geometry column and you notice that you only get one answer whereas in the geography we got five or six answers. And that's just with the way WebMarketer skews the world. So you could still use WebMarketer if you overshoot by, well it depends how much
where you are in the world, how much you have to overshoot. But the idea is you want to overshoot and then you want to do a true distance check you know by casting to geography. So the idea behind that is the first part of your query will use the spatial index of the way column. The second part
will then take that set and then filter it down. So you have a fewer set to like do a real check on. The other alternative is you create a functional index that is geography. With some people it's kind of a iffy thing to do
but it seems to improve speed and then you don't have to do this kind of hokey expand. You can just cast to geography directly and just use it and your index will kick in because you have a geography index. The other way
I've seen people do is to what I call a mutant geography Mercator buffer. And the idea behind that is that you buffer in geography because that will give you a true one meter buffer and then you convert that to Mercator. And because it kind of skews things in more or less the same way in the same area it more or less works. So you end up with the same answer as if
you were doing and you could still take advantage of the Mercator spatial index. The next common thing that people ask for is what are the end closest things to me. And for that you don't really care about how far
those end closest things are. You just want to know like what are the five closest restaurants to me. I don't care how far they are. So for that the brute force way is to just do a distance check across all your geometries in your table and then sort by distance and then take the N which is if you have a large table is really slow. For a small table it's you
know reasonable. So the way to get to use a spatial index in that case is to use the KNN operators, geometry operators which were introduced in Postures 2.0. But these are only bounding box they don't actually work
against the geometry. So for points like if you're using a measure preserving projection it's absolute. For anything else it's not. So you have the same issue of the other thing is that it only works in the order by clause. It doesn't work in the where. So here's an example and so this is using what's
called the common table expression where we define a subquery called s1 and this one we see is using the operator and then our final so we
overshoot that and then we take the top five based on the real distance function the base real distance calculation. And here's the explained plan to demonstrate that it is using an index. So I already covered that if you have
point data it's more or less right unless if you're using something like geometry in WGS 84 then it's it's not right. So here we see our answers and
they're much faster which I don't have a much faster part. Unfortunately KNN doesn't work for geography at least not yet. Hopefully in Postures 2.2 we'll have KNN for geography but not right now. But you could use geometry with geography and it's kind of like the reverse of what we talked about
where instead of a just the geography you use a geometry spatial index in addition. And so when you want to use the geometry spatial index you just cast your geography to geometry and it will use the index because it's functional
on that. Just it's probably not clear but this and this are equivalent it's just that for whatever reason I can't throw the standard cast syntax into a create index I have to use a function. But they resolve to the same thing. And so
here you see it's much faster like before we had 29 seconds now we're down to three seconds. And it's it's almost as fast as the geometry one. Now in terms of using KNN if you use geometry what we call Platt-Carré so basically
geometry in WGS 84 it's actually not as good as Mercator from what I've observed. So you always have to do that double check. So you see here that the answers we get are different. They're not the right answers. But if we if we
you but when we use Mercator it is right. Sometimes you want to get you don't have a single location like let's say you want to know the closest the closest transportation point to a to like all the restaurants in your set. Then you don't have a single point of reference. So KNN doesn't normally kick
in. So the way to get it to kick in is to use the lateral clause which was introduced in Postgres 9.3. How many people are familiar with lateral? Okay just one. So basically the idea behind the lateral clause is that it allows you
to do a sub select in the from clause. So here's here's a lateral clause. So the lateral part is this. And the important thing to note is that the P dot way comes from this table. Normally you can't get away with that. So this basically treats P dot way as a constant. So in this case my index my
spatial index can kick in. But normally it wouldn't be able to. I'd have to have an absolute constant here. So that allows me to basically say for each for each restaurant I mean for each Japanese restaurant that I want to go to
or each Japanese restaurant in my whole table what's the closest to transportation locations to it. One of the cool features that was introduced in Postgres 2.1 is the segment ties function for geography. And how is that different from the segment ties for geometry? Geometry
even if you have WGS 84 it treats it like a Cartesian plane. So when you segment ties it's still linear. It's still you know so it's not right. Like if you were to plot it on a curve it's not right. But the geography considers the the spheroid shape and it actually segmentizes across the you know
along the spheroid. So you can throw this on you know you can convert this to WKT. And we also have a new function in 2.2 which isn't released yet which which will output to the Google encoded format. So it makes it
easy to just throw the the the segment ties on the on the on a Google map. So here's kind of the difference which are still from the boundless docks. This is the geometry. You see how that's a straight line. But if you use geography then you get the true curve behavior. So now I'm gonna get into some
Postgres specific stuff. So how many people use Postgres raster here? Oh okay so more than I thought. Okay that's good. So for this I just loaded some elevation data and aerial data. I'm gonna skip those commands. The first thing you want
to do when you're using raster is you usually tile the data because it you know rasters are big. So you don't want stuff like a you know a 10 gigabyte raster in a single column. It becomes kind of hard to query. So you chop it up into tiles. But then if you're trying to get an area of interest and
it goes across tiles you want to first you want to carve out that area. So you want to figure out what tiles fit in your area of interest. And what people mistakenly do is they first union and then they clip. So if your tiles are
relatively big this is really slow. It's actually much faster if you clip first and then you union. It's actually I think in many cases orders of magnitude but it depends on how big your tiles are. The bigger your tiles the more efficient this is. So here's that's just an example. But I forgot to put the
timings in. I'm sorry about that. And one function I really love using is the ST resize function which allows you to basically take a lower resolution without any consideration of the spatial reference system of it. But the
ST resize function has a lot of it's very overloaded. So it has one variant that takes integers which is the pixel size within height. And then it has one that takes percentages and then it has another one it takes text. So it's easy to fall into the wrong loop. So this is what often happens when
people coming to raster they do this. Can anybody figure out what's wrong with this? I'll point it at this section that is a problem. Nobody knows what's wrong with this. Yes. What. Oh yeah that's it. That's it. You got it. OK. So so what happens in this case is it falls into the float version
which thinks that you're talking about percentages and expects everything to be lower than one. But it's not lower than one. So you get this knowing your percentages must be value greater because it doesn't. Well the other thing that Leo complained about is it doesn't allow you to go above 100 percent. That kind of pissed him off. But so he's like this
function is stupid. So so what you have to do is you have to ensure that both are an integer. If you want pixels you have to cast it to integer so it gets treated as an integer. And then you get your nice
picture. And then the other lesson which is not really raster specific but when you're working with any spatial data is you always want to transform the fewer records. So in that case it's usually your location of interest and you want to transform it to what you have indexed unless
if you have a functional index on the transform. So in this case we want to transform our elevation which is in WGS 84 to the same projections our raster. And we get the elevation value. OK. So that's about it. And here's the link to if you want to buy any of our books. Any questions.
I was just wondering if you had these slides available. Oh yeah. I'll post them on. I'll post them on our posts. Yes. We're also giving a talk in Chicago. We're giving some tutorials so we also post those
slides. One is for geometry geography and the other ones for raster PG routing and topology. So we'll post those slides as well. OK. So just remember that site. You want to buy books you want to read our stuff our slides. They're there. Any other questions. Yeah. You were talking
about KNN and using that to see what points are in geometries or geographies. I'm often looking at whether this series of in this series of geometries and this other series of geometries which ones
overlap. Is there a way. Could you use KNN to do something with that or in that case you just use the intersects function. You wouldn't really need over you wouldn't use KNN for that. Oh no. I mean the faster way is you can simplify but it's not like absolutely right. But if I have something
like a polygon that's huge you know that's got like a hundred thousand points I usually simplify it first and then you can actually create a spatial index on the simplified. So so that ends up giving you faster answers too. Or you can do like actually you know you don't you don't create the spatial index on the simplified you create it. You basically you
can write you can write a wrapper function that basically simulates the STD within but it does like a simplification to the level that you want to the accuracy that you want. So you get faster but it's not absolutely accurate. I have a tricky question about KNN. Have you found a
way to make to use KNN to get points within a distance that can beat D within. That can what. So to you use the KNN operator. Yeah. To get points within within a radius so you can consistently beat D within. No because I
don't think those two mix well together. So yeah I would use STD within there and then give up the KNN part.