PostGIS Topology to secure data integrity, simple API and clean up messy simple feature datasets.
Formal Metadata
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/69124 (DOI)
Production Year | 2022
Transcript: English (auto-generated)
00:03
We're going to talk a little bit more about simple features and the mess related to database integrity. Then we're going to switch to show a client which is based on PostGIS topology, and then Sandro Santilli is going to tell us more about PostGIS topology.
00:21
Mattia is going to tell us more about the client. Alright, so let's start off with a mess. Why is it not? Yeah? It's too difficult. Oh, left side. Okay, thank you. Okay, we have this.
00:41
We have a very simple case. This is a polygon. What we want to do is to draw a road here and end up with this result. This is super simple. So we have this context. We have a database with simple features. We have an API where you can load simple features.
01:03
So what's the problem? One of the problems is client decisions. The client has to decide what to do with the existing surface. Should I update it or should I delete it in the original database? Depending on what decision is made there.
01:22
No, sorry. The client has to decide what to do with those three new surfaces. Should I do an insert on the road, which is in the middle? Maybe that's natural. To the left of the road and to the right of the road, you also have two surfaces.
01:44
Then one of those will be an insert and one will be an update. Or you could maybe delete the original surface and do three inserts. But then what about attribute values like dates? It may be natural that the date on the road has a new value.
02:02
But not maybe the two on the left or the right side. Another question is where is the clock? Stop now and forever. At some point. Yeah, please.
02:24
Okay, let's continue. What about this, attribute values on borders? If you look at this case, what's actually new here is two lines for the road. Actually nothing else. But if you look at simple features, you cannot have attribute values on borders.
02:44
You can, but you have to have the same value for all the borders for that surface. In this case, the road actually contains two new borders and two old borders. Let's continue.
03:01
So attribute values on borders are not possible with simple features. Okay, look at this case. We talked about borders. Let's say we have two clients, one with a tolerance of one meter and one with a tolerance of 10 meters. In the database, we try with one meter.
03:21
Everybody knows what happens here. Don't you? Yes. You can show it anyway. This happens. You will get overlaps and gaps everywhere. So let's sum up a little bit. The payload is big.
03:43
Why? Because it's a mess of new data, data to read, data to be updated, data to be deleted. So this is a big mess.
04:02
So what's this in the topology world? What we're going to do here, we're going to draw two new lines as you know. So it's only new data. So you can throw away this part. We only need the correct part.
04:21
It's simple for the client, and you see that the lines do not fit exactly, but that's handled by the server. So if they use some wrong tolerances, that's not a big problem. So the two new lines are sent to the server and they get this result back.
04:41
So what's the server's responsibility here? It is the server's responsibility to cut out lines that are not used anymore. As you see, it fits perfectly well now. It's also the server's responsibility to keep track of dates. For instance, on the top left there, there's a date attached to that road edge.
05:02
And that is still 2015, which is natural. If you look at the date on the new road, that's from this year, and that makes sense. If you're going to push all this business logic onto the client and you have many different vendors, it will be wrong.
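A minimal sketch of that division of labour, assuming a hypothetical in-memory border store and a hypothetical `add_borders` helper: the client sends only the new lines, and the server stamps them with the current date while leaving existing borders and their dates alone. The geometries and the 2022 date are made up for illustration, and the real server also splits surfaces and trims unused line ends, which is left out here.

```python
from datetime import date

# Existing borders already in the database, each with its own date (illustrative values).
borders = {
    1: {"geom": [(0, 0), (10, 0)], "date": date(2015, 6, 1)},
    2: {"geom": [(10, 0), (10, 10)], "date": date(2015, 6, 1)},
}

def add_borders(borders, new_lines, today):
    """Store only the new lines, stamped with today's date; existing borders stay untouched."""
    updated = dict(borders)
    next_id = max(updated, default=0) + 1
    for line in new_lines:
        updated[next_id] = {"geom": line, "date": today}
        next_id += 1
    return updated

# The whole client payload: the two road edges, nothing to update or delete.
payload = [[(0, 4), (10, 4)], [(0, 6), (10, 6)]]
result = add_borders(borders, payload, today=date(2022, 8, 25))

print(result[1]["date"].year)  # 2015
print(result[3]["date"].year)  # 2022
```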
05:23
Because not every developer, myself included, reads all the specifications when developing. So, sorry, forgot, fix this next time. Okay, if you look at the surfaces, this surface has not really changed, because it's only the road that is new.
05:41
So the date here is from 1999, that's correct. In this case, on the road, that's new, so that makes sense. So what do we do? We return to the client. So this is what we return to the client. The client has to decide what to do.
06:00
The client selects the road and changes the attribute on that road, one single attribute, and sends the road back. This way, we secure database consistency. And we actually then follow pretty much what Codd said in 1970.
06:21
This is what it looks like in the database. In the bottom line, you have a topology, and here you have also a topology for the border. Not important. Here you have what this looks like in QGIS, seen from the PostGIS topology world. Sandro will say something about that later.
06:42
Okay, let's skip back to the Corine Land Cover, 1980. As I said, it was perfect when they sent it down to Europe. When they got it back, it was full of overlaps for the Norway part. I think we have about 800 overlaps in this dataset. So, people tend to say that it's no problem to identify lines that match using simple features.
07:08
This pretty much proves that it's not simple. Because this is a set of simple features, and they do not manage to fit this exactly. I tested many datasets, and you find overlaps pretty much everywhere.
07:23
What time is it? Okay, so what we do is that we move this data into PostGIS topology. And here we need to do some more work on PostGIS topology, but we use content-based grids.
07:41
I will not explain that now, but we basically start on a smaller cell, extend the cell size, and take bigger and bigger chunks. We need to run things in parallel when we run on big datasets, so we need to chunk things up. In PostGIS topology, when we reach this level, things go very slowly when merging everything together.
08:03
Hopefully, we can do more work on this later. Okay, a couple of things here. Some say that you can use tolerance levels. I basically thought that myself, to clean up simple feature polygons.
08:21
But what I was told is: please do it with no tolerance. And basically, the problem with tolerance is that it's totally random what happens. If you look at the line down to the left of the dark spot, if that one comes first into the database,
08:40
and you run with a high tolerance, that line wins. If the line on the top comes first in the database, that wins. So you get a random result every time. I think that's why Paul Ramsey told me to not use tolerance. Another thing that Sandro told me, we do not have any good methods for doing line merging.
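The order dependence can be sketched in a few lines. This is a toy model under stated assumptions, not how PostGIS snaps geometry: positions are reduced to one dimension, and the hypothetical `snap_insert` helper lets whichever value was stored first absorb any later value within the tolerance.

```python
def snap_insert(existing, position, tolerance):
    """Insert a 1-D position unless an already stored one lies within the tolerance."""
    for stored in existing:
        if abs(stored - position) <= tolerance:
            return existing  # the earlier value wins, the new one is absorbed
    return existing + [position]

def load(positions, tolerance):
    """Load positions one by one, in the given order."""
    stored = []
    for position in positions:
        stored = snap_insert(stored, position, tolerance)
    return stored

line_a, line_b = 10.0, 10.4  # two nearly coincident borders

print(load([line_a, line_b], tolerance=0.5))  # [10.0]
print(load([line_b, line_a], tolerance=0.5))  # [10.4]
print(load([line_a, line_b], tolerance=0.0))  # [10.0, 10.4]
```

With a tolerance, the result depends on load order; with no tolerance, both lines survive and the decision about which one to keep can be made afterwards on explicit criteria.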
09:08
So basically, this is from Corine Land Cover. It's two millimeters, and you get a valid space, and then when you have a valid space, you decide what to do based on attribute values and sizes.
09:21
And then you get a consistent result every time. Okay, I wonder, this is the last thing, just about simplifying. I wonder why they didn't simplify the Corine Land Cover. Because you see the green dots and the red dots here.
09:40
The red dots are actually not needed. They don't give us any more information. They are just extra data. So I ran ST_Simplify, and this is the set of overlaps I get. You get overlaps everywhere. That's because I simplify each simple feature separately.
10:04
This is what it looks like. And here, in more detail, that's about two millimeters. If you do this on topology, it looks like this. No overlaps, and everything is basically perfect in the context of overlap.
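Why per-feature simplification breaks shared borders can be shown with a deliberately crude stand-in for a real simplifier. The sketch below assumes a toy `decimate` function that keeps every second vertex (it is not ST_Simplify or Douglas-Peucker) and two made-up polygons that share a wiggly border. Simplified ring by ring, the two polygons keep different vertices of that border; simplified once as a shared edge, both are rebuilt from the same line.

```python
def decimate(line):
    """Toy simplifier: keep every second vertex, always keeping the last one."""
    kept = line[::2]
    if kept[-1] != line[-1]:
        kept.append(line[-1])
    return kept

# One shared border and the remaining outline of each polygon (illustrative coordinates).
shared = [(1, 0), (1.1, 0.25), (0.9, 0.5), (1.1, 0.75), (1, 1)]
left_rest = [(1, 1), (0, 1), (0, 0), (1, 0)]
right_rest = [(1, 0), (2, 0), (2, 1), (1, 1)]

def border_vertices(ring):
    """Vertices of a ring that lie on the shared border."""
    return {v for v in ring if v in shared}

# Simple features: each polygon carries its own copy of the border and is simplified alone.
left_sf = decimate(shared + left_rest[1:])
right_sf = decimate(right_rest + shared[::-1][1:])
print(border_vertices(left_sf) == border_vertices(right_sf))  # False

# Topology: the shared edge is simplified once and both polygons are built from it.
shared_simple = decimate(shared)
left_topo = shared_simple + left_rest[1:]
right_topo = right_rest + shared_simple[::-1][1:]
print(border_vertices(left_topo) == border_vertices(right_topo))  # True
```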
10:24
Okay. Our first approach started about eight years ago, and it involved Java in the backend and JavaScript for the client with the D3 library
10:43
for visualization and parsing of TopoJSON. And then we had a manual database setup, that means with the configuration files edited manually. But there, we already had the topology advantages,
11:02
and we only sent to the backend what is needed. Only new changes to the backend. This is the first project that went to production. It's a reindeer grazing land solution,
11:22
where the users can define where the reindeer go. And we also have, in this case, 20 layers, many of them with polygonal shapes, and some with line strings and one with points.
11:46
And here I show the example with the line strings, where you see different cartography for the cases when there are fences or infrastructure.
12:01
The second example I will show is called AR5Web, where AR stands for areal resources. Here we only have one layer that covers the whole of Norway, and we have polygonal spaces.
12:20
And the users are from municipalities, and they can define new shapes or change existing ones. And I will say something about the statistics of this solution. This one was sent to production about one year ago,
12:45
and as of one week ago, we had 200 users from the municipalities, and the whole of Norway has 356 municipalities, and 172 had gained access to the application.
13:03
And 103 are daily active. And the dataset covers the whole of Norway. Norway is more than 2,000 kilometers in length, and we have nine million faces.
13:22
And until now, 5,500 have been modified, and that is what you see in red in the image. And then comes the solution we have today. A few months ago,
13:41
Lars decided to explore the potential of Postgres further, and we decided to use that for a backend. And we use plain JavaScript for the client with only two libraries, one for bundling the code
14:01
and one for the map and interaction functionalities. Now we have a generic client with a declarative database setup. And the client is very small. It's 3.1 megabytes.
14:21
And the common point with the old approach is that we use the topology and exchange only essential information between the client and the server. And the user interface is as simple as this one.
14:42
You see we have a base map that, in this case, is OpenStreetMap. And we can draw lines defining new geometry, for example, the big rectangle in blue. And then we can also modify existing geometries.
15:02
Here, with one line, we split an existing rectangle in two, and we assign some attributes to it. And I will leave the word to Sandro for more details.
15:23
Hello. Okay. This is the first time I see these slides, so I don't know exactly what I'm looking at. I think this is the declarative way we are creating the database. It's a JSON format, just to avoid the SQL
15:43
to create the schema. We are defining it in a JSON format. We are specifying just which tables do exist, and then we assign... Do you see the arrow? Yeah. We assign two roles.
16:00
The surface layer and the border layer. This code is supported by an open-source project from NIBIO, which is hosted on GitLab, under the NIBIO open-source namespace. It's a set of PL/pgSQL and SQL functions.
16:23
So one of these functions accepts this format to create a schema and exposes functions to interact with the schema. The model allowed by this application, which is called pgtopo_update_sql,
16:41
is a subset of the possible models that you can implement with the PostGIS topology. In this model, you have surfaces, which are areal features, and borders, which are linear features. And you cannot have...
17:00
You are dividing the space into just a uniform set of these surfaces, with no gaps allowed and no overlaps allowed. So these two roles are mandatory. You have surfaces and you have borders, so you can specify which table plays which role, and you can add arbitrary attributes
17:22
to each of these two tables. And then you can define which operations are allowed in this schema. In this case, we are just defining an add-borders-split-surfaces operation, which is the one you saw in action before, presented by Mattia.
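To make the idea of a declarative setup concrete, here is a small sketch. Every key name below is invented for illustration and is not the project's actual JSON format; only the concepts come from the talk: tables, the two mandatory roles (surface and border), arbitrary attributes per table, and a list of allowed operations. The hypothetical `missing_roles` check shows the kind of validation such a description makes possible.

```python
# Hypothetical declarative schema: all key names are assumptions, not the real format.
schema = {
    "tables": [
        {"name": "surface", "role": "surface",
         "attributes": {"land_type": "text", "updated": "date"}},
        {"name": "border", "role": "border",
         "attributes": {"source": "text"}},
    ],
    "operations": ["add_borders_split_surfaces"],
}

MANDATORY_ROLES = {"surface", "border"}

def missing_roles(schema):
    """Return the mandatory roles that no table in the schema plays."""
    assigned = {table["role"] for table in schema["tables"]}
    return MANDATORY_ROLES - assigned

print(sorted(missing_roles(schema)))  # []

# A schema with only the surface table would be rejected.
incomplete = {"tables": [schema["tables"][0]], "operations": []}
print(sorted(missing_roles(incomplete)))  # ['border']
```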
17:44
Okay, we are jumping to another thing for a second, and we are now seeing what you can do when you have a topology-based schema. In the database, the advantage of having a topology organization in the database is that the gaps and overlaps are never possible
18:07
because as long as you use the functions which are specifically written to edit a PostGIS topology, every time you insert a line, the core PostGIS extension automatically detects
18:23
which are the intersections, and if you're splitting a face, you automatically get the two faces. So in theory, you have no possibility to break the planar coverage of the whole space unless you do direct editing,
18:43
because for speed reasons, PostGIS topology is not using triggers to enforce this integrity, because it would be very, very slow. So you can break it if you are not careful
19:03
about using the functions which are exposed for the user, but you can still validate the topology. There is a function that is similar to ST_IsValid of the PostGIS core, which is meant to tell you if a geometry is valid. Even in PostGIS, you can still break a geometry if you want.
19:24
So this one is for topology. You can check that your topology is still valid. In the newest version of PostGIS, I think it's 3.3, we introduced a new parameter for the ValidateTopology function, which is a bounding box,
19:41
because validating a topology is a really, really computationally intensive operation. So when you have things like 9 million faces, as it happens for NIBIO, it can take days. So they funded a new parameter, which is a bounding box, so you can limit the check into an area you are editing, maybe.
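The effect of a bounding-box limit can be sketched with one of the simplest checks, coincident nodes. This is an illustration only, assuming a hypothetical `coincident_nodes` helper over an in-memory node table; it is not how ValidateTopology is implemented. Restricting the check to the area being edited means nodes elsewhere are never examined.

```python
from collections import defaultdict

def coincident_nodes(nodes, bbox=None):
    """Find groups of nodes at the same position, optionally only inside bbox.

    nodes: {node_id: (x, y)}; bbox: (xmin, ymin, xmax, ymax) or None for everything.
    """
    by_position = defaultdict(list)
    for node_id, (x, y) in nodes.items():
        if bbox is not None:
            xmin, ymin, xmax, ymax = bbox
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                continue  # outside the area of interest, skip the work
        by_position[(x, y)].append(node_id)
    return sorted(ids for ids in by_position.values() if len(ids) > 1)

nodes = {1: (0, 0), 2: (0, 0), 3: (5, 5), 4: (100, 100), 5: (100, 100)}

print(coincident_nodes(nodes))                       # [[1, 2], [4, 5]]
print(coincident_nodes(nodes, bbox=(0, 0, 10, 10)))  # [[1, 2]]
```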
20:05
These are the kind of errors you can get if you break the topology, like, yeah, you can have coincident nodes which are not expected. You shouldn't get them if you use the specific function, or, I don't know, maybe inconsistent view
20:22
of which face is on the left or on the side. I'll continue on my slide. Okay. Yeah, so this is, where are my slides? Can I go back?
20:47
Is there still these slides? Well, the last slide could have been interesting for them. I don't know how to stop this. The one before. Can we switch to the one before?
21:03
Okay, this one could be interesting. Just, it has some links. And I'm not Santinelli, I'm Santilli anyway. Okay, these are my slides. They are old slides, sorry. They are from 2017. I was not supposed to take this presentation, but anyway.
21:21
And my name is Santilli. This is basically about the PostGIS topology itself. It started in 2006, and then it entered PostGIS for the first time in version 1.1.0, so it's pretty old code.
21:41
There is an ISO specification, which specifies what are the tables defining a topology in a standard way. So if you use PostGIS topology, you are using an ISO standard model for the faces, nodes, and edges. In 2010, it was integrated, thanks to Regione Toscana,
22:04
which is also hosting this conference. It's a sponsor of this conference, I think. And they funded reaching all the functionality specified by ISO. I'm out of time, so I will,
22:21
I don't know what to tell them or not in five minutes. Maybe you can go straight to questions, because otherwise, this you already saw, the effects of simplifying things in isolation. These are the reasons why you would, you could want to use topology.
22:41
One of these is that the relationships are explicit. Like in this case, I'm showing, you can tell if two areal features touch, because they will share an edge. And this is something you can determine at the database level with normal data types, not with special operations, which are known to be slow.
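A minimal sketch of why adjacency becomes a plain lookup: in the topology model each edge is stored once and records the face on its left and on its right, with face 0 standing for the outside "universe" face. The edge rows below are made up, and `neighbours` is a hypothetical helper; the point is that no geometric computation is involved.

```python
from collections import defaultdict

# Each edge is stored once and knows the face on either side (illustrative ids).
edges = [
    {"edge_id": 1, "left_face": 1, "right_face": 0},  # 0 = universe face
    {"edge_id": 2, "left_face": 1, "right_face": 2},
    {"edge_id": 3, "left_face": 2, "right_face": 3},
    {"edge_id": 4, "left_face": 3, "right_face": 0},
]

def neighbours(edges):
    """Two faces touch along a line exactly when some edge has them on its two sides."""
    adjacency = defaultdict(set)
    for edge in edges:
        left, right = edge["left_face"], edge["right_face"]
        if left != right and left != 0 and right != 0:
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency

adjacency = neighbours(edges)
print(sorted(adjacency[2]))  # [1, 3]
print(3 in adjacency[1])     # False
```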
23:05
This is the conceptual model. If anyone is into conceptual models, I won't explain, I guess. This is the example topology we are using in test cases.
23:27
You can have hierarchical layers. They are not using it in NIBIO, but you can have topogeometry objects, which are features defined by other, underlying topogeometry objects. So if you have, for example, municipalities, you could have provinces defined by municipalities
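The hierarchy can be sketched as nested references that resolve down to faces. The names and face ids below are made up, and the two helper functions are hypothetical; the idea is only that a province stores which municipalities it is made of, never its own copy of the geometry.

```python
# Lowest layer: each municipality is defined directly by topology faces (illustrative ids).
municipality_faces = {"A": {1, 2}, "B": {3}, "C": {4, 5}}

# Higher layers reference the layer below instead of faces.
province_municipalities = {"North": ["A", "B"], "South": ["C"]}
region_provinces = {"Mainland": ["North", "South"]}

def province_faces(province):
    """Resolve a province to its faces through its municipalities."""
    members = province_municipalities[province]
    return set().union(*(municipality_faces[m] for m in members))

def region_faces(region):
    """Resolve a region to its faces through its provinces."""
    members = region_provinces[region]
    return set().union(*(province_faces(p) for p in members))

print(sorted(province_faces("North")))   # [1, 2, 3]
print(sorted(region_faces("Mainland")))  # [1, 2, 3, 4, 5]
```

Changing the border between municipalities A and B changes nothing in the province or region definitions, since they only reference their members.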
23:42
and regions defined by provinces. Can I say one word? This model that Sandro is showing, it's not a random model, as he said. That's an ISO standard. So that's quite important. Everything is structured according to ISO, and you have validation routines,
24:01
so it's a very rock-solid system. And I would also say thank you to Sandro for very nice work. It would have been impossible without that from Sandro and other open source developers. Let's continue, Sandro. As I moved on, because the slides were showing
24:20
the underlying physical models of edges, nodes, and faces. On top of this abstraction, which is defined by ISO, there is another abstraction, which is the concept of topogeometries, which are geometrical features defined by the underlying primitives, as we call them,
24:41
which are nodes or edges or faces. So you can have an areal, what they call, for example, a surface. A surface is an object which is defined by faces, the underlying faces. So you can have a surface, you can have an areal geometry defined by multiple faces. An overlap would be the existence of two areal geometries,
25:05
which are sharing one face, for example, one or more faces. That's what you would call an overlap. And the gap is when a face exists, but there is no areal topogeometry using that face in its definition.
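These two definitions translate directly into set operations. The sketch below uses made-up face ids and surface names, with hypothetical helpers `find_overlaps` and `find_gaps`; it models a surface as nothing more than the set of faces it is defined by.

```python
from collections import Counter

faces = {1, 2, 3, 4}  # all faces in the topology, universe face excluded

# Each areal topogeometry is defined by the faces it uses (illustrative data).
surfaces = {"forest": {1, 2}, "road": {2}, "lake": {3}}

def find_overlaps(surfaces):
    """Faces used by more than one surface."""
    usage = Counter(face for used in surfaces.values() for face in used)
    return {face for face, count in usage.items() if count > 1}

def find_gaps(faces, surfaces):
    """Faces that exist but are used by no surface."""
    used = set().union(*surfaces.values())
    return faces - used

print(sorted(find_overlaps(surfaces)))     # [2]
print(sorted(find_gaps(faces, surfaces)))  # [4]
```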
25:22
And this is the object that you can store in a column in PostGIS and you can associate attributes with. And there is an automatic cast to the simple feature, the simple geometry, so you can run whatever geometrical operation on these topogeometries,
25:41
because they will be automatically cast to a geometry. These are some functions to populate a topology, some functions to inspect a topology, and you have TopoJSON output for a topogeometry, for example.
26:01
So by using the underlying topology, you can avoid sending duplicated borders to a client when you are sending a whole coverage. You will send each border exactly once. Okay, example, we can skip, I guess.
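A rough sketch of the payload difference, using two made-up polygons that share a five-vertex border. As simple features, each polygon carries its own closed ring, so the shared border travels twice; as topology, the three edges are sent once each and the polygons are rebuilt from them on the client. The coordinates and edge names are illustrative assumptions.

```python
# Edges of a tiny topology: one shared border plus the rest of each outline.
shared = [(1, 0), (1.1, 0.25), (0.9, 0.5), (1.1, 0.75), (1, 1)]
left_rest = [(1, 1), (0, 1), (0, 0), (1, 0)]
right_rest = [(1, 0), (2, 0), (2, 1), (1, 1)]
edges = {"shared": shared, "left_rest": left_rest, "right_rest": right_rest}

# Simple features: every polygon is a complete closed ring.
left_polygon = shared + left_rest[1:]
right_polygon = right_rest + shared[::-1][1:]

simple_feature_count = len(left_polygon) + len(right_polygon)
topology_count = sum(len(edge) for edge in edges.values())

print(simple_feature_count)  # 16
print(topology_count)        # 13

# A vertex in the middle of the shared border is sent twice versus once.
vertex = (0.9, 0.5)
print((left_polygon + right_polygon).count(vertex))                # 2
print(sum(edge.count(vertex) for edge in edges.values()))          # 1
```

The saving grows with the number of vertices on shared borders, which in a full coverage is nearly all of them.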
26:24
It's over? One minute left, so we can jump to questions, if you have questions directly. Okay, thank you very much.
26:45
Okay, thank you for the very interesting...