Anchoring and PostGIS cure Post-Polygon Stress Disorder
Formal Metadata
Title: Anchoring and PostGIS cure Post-Polygon Stress Disorder
Number of Parts: 188
License: CC Attribution 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/31653 (DOI)
Production Year: 2014
Production Place: Portland, Oregon, United States of America
FOSS4G 2014 Portland, 116 / 188
Transcript: English (auto-generated)
00:00
Well, thank you for staying for my talk. This morning I went to two great talks. One was about tool making. The other one was about keeping things simple. I have some specific cases of what you can do using tools and what we should do for keeping things
00:21
simple. There are some specific cases. Both involve polygons and the kinds of problems that can be caused by having polygons in map compilation, updating, and integration. And I will give you some specific cases which cause something I call post-polygon stress disorder,
00:46
PPSD. And if you stay with my talk hopefully afterwards you're going to be cured or won't have a PPSD anymore. So just a little bit of background on the cases and then specific cases
01:01
and then our solutions. I'm a geologist by training, and we do a lot of geological mapping for the province of British Columbia. Just a few examples: we do need polygons. We need polygons to capture our features, and we do need polygons to represent
01:22
the final map products. And for the province of British Columbia we've got hundreds and hundreds of these kinds of maps covering the province. And just by the way, the size of British Columbia is Washington, Oregon, and California combined. It's almost four times bigger than the UK.
01:43
And so over the years we have been compiling and integrating those individual maps and come with this single integrated seamless digital coverage for the province. Now the use case right now here is we've done our first mapping. Now we want to update one
02:07
of the areas. So it kind of makes sense you would do a cookie cut for this map area, you know, take a copy, cut it out, and the mapper or our geologist will take it to the field.
02:23
Hopefully in a year or two he finishes his mapping and they update the map for this area. Ideally we can just drop it back into the provincial database seamlessly, without any pain or any work. But it doesn't happen that way. So there are all kinds of things that could
02:46
happen along the map edge. But even for updating the map within the area, there are all kinds of cases where, when you use polygons to update, we'll have problems. There are
03:02
many, many problems, but I will focus on just two. One is what we call shared boundaries. The other one is called edge matching. I remember in the early 1990s, in the first GIS course I took, there were pages and pages, chapters, on edge matching, and hopefully after my talk
03:25
you will find that edge matching, for me, is history. There's no more edge matching anymore. So for shared boundaries, just some examples to show you the specific cases. This is part of a geologic map. We have unit A and unit B sharing a common boundary in between. But this
03:46
doesn't have to be bedrock geology. It could be land use, could be cadastral, could be municipal boundaries, you name it. And what happens here is that not only do two lines share the same boundary for the two polygons. In this case we also have a fault cutting through here. So the contact
04:04
between unit A and unit B is also a faulted contact. So here really we have three, minimum three features that occupy the same space. Now I went ahead of myself a little bit
04:21
here. So when we need to update one of the features, so let's say the fault has been remapped and we know this fault is the boundary for unit A and unit B. So right away you're going to find some problems here. It doesn't matter what you do. You can spend all your time
04:40
manually trying to adjust the geometry for polygon A and for polygon B. Quite often what you're going to find out is by the end of the day you will have gaps, you will have overlaps along the boundary, both between the polygons and also between the polygons and the line work.
05:03
So that's the first case. The second case is something called edge matching. So we could have map A, something we mapped earlier, and we mapped the adjacent area. We have map B. Obviously you see some differences there in terms of geometry,
05:23
but also the attributes. Ideally we can get them to resolve all the boundary issues, get the map merged seamlessly. It doesn't really happen that way often in the real world. So what we will have here along the boundary between the polygons, you're going to have
05:43
gaps, overlaps, slivers, and the lines may not join, could have overlap, could have disjoint, and the attributes in terms of the map units, they may not be consistent across the border. So I have seen places where people purchase expensive tools and hire a team of
06:09
GIS technicians working on this day in, day out, for weeks, for months, trying to adjust, trying to resolve those kinds of problems. And when I thought about it, you know,
06:24
what's the result? There are people spending so much time doing that, and there's low productivity. When you have your hand on your mouse for the whole day, you've got injuries to your wrist, to your shoulder, right? So it's not too far from there to PPSD, right?
06:42
So the big question here is, again, just related to what I heard in the morning. You can spend all your time or spend your money to purchase or develop tools, but sometimes you have to ask the question: can we avoid these problems in the first place? And so to try to avoid these kinds of
07:06
problems, we have to find out the cause of this problem. It's not too different in terms of geology. What we do is we go to a point location, we define, we identify the boundary,
07:21
and from lots of point locations, we join the dots, we form the line work, and out of the line work, eventually we're going to create this bedrock geology, you know, form polygons, color them, create a legend, cartographically enhance them, have them published, right? We actually started with points and lines, so the polygons were not there in the first place.
07:46
So really I think the polygon is the cause of the problem, and we should get rid of polygons in map compilation, map updating, editing, and also integration. So the solution,
08:01
we kind of developed this idea, again, perhaps nothing new, follow what we, you know, what we always do. Really in the back end, in the source of the data, what we really need to keep up maintaining is the line work and the points representing the geologic units. So this is
08:20
the only two things we need. So I put in this term called a geologic framework data model. It doesn't have to be called that way; I know framework means a lot of different things to different people, but we need some name here, so we call it GFD for short. So essentially, the lines can be geologic contacts, could be faults.
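As a sketch of what such a framework data model might look like in PostGIS (all table and column names here are hypothetical, not taken from the talk; the SRID is just an example):

```sql
-- Hypothetical GFD-style schema: all geometry lives in the line work,
-- and polygon attributes hang off centroid points.
CREATE TABLE gfd_line (
    line_id   serial PRIMARY KEY,
    line_type text,                       -- e.g. 'contact', 'fault', 'parcel boundary'
    role      text,                       -- e.g. 'anchor' or 'rode', tagged at check-out
    geom      geometry(LineString, 3005)  -- EPSG:3005 (BC Albers), as an example
);

CREATE TABLE gfd_centroid (
    centroid_id serial PRIMARY KEY,
    unit_code   text,                     -- attributes of the unit the point sits in
    geom        geometry(Point, 3005)
);
```

The point of the design is that polygons never need to be stored at all: they can always be regenerated from these two tables.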
08:44
In some other cases, it could be the boundary for land use, for a land parcel, could be the boundary for a river, for a municipality, whatever. And the points are the centroids representing some attributes describing the land use, land cover. So essentially we just need
09:04
these two types of geometries to represent our data. So by the time you need to create your final products, you can easily create polygons from the line work
09:22
and populate the attributes from those points. So just give you a quick example, a simple example. In the province of British Columbia, we have one million vertices to define the geology. So out of the one million vertices, we have hundreds of thousands of line work,
09:42
and it takes us less than three minutes to create 32,000 polygons within PostGIS. So going from what you have on the left to the right is a really, really simple process. It's really quick. It doesn't take long, not like in the early 90s, you know, when we formed
10:06
thousands of polygons, we have to run these things over the weekend. So the idea, the framework data model with the lines, only lines and points, also allow us to develop another process, which we call the anchoring mechanism.
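The line-to-polygon conversion the talk describes can be sketched in PostGIS roughly as follows, assuming the hypothetical `gfd_line` and `gfd_centroid` tables of a GFD-style schema (names are illustrative, not the speaker's actual schema):

```sql
-- Form polygons from the fully noded line work, then copy attributes
-- from whichever centroid falls inside each polygon.
CREATE TABLE geology_poly AS
SELECT (ST_Dump(ST_Polygonize(geom))).geom AS geom
FROM gfd_line;

ALTER TABLE geology_poly ADD COLUMN unit_code text;

UPDATE geology_poly p
SET    unit_code = c.unit_code
FROM   gfd_centroid c
WHERE  ST_Contains(p.geom, c.geom);
```

ST_Polygonize only succeeds cleanly when the line work is fully noded, which is exactly why the framework data is kept that way.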
10:31
This is the process by which we can totally avoid any problems in edge matching. Let me explain some detail here. So the first one is data checking out. It's very similar,
10:45
like you checking out a book from a library. So before our geologist head into the field, he will, you know, one of them will give us the study area as outlined by the dotted, the black dotted lines. So from those, from the study area boundary, we're going to select not
11:07
only the geology within the area, but we're going to use that to select all the polygons that had something to do with this updating area. And from this extended
11:22
context, we're going to form a buffer, a tight buffer, and then we're going to use this buffer to select our framework data, which are lines and centroids. So polygons are useful here, right? We need the polygons for the initial filtering. But once we did that filtering,
11:43
we threw that away. So essentially, we needed this buffer to select everything within the area. So this is our first step. The next step is, okay, before I get too far ahead, a simple example here. If you take the data out from here, take it out, and run some round trip,
12:07
put into different GIS packages, do some round trip of map projection, and then return the data back here. And if you don't have a precision model or some kind of control,
12:21
I can guarantee you the data, the framework data you returned for this area, is not going to match what we had there before. So this is a kind of well-known and well-understood phenomenon, what we call coordinate drifting. So essentially, if you take a piece of
12:48
data, run through multiple processes in terms of map projection, load into different systems, the data once come out of that process, all the coordinates will drift around.
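A minimal way to see this effect for yourself in PostGIS is to round-trip a coordinate through another projection (SRIDs here are arbitrary examples; the exact size of the drift depends on the projection library and the coordinate systems involved):

```sql
-- Round-trip a point from BC Albers through UTM zone 10 and back;
-- the coordinates that come back are generally not bit-identical.
SELECT ST_AsText(g)                                          AS original,
       ST_AsText(ST_Transform(ST_Transform(g, 32610), 3005)) AS round_tripped
FROM (SELECT ST_SetSRID(ST_MakePoint(1200000, 500000), 3005) AS g) AS t;
```

Each reprojection, format conversion, or editing session can compound this, which is why returned data can no longer be trusted to sit exactly where it was.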
13:01
So unless you have some magic, you can't avoid this, let alone when you're going to do some editing. So how do we control this kind of drifting? We borrowed some nautical terms: anchor line, rode line, hook, and anchor point.
13:24
Rode, R-O-D-E, is actually a term describing the line between the boat and the anchor. There's a little bit of description for each of the terms, and it's okay if there's a bit too much description, too many words here. You will see the actual
13:44
definition from some graphics later. So what do we have here on the database side? We can tag it automatically. So the outermost line, the one showing up in red, is something we're going to tag as the anchor line. Any line connected to this anchor line is called a rode
14:06
line, and here that's where we have a node, and it becomes the hook on the anchor line, and the end of the green line is the anchor point. Make sense? Maybe I have some,
14:24
just a few pointers here. And you will see why we need to tag this
14:41
from some real examples. So the next step is really taking all this data out, but before takeout we tag them, right? Just anchor line, anchor point, rode line, whatever. And this is the package we're going to give to our mappers. It could be the same kind
15:02
of scenario. Let's say you need to update the cadastral. So if you run this kind of process, there will be some additional data need to be taken out, packaged, and for the GIS technician to update. So in our use case, the map will be taken out and get updated by the geologist.
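The check-out step could be sketched along these lines. All names are hypothetical, `:study_area` stands in for a bound parameter holding the study-area geometry, and the rule distinguishing anchor from rode lines is simplified here (in the talk, rode lines are specifically the lines attached to an anchor line):

```sql
-- Buffer the extended study area, then tag the framework lines being
-- checked out: the outermost ones as 'anchor', the attached ones as 'rode'.
WITH study AS (
    SELECT ST_Buffer(ST_Union(geom), 50.0) AS geom   -- a tight buffer, e.g. 50 m
    FROM geology_poly
    WHERE ST_Intersects(geom, :study_area)
)
UPDATE gfd_line l
SET role = CASE
             WHEN ST_Intersects(l.geom, ST_Boundary((SELECT geom FROM study)))
             THEN 'anchor'
             ELSE 'rode'
           END
WHERE ST_Intersects(l.geom, (SELECT geom FROM study));
```

Note that the polygons are used only for this initial selection and are discarded afterwards; only the tagged lines and the centroids travel to the field.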
15:26
This could take six months, a year, two years, depending on how big the area is. Sometimes it could go up to three years. So by the end of the update, we'll have a new map coming back. Again, we don't really care about the polygons they have; what we do care about is the line work,
15:43
and the centroids representing the attributes. So before we return these things back to the province, we're going to drop the anchor line they have. So the anchor line for them, it's really just some kind of boundary, some kind of limit. Those are the lines you don't want to touch, you don't want to modify. If you do need to modify, that means that you're
16:03
mapping area has extended. You need to come back, and we can run you another check-out process, just extending the area even further. So this is the one we wanted. Now back to the provincial database. This could be your corporate database. So the first thing we're
16:21
going to do within the corporate database is retire everything that had been checked out. And the next step is we're going to drop in the updated data. As you would expect, because of the drifting, right, sometimes there could be some modification as well. The rode line, where it's supposedly connected to the anchor line, there could be some
16:48
drifting causing a disconnect or overlap, whatever. So this is where this process comes in. Assuming the rode line was initially connected to the anchor line
17:06
at the point showing as a hook, after it drifts away, we can snap it back. And if you don't feel comfortable with that, if you've got thousands of these kinds of cases, we can issue something like a marriage certificate. So we can, by using an ID, by using
17:26
whatever, say this rode line is connected to this hook, so it's going to go back to this place, no matter how many meters it drifted away. Otherwise, in most cases, depending on the scale of mapping, the drifting might be by centimeters, by meters,
17:43
or tens of meters, but it won't be hundreds. If you do get into cases where something has drifted away by hundreds of meters, maybe that's something different. Anyway, either you can pair them up, so you know for sure this particular anchor point is going to be snapped to this hook, or you can just
18:04
apply some simple geometric snap. And so after you have done that, you can, everything's connected. You can form your new polygons for the whole area, or you can just form the new polygons for the updated area. What I want to make a point here
18:25
is, you know, this line, because it's in your corporate database, this anchor line has never been taken out at this stage. So that means all the polygons outside of this area are using this line as their boundary, because nothing happened; there's no modification.
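The geometric snap described here could be as simple as one ST_Snap pass of the returned rode lines against the untouched anchor lines (names hypothetical; the 10 m tolerance is illustrative, matching the metres-to-tens-of-metres drift the talk reports at 1:50,000):

```sql
-- Snap each returned rode line back onto a nearby anchor line, pulling
-- drifted anchor points back onto their hooks.
UPDATE gfd_line r
SET    geom = ST_Snap(r.geom, a.geom, 10.0)   -- tolerance is scale-dependent
FROM   gfd_line a
WHERE  r.role = 'rode'
  AND  a.role = 'anchor'
  AND  ST_DWithin(r.geom, a.geom, 10.0);
```

For the explicit pairing variant (the "marriage certificate"), the same update would instead join rode line to hook by stored IDs rather than by distance.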
18:44
Versus all the polygons inside here, they're also using this same line. So for sure, there are no gaps, no overlaps, no slivers along this area. So essentially, the only thing we need to do here is run some really, really simple
19:00
geometric snap, or use the pairing process, basically replacing the coordinates at the point of the hook on the anchor line. So essentially the edge matching is fully automated. There's no human intervention here at all. So once you have done that, obviously you can
19:24
produce new polygons, put on the labels, you can run some kind of cartographic enhancement, produce the final products. So in our case, essentially in the back end, everything's lines
19:40
and centroids. So the polygons become a view of the data as a final product, as a product facing the end client. So the client doesn't really actually see everything in the back, like the centroids. The whole process was developed in Postgres, PostGIS. So the process of
20:05
checking out, anchoring, and integration, they are fully automated. Just a few messages. These problems can be totally avoided by not having polygons in the map compilation,
20:23
updating, and integration process. And the next message is: when you have the funds to purchase expensive tools, I would suggest you take a good look. Sometimes ask the hard question: do we really need this expensive tool? Do we really need to have a
20:44
product? Do we really have a problem here? And so the framework data model and the anchoring process, they are fairly easy. They're really simple, in my opinion anyway, because right now we only deal with lines and points, right? What could be simpler than that?
21:03
And the whole thing can be developed, implemented in the open source database. And for us, the PPSD is over, is cured. We don't have it. Thank you very much. Any questions?
21:29
What happens if your fault line is within the red area? Yeah. And it goes outside of it too? Right. So you just have two segments? Yes. In our framework database,
21:48
everything is fully segmented. That means anywhere there's an intersection, the line will be noded. Yeah. So like here, this fault is continuous, right?
22:09
But the fault will be broken here. Maybe that's not the best example. Anything else here?
22:25
No, because those segments will be noded at the anchor line, right? So in this case, if it happened to be here, this fault will be cut into pieces, right? It's not a continuous piece.
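The full noding the speaker relies on, where every intersection becomes a shared vertex, is essentially one call in PostGIS (again assuming a hypothetical `gfd_line` table):

```sql
-- Union all the line work, node it at every intersection, and explode
-- the result back into simple, fully segmented linestrings.
SELECT (ST_Dump(ST_Node(ST_Collect(geom)))).geom AS segment
FROM gfd_line;
```

Keeping the stored line work in this fully noded form is what lets a continuous fault double as a polygon boundary without any explicit topology tables.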
22:46
Yeah. You kind of touched upon this a little bit earlier. Did you have some sort of a threshold or tolerance when you were doing this? So if I'm x meters away from my hook,
23:03
then I don't attach it. Or if I am this close, then I will attach it, or something like that. Yeah. The tolerance in our case: our geological maps, some could be mapped at the 1:10,000 scale, but in general, they are mapped at
23:23
the 1:50,000 scale. So at the 1:50,000 scale, in a map like this, even if you give like 10 meters, 20 meters, that'll be fine. But what I've found these days, with people using these kinds of GIS tools, after a round trip, if they check
23:44
the data out like this, usually they shouldn't see hundreds of meters of drifting. But we did have a special case where we had a really large area, the map being taken out and projected from Albers to UTM. And within UTM, even a single line, what used to be one line running through, right?
24:08
Once it got into UTM, they cut it somewhere in the middle. They didn't do anything else, but this is a really large area. So when they returned the map, there was a 200-meter drift.
24:21
And so sometimes, let's say this line is a few thousand meters long; if you cut the line in the middle, that cut wasn't there originally, right? So this line could be modified, especially,
24:43
I think the real case here is you have a perfectly straight line, and you cut it somewhere in the middle. The moment you put a node in the middle, it's not going to be a straight line anymore. And if you re-project it somewhere else, it could cause all kinds of problems. So usually, because our maps actually started, the original compilation, in UTM,
25:06
because you only update a small area at one time. So it was always compiled in UTM, and so once we merge them all together, you know, get them all joined,
25:21
it's either in latitude-longitude or in Albers, and usually we don't like to do any processing in terms of densification or simplification. If we have to run that, we will always take the map out and run those processes in UTM, because that's a little more faithful to what was originally compiled. So anyway, in short, to answer your question,
25:43
because we map at 1 to 50,000 to 1 to 10,000, so we can accept meters, or up to 20 meters, that kind of tolerance. But if you map at 1 to half a million, or 1 to 5,000, then those things need to be adjusted. Yeah.
26:07
Could you elaborate on how your centroid approach compares to classical topologies, where you define the points and then define a line that connects these points, and then a polygon is defined as the sum or the sequence of certain lines,
26:26
then you can't have diverging or overlaps anymore. Okay, yeah, understood. We did look at bringing some kind of topology
26:40
way to manage our data, and everything we looked at turned out to be so complex. So what do we have here? Actually, there's no topology per se. All the lines, they're all together. But we do have to make sure anywhere there's an
27:00
intersection, it must be noded. And if you really want to form polygons, let's say in this case, right, if you cannot node it and form polygons, all those problems would occur. So essentially, when some new map is coming in, the only work we have to do is to make sure the map coming in from our mappers
27:25
is fully noded at every possible intersection. And sometimes, if it's a little short, by two meters or two centimeters, we need to run some process to detect those gaps and fix them up. So there are cases where the geologists would say, well, no,
27:44
I don't want to connect that, because I left a gap of two centimeters for a reason, because I just don't want that line cut into pieces. So in this case, well, we say: about two centimeters, it's really easy, after taking the map out or, you know,
28:03
running it through some process, it could actually have become an overlap, right, or a crossing. So we'll say, well, is that really two centimeters? Can we make that two meters? Two meters is almost guaranteed not to cause problems. So I think the answer is no, we don't have any topology. We try to keep these things really, really simple. And so that's another way.
28:23
When we form polygons, comparing the number of polygons and the number of centroids is a way to validate: do I have too many polygons or too few centroids? If there are differences there, we've got a problem. So essentially, every polygon we form from the updated
28:45
framework data will have to have a centroid to represent the attributes. If you have two centroids, you've got a problem. Once you have no centroid, then something's missing. So that's a way for them to validate each other. But we really
29:03
try to keep it really, really simple. Hello, only a short remark. I think what you have
29:21
presented here is more or less a reinvention of the topological data model, which was very popular in the 1980s and 90s. It was also used by ArcInfo with the coverage model, and it is now used by GRASS. So if you're using such programs, I think it would be very much the same,
29:42
because in these programs also, every polygon is defined by the lines and the centroid. Yeah. The attributes of polygons are always defined by the centroids, and the geometries are defined by the lines which are around these centroids. That's a topological data model,
30:04
which is very common and is used by a lot of programs. I could give you many, many more examples, if you have to merge two. Yeah, I know this. The process was called CLEAN
30:24
in ArcInfo. There's also topology in PostGIS, which has been trying to find a developer to push it a little further. I think anything else, either there's a system
30:46
behind the scenes that has to work really hard to keep up with what's going on here. What I can say is I have many more specific cases where, if you are dealing with polygons,
31:00
there are some more cases where it's really hard to keep up. When we started with polygons, we even had small polygons sitting behind big polygons. How can I find them? Also, small polygons along a line. You can't even see them
31:24
because the polygon is so thin. It's like a pipe or a tube. Yeah, I know this: 0.00001 meters, running across the line, a little bit here, a little bit there. My only remark was that this is not an invention by the GFD, or whatever you call it; it's a very old
31:43
thing. It's a topological data model. That's the only point. I'm also a geologist and I know the geologists. I'm sure if geologists are going into the field, they wanted exactly to edit the green lines. Essentially, part of the reason we
32:04
proposed this, we developed this process is when you give a piece of data to a geologist, you can't assume he's not going to bring your database with him. He will take a piece of data to the field, work on whatever GIS. Again, we can't dictate which GIS tools you have to use.
32:24
Essentially, you have to give him simple data that's easy for him to manipulate. Once it's done and coming back to us, it's easy for us, because we might hire an intern, a student, to work on the data to do some cleaning-up work. Anything beyond that is a little too complex.
32:45
I know we can find tools to manage all the complexity. What happened to me is I have seen so many different cases which could pop up. Once I finally had small polygons hanging behind,
33:03
where you can't immediately see it, there are some small ones. They're so small that it doesn't matter how much you zoom in, you can't see them. It's 0.001 on one side, 0.001 on the other side. So we usually can't see them. Those are just the types of things we were trying to avoid in the beginning.
33:23
But I can talk to you a little more, find out if there's some other good topological suite out there which we can use anyway. Because this is a provincial repository for all the geology,
34:06
so essentially we're trying to accommodate maps at different mapping scales. So it's a single map; this is not a final product. It's an integrated repository,
34:21
all the provincial geology. So we could have an area mapped at a quarter million or half a million, versus some area mapped in real detail because we have some good mineral potential; some could be mapped at 1:10,000, versus the adjacent area mapped at a quarter million. So the difference could be huge. That's why we not only have geometric
34:47
data boundary problems, but we also have geological boundary problems. We could have a border: within here, you have all kinds of detail; beyond there, there's no detail. So you know something's not right. So we have to create some boundary,
35:04
a geological boundary called a data boundary. It's not real; it's just the limit of mapping. So we just mapped to here. We know the geology within this border; beyond it, we don't know. But we have to respect what happened historically, because the area has not been updated. That's
35:23
just the case; we have to leave it that way. Yeah, we do have metadata to keep track of who did the mapping, when, and at what scale. Yeah, we do have some details. Yeah,
35:44
some details anyway. Well, if there are no further questions, thank you very much for attending.