
Anchoring and PostGIS cure Post-Polygon Stress Disorder


Formal Metadata

Title
Anchoring and PostGIS cure Post-Polygon Stress Disorder
Title of Series
Number of Parts
188
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Producer
Production Year: 2014
Production Place: Portland, Oregon, United States of America

Content Metadata

Subject Area
Genre
Abstract
Polygons are great to have in digital maps, much like a canvas that we can render with beautiful colours. It is common that polygon boundaries are shared by linear features (e.g., municipalities divided by a river or a road). If polygons are used as part of the base to edit, update, and integrate digital maps, we have to reconcile the geometric differences among the shared boundaries and fix topological problems in edge matching. For many years we felt blessed that commercial software tools are available to reconcile shared boundaries, and to detect and fix topological problems. However, if wrestling with polygons leaves you feeling buried in slivers, discontinuities, gaps, and overlaps, you've got Post-Polygon Stress Disorder (PPSD). PostgreSQL/PostGIS presented the British Columbia Geological Survey an opportunity to identify the causes of PPSD. As a result, we have developed a geologic framework data model and implemented an anchoring mechanism in PostGIS to simplify the process of editing, updating, and integrating digital geological maps. We have dispensed with polygons and eliminated the problems from shared boundaries and edge matching. Healing for PPSD is available in this poster: http://www.empr.gov.bc.ca/Mining/Geoscience/PublicationsCatalogue/GeoFiles/Pages/2014-9.aspx.
Keywords
Transcript: English(auto-generated)
Well, thank you for staying for my talk. This morning I went to two great talks: one was about tool making, the other about keeping things simple. I have some specific cases of what you can do with tools and what we should do to keep things simple. Both involve using polygons and the kinds of problems that can be caused by having polygons in map compilation, updating, and integration. I will give you some specific cases which cause something I call post-polygon stress disorder, PPSD. And if you stay with my talk, hopefully afterwards you're going to be cured, or won't have PPSD anymore. So, just a little bit of background on the cases and then the specific cases
and then our solutions. I'm a geologist by training, and we do a lot of geological mapping for the province of British Columbia. Just a few examples: we do need polygons. We need polygons to capture our features, and we do need polygons to represent the final map products. For the province of British Columbia we've got hundreds and hundreds of these kinds of maps covering the province. And just by the way, British Columbia is about the size of Washington, Oregon, and California combined; it's almost four times bigger than the UK.
And so over the years we have been compiling and integrating those individual maps to come up with this single, integrated, seamless digital coverage for the province. Now, the use case here is: we've done our first mapping, and now we want to update one of the areas. So it kind of makes sense that you would do a cookie cut for this map area, you know, take a copy, cut it out, and the mapper, our geologist, will take it to the field. Hopefully in a year or two he finishes his mapping and updates the map for this area. Ideally we could just drop it back into the provincial database seamlessly, without any pain or any work. But it doesn't happen that way. There are all kinds of things that can happen along the map edge, but also, even for updating the map within the area, there are all kinds of cases where using polygons to update causes problems. There are
many, many problems, but I will focus on just two. One is what we call shared boundaries; the other is called edge matching. I remember in the early 1990s, in the first GIS course I took, there were pages and pages, chapters, on edge matching, and hopefully after my talk you will find that edge matching, for me, is history. There's no more edge matching anymore. For shared boundaries, just some examples to show you the specific cases. This is part of a geologic map. We have unit A and unit B sharing a common boundary in between. But this
doesn't have to be bedrock geology. It could be land use, could be cadastral, could be municipal boundaries, you name it. And what happens here is that not only do the two polygons share the same boundary; in this case we also have a fault cutting through here. So the contact between unit A and unit B is also a faulted contact. So here we really have a minimum of three features that occupy the same space. Now, I got ahead of myself a little bit
here. So when we need to update one of the features, so let's say the fault has been remapped and we know this fault is the boundary for unit A and unit B. So right away you're going to find some problems here. It doesn't matter what you do. You can spend all your time
manually trying to adjust the geometry for polygon A and for polygon B. Quite often what you're going to find out is by the end of the day you will have gaps, you will have overlaps along the boundary, both between the polygons and also between the polygons and the line work.
So that's the first case. The second case is something called edge matching. So we could have map A, something we mapped earlier, and we mapped the adjacent area. We have map B. Obviously you see some differences there in terms of geometry,
but also the attributes. Ideally we can get them to resolve all the boundary issues, get the map merged seamlessly. It doesn't really happen that way often in the real world. So what we will have here along the boundary between the polygons, you're going to have
gaps, overlaps, slivers, and the lines may not join; they could overlap, could be disjoint, and the attributes, in terms of the map units, may not be consistent across the border. I have seen places where people purchase expensive tools and hire a team of GIS technicians working day in, day out, for weeks, months, trying to resolve those kinds of problems. And when I thought about it, you know, what's the result? People spending so much time doing that, and there's low productivity. When you have your hand on your mouse the whole day, you get injuries to your wrist, to your shoulder, right? So it's not too far to get into PPSD, right?
So the big question here is, again, just relate to what I hear in the morning. There, you can spend all your time or spend your money to purchase, to develop tools, but sometimes you have to ask a question, you know, do we need, you know, can we, can we avoid these problems in the first place? And so to try to avoid these kind of
problems, we have to find out the cause of this problem. It's not too different in terms of geology. What we do is we go to a point location, we define, we identify the boundary,
and from lots of point locations, we join the dots, we form the line work, and out of the line work, eventually we're going to create this bedrock geology, you know, form polygons, color them, create a legend, and cartographically enhance, have them published, right? We actually started with points, lines, so the polygons is not there in the first place.
So really, I think the polygon is the cause of the problem, and we should get rid of polygons in map compilation, map updating, editing, and also integration. So the solution,
we kind of developed this idea, again, perhaps nothing new, following what we always do. Really, in the back end, at the source of the data, what we need to keep maintaining is the line work and the points representing the geologic units. So these are
the only two things we need. So I coined this term, geologic framework data model. It doesn't have to be called that; for lack of a better term, and I know framework means a lot of different things to different people, we need some name here, so we call it GFD for short. So essentially, the lines can be geologic contacts, could be faults.
So in some other cases, could be the boundary for land use, land parcel, could be the boundary for the river, for the municipal, whatever. And the points are the centroids representing some attributes describing the land use, land cover. So essentially we just need
these two types of geometries to represent our data. By the time you need to create your final products, you can easily create polygons from the line work and populate the attributes from those points. Just to give you a quick, simple example: in the province of British Columbia, we have one million vertices to define the geology. Out of that one million vertices, we have hundreds of thousands of lines, and it takes us less than three minutes to create 32,000 polygons within PostGIS. Going from what you have on the left to the right is a really, really simple process. It's really quick. It doesn't take long, not like in the early 90s, you know, when we formed
thousands of polygons, we have to run these things over the weekend. So the idea, the framework data model with the lines, only lines and points, also allow us to develop another process, which we call the anchoring mechanism.
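The line-plus-centroid model and the polygon build just described can be sketched in PostGIS. This is a minimal illustration, not the BC Geological Survey's actual schema; all table and column names, and the SRID, are assumptions.

```sql
-- Hypothetical framework tables: fully noded line work plus labelling points.
CREATE TABLE contact_lines (
  id   serial PRIMARY KEY,
  kind text,                        -- e.g. 'contact', 'fault'
  geom geometry(LineString, 3005)   -- assuming BC Albers
);

CREATE TABLE unit_points (
  id        serial PRIMARY KEY,
  unit_code text,                   -- geologic unit label
  geom      geometry(Point, 3005)
);

-- Build polygons from the noded line work.
CREATE TABLE geology_polys AS
SELECT row_number() OVER () AS id, geom
FROM (
  SELECT (ST_Dump(ST_Polygonize(geom))).geom AS geom
  FROM contact_lines
) AS d;

-- Populate attributes: each polygon should contain exactly one labelling point.
SELECT p.id, c.unit_code, p.geom
FROM geology_polys p
JOIN unit_points c ON ST_Contains(p.geom, c.geom);
```

The centroid join doubles as the validation check mentioned later in the talk: a polygon containing zero or two labelling points signals a problem in the framework data.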
This is the process by which we can totally avoid any problems in edge matching. Let me explain in some detail. The first step is data check-out. It's very similar to checking out a book from a library. Before our geologists head into the field, one of them will give us the study area, as outlined by the black dotted lines. From the study area boundary, we're going to select not only the geology within the area, but we're also going to use it to select all the polygons that have something to do with this update area. And from this extended
context, we're going to form a buffer, a tight buffer, and then we're going to use this buffer to select our framework data, which are lines and centroids. So polygons are useful here, right? We need the polygons for the initial filtering. But once we did that filtering,
we threw that away. So we essentially, we needed this buffer to select everything within the area. So this is our first step. The next step is to, okay, before I went too much ahead, a simple example here. If you take the data out from here, take it out, and run some round trip,
put into different GIS packages, do some round trips of map projection, and then return the data back here. If you don't have a precision model or some kind of control, I can guarantee you that the framework data you return for this area is not going to match what was there before. This is a well-known and well-understood phenomenon, what we call coordinate drifting. Essentially, if you take a piece of data and run it through multiple processes of map projection, loading it into different systems, once the data comes out of that process, all the coordinates will have drifted around.
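A toy Python sketch illustrates the drift. This is not a real map projection, just an analogous unit conversion written out with limited precision, the way interchange formats often do; the numbers are arbitrary.

```python
# Toy illustration of coordinate drift: each "round trip" converts a
# coordinate to another system and back, rounding to a fixed number of
# decimals on each write-out.
def round_trip(x, factor=0.3048, decimals=3):
    forward = round(x * factor, decimals)     # export with limited precision
    return round(forward / factor, decimals)  # import and convert back

x0 = 500000.123456789
x = x0
for _ in range(5):
    x = round_trip(x)

drift = abs(x - x0)
print(drift > 0)  # the returned coordinate no longer matches the original
```

Real projection round-trips between Albers and UTM behave the same way: without a precision model or a pairing mechanism, the returned coordinates never exactly match the originals.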
Unless you have some magic, you can't avoid this, let alone when you're also going to do some editing. So how do we control this kind of drifting? We borrowed some nautical terms: anchor line, rode line, hook, and anchor point. Rode, R-O-D-E, is actually a term describing the line between the boat and the anchor. There's a short description for each of the terms here, and it's okay if it's too many words; you will see the actual definitions in some graphics later. So what do we have on the database side? We can tag these automatically. The outermost line, the one showing up in red, is what we tag as the anchor line. Any line connected to this anchor line is called a rode line, and where it meets the anchor line we have a node, which becomes the hook on the anchor line, and the far end of the green line is the anchor point. Make sense? Maybe I have some,
just a few pointers here. You will see why we need to tag these from some real examples. So the next step is really taking all this data out, but before takeout we tag them, right? Anchor line, anchor point, rode line, whatever. And this is the package we're going to give to our mappers. It could be the same kind of scenario if, say, you need to update the cadastral: if you run this kind of process, there will be some additional data that needs to be taken out and packaged for a GIS technician to update. In our use case, the map will be taken out and updated by the geologist.
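The check-out step might look roughly like this in PostGIS. The buffer distance, the tagging rule, and all names are hypothetical; the actual anchor/rode classification would follow the node topology described above.

```sql
-- Check-out: buffer the study area, then pull out the framework lines
-- that fall inside the buffer. The polygons are only used for this
-- initial filtering and are then thrown away.
WITH study AS (
  SELECT ST_Buffer(a.geom, 500) AS geom   -- a tight buffer, e.g. 500 m
  FROM study_areas a
  WHERE a.area_id = 42                    -- hypothetical study-area id
)
SELECT l.*,
       CASE WHEN ST_Intersects(l.geom, ST_Boundary(s.geom))
            THEN 'anchor' ELSE 'rode' END AS tag   -- crude tagging sketch
FROM contact_lines l
JOIN study s ON ST_Intersects(l.geom, s.geom);
```

The same selection would be run against the centroid table, and the tagged package is what gets handed to the mapper.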
This could take six months, a year, two years, depending on how big the area is. Sometimes it could go up to three years. By the end of the update, we'll have a new map coming back. Again, we don't really care about the polygons they have; what we do care about is the line work,
and the centroids representing the attributes. So before we return these things to the province, we're going to drop the anchor line they have. The anchor line, for them, is really just some kind of boundary, some kind of limit: those are the lines you don't want to touch, you don't want to modify. If you do need to modify them, that means your mapping area has extended; you need to come back, and we can run another check-out for you, just extending the area even further. So this is what we wanted. Now back to the provincial database; this could be your corporate database. The first thing we're going to do within the corporate database is retire everything that had been checked out. The next step is to drop in the updated data and, as you would expect because of the drifting, there could be some modification as well: the rode line that is supposedly connected to the anchor line could have some drifting causing a disconnect or overlap, whatever. So this is where this process does its work. Assuming the rode line was initially connected to the anchor line
at the point shown as a hook, after drifting away, we can snap them back. If you feel uncomfortable with that, if you've got thousands of these kinds of cases, we can issue something like a marriage certificate: by using IDs or whatever, we know this rode line is connected to this hook, so it's going to go back to this place, no matter how many meters it drifts away. In most cases, depending on the scale of mapping, the drifting might be by centimeters, by meters, or tens of meters, but it won't be hundreds; if you do get into cases where something is off by hundreds of meters, maybe that's something different. Anyway, either you can pair them up, so you know for sure this particular anchor point is going to be snapped to this hook, or you can just
apply some simple geometric snap. After you have done that, everything's connected. You can form your new polygons for the whole area, or just form new polygons for the updated area. What I want to make a point of here
is: this line, the anchor line, because it's in your corporate database, has never been touched at this stage. That means all the polygons outside of this area are using this line as a boundary, because nothing happened, there's no modification. And all the polygons inside here are also using the same anchor line. So there are no gaps, no overlaps, no slivers along this edge. Essentially, the only thing we need to do here is run a really, really simple geometric snap, or use the pairing process, basically replacing the coordinates at the point of the hook on the anchor line. So the edge matching is fully automated; there's no human intervention here at all. Once you have done that, obviously you can
produce new polygons, put on the labels, you can run some kind of cartographic enhancement, produce the final products. So in our case, essentially in the back end, everything's lines
and centroids. The polygons become a view of the data, a final product facing the end client. The client doesn't actually see everything in the back, like the centroids. The whole process was developed in Postgres/PostGIS. So the process of
checking out, anchoring, and integration, they are fully automated. Just a few messages. These problems can be totally avoided by not having polygons in the map compilation,
updating, and integration process. The next message is: when you have the funds to purchase expensive tools, I would suggest you take a good look and sometimes ask the hard questions. Do we really need this expensive tool? Do we really need this product? Do we really have a problem here? The framework data model and the anchoring process are fairly easy. They're really simple, in my opinion anyway, because now we only deal with lines and points, right? What could be simpler than that?
And the whole thing can be developed and implemented in an open-source database. And for us, PPSD is over; it's cured. We don't have it. Thank you very much. Any questions?
What happens if your fault line is within the red area? Yeah. And it goes outside of it too? Right. So you just have two segments? Yes. In our framework database, everything is fully segmented. That means anywhere there's an intersection, the line will be noded. So like here, this is the fault, continuous, right? But the fault will be broken here. Maybe that's not the best example. Anything else here? No, because those segments will be noded at the anchor line, right? So in this case, if it happened to be here, this fault will be cut into pieces. It's not a continuous piece.
Yeah. You kind of touched upon this a little bit earlier. Did you have some sort of a threshold or tolerance when you were doing this? So if I'm x meters away from my hook,
then I don't attach it. Or if I am this close, then I will attach it, or something like that. Yeah. The tolerance in our case: some of our geological maps could be mapped at the one to 10,000 scale, but in general they are mapped at one to 50,000 scale. At the one to 50,000 scale, in a map like this, even if you give 10 meters, 20 meters, that'll be fine. But what I've found in modern days, with people using GIS tools after a round trip, if they check
the data out like this, usually they shouldn't see hundreds of meters of drifting. But we did have a special case with a really large area: the map was taken out and projected from Albers to UTM, and within UTM, a line that used to be a single straight run got cut somewhere in the middle. They didn't do anything else, but this is a really large area, so when they returned the map, there was a 200-meter drift. Say a line is a few thousand meters long: if you cut it in the middle, the cut doesn't happen outside, right? So the line itself gets modified. I think the real case here is that you have a perfectly straight line and you cut it somewhere in the middle; the moment you put a node in the middle, it's not going to be a straight line anymore, and if you re-project it somewhere else, it can cause all kinds of problems. Usually, because our maps actually started out with the original compilation in UTM, because you only update a small area at one time, it was always compiled in UTM. Once we merge them all together, get them all joined, it's either in latitude-longitude or in Albers, and usually we don't like to run any process in terms of densification or simplification. If we have to run that, we will always take the map out and run those processes in UTM, because that's a little more truthful to what was originally compiled. So anyway, in short, to answer your question,
because we map at 1 to 50,000 to 1 to 10,000, so we can accept meters, or up to 20 meters, that kind of tolerance. But if you map at 1 to half a million, or 1 to 5,000, then those things need to be adjusted. Yeah.
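The scale-dependent tolerance being discussed could be applied with PostGIS's ST_Snap. The 20 m figure follows the 1:50,000 case above; the table names are hypothetical.

```sql
-- Snap returned rode lines back onto the (untouched) anchor lines,
-- using a tolerance appropriate to the mapping scale.
UPDATE returned_lines r
SET geom = ST_Snap(r.geom, a.geom, 20.0)   -- ~20 m for 1:50,000 mapping
FROM anchor_lines a
WHERE ST_DWithin(r.geom, a.geom, 20.0);
```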
Could you elaborate on how your centroid approach compares to classical topologies, where you define the points and then define a line that connects these points, and then a polygon is defined as the sum or the sequence of certain lines,
then you can't have diverging lines or overlaps anymore. Okay, yeah, understood. We did look at bringing in some kind of topology to manage our data, and everything we looked at turned out to be so complex. So what do we have here? Actually, there's no topology per se. All the lines are just together. But we do have to make sure that anywhere there's an intersection, it must be noded. If you really want to form polygons, let's say in this case, and you don't node it, then when you form polygons all those problems will occur. So essentially, when some new maps come in for this area, the only processing work we have to do is to make sure the maps coming in from our mappers are fully noded at every possible intersection. And sometimes, if a line falls short by two meters or two centimeters, we need to run some process to detect those gaps and fix them up. There are cases where the geologists would say, well, no, I don't want to connect that, because I left a gap of two centimeters on purpose, because I just don't want that line cut into pieces. In such a case we'd say: at two centimeters, if you pass the map through some process, it could easily become an overlap or a crossing. So we'll ask, does it really need to be two centimeters? Can we make it two meters? Two meters is almost guaranteed not to cause problems. So I think the answer is no, we don't have any topology. We try to keep these things really, really simple. And so that's another way.
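The "fully noded at every intersection" requirement can be enforced with ST_Node before polygonizing. A sketch, with a hypothetical `incoming_lines` table:

```sql
-- Node the incoming line work: every crossing becomes a shared vertex,
-- so polygonization cannot leak across un-noded intersections.
CREATE TABLE noded_lines AS
SELECT (ST_Dump(ST_Node(ST_Collect(geom)))).geom AS geom
FROM incoming_lines;
```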
When we form polygons, the number of polygons versus the number of centroids is a way to validate: do I have too many polygons, or too few centroids? If there's a difference, we've got a problem. Essentially, every polygon we form from the updated framework data has to have one centroid to represent the attributes. If you have two centroids, you've got a problem; if you have no centroid, then something's missing. So that's a way for them to validate each other. But we really
try to keep it really, really simple. Hello, only a short remark. I think what you have presented here is more or less a reinvention of the topological data model, which was very popular in the 1980s and 90s. It was used back in ArcInfo with the coverage format, and is also now used by GRASS. If you're using such programs, I think it would be very much the same, because in those programs, too, every polygon is defined by the lines and the centroid. The attributes of polygons are always defined by the centroids, and the geometries are defined by the lines around those centroids. That's a topological data model, which is very common and is used by a lot of programs. I could give you many, many more examples, if you have to merge two maps. Yeah, I know this. The process was called clean in ArcInfo. There's also PostGIS Topology; I've been trying to find out a little more about what's going on there. I think either way, there's a system
behind the scene has to work really hard to keep up what's going on here. What I can do is I have many more specific cases where if you are dealing with polygons,
there are some more cases where it's really hard to keep up. When we started with polygons, we even had small polygons sitting behind big polygons. How can I find those? Also, small polygons along a line that you can't even see, because the polygon is so thin, like a pipe or tube. Yeah, I know this. 0.00001 meters running across the line, and there's a little bit here, a little bit there. My only remark was that this is not an invention by the GFD, or whatever you call it; it's a very old thing. It's a topological data model. That's the only point. I'm also a geologist and I know geologists. I'm sure when geologists go into the field, they want exactly to edit the green lines. Essentially, part of the reason we
proposed this, and developed this process, is that when you give a piece of data to a geologist, you can't assume he's going to bring your whole database with him. He will take a piece of data to the field and work in whatever GIS; again, we can't dictate which GIS tools people have to use. Essentially, you have to give him simple data that's easy for him to manipulate. Once it's done and comes back to us, it's easy for us, because we might hire an intern, a student, to do some cleanup work on the data. Anything beyond that is a little too complex.
I know we can find tools to manage all the complexity. What happened to me is that I have seen so many different cases pop up. Once, I found small polygons hiding behind others that you can't immediately see. They're so small that it doesn't matter how much you zoom in, you can't see them: it's 0.001 on one side, 0.001 on the other side, so you usually can't see them. Those are just the types of things we're trying to avoid from the beginning.
But I can talk to you a little more, find out if there's some other good topological suite out there which we can use anyway. Because this is a provincial repository for all the geology,
so essentially we're trying to accommodate maps at different mapping scales in a single map. This is not a final product; it's an integrated repository of all the provincial geology. So we could have an area mapped at a quarter million or half a million, versus some area mapped in real detail because it has good mineral potential; someone could be mapping at 10,000 next to an adjacent area mapped at a quarter million. The difference could be huge. That's why we not only have a geometric
data boundary problem, but also a geological boundary problem. We could have a border where, on this side, you have all kinds of detail, and beyond it there's no detail. So you know something's not right. So we have to create a geological boundary called a data boundary. It's not real; it's just the limit of mapping. We just mapped to here: we know the geology within this border, and beyond it we don't know. But we have to respect what happened historically, because the area has not been updated. That's just how it is; we have to leave it that way. Yeah, we do have metadata to keep track of who did the mapping, when, and at what scale. Yeah, we do have some details. Yeah,
some details anyway. Well, if there are no further questions, thank you very much for attending.