
Creating Global Edge-Matched Subnational Boundaries


Formal metadata

Title
Creating Global Edge-Matched Subnational Boundaries
Number of parts
266
License
CC Attribution 3.0 Germany:
You may use, modify, reproduce, distribute and make publicly available the work or its content, in unaltered or altered form, for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.

Content metadata

Abstract
FieldMaps.io is a personal initiative originally created to develop offline interactive reference maps for humanitarian actors. Within a short time, however, it transitioned to helping develop the common operational datasets that form the foundation for humanitarian response planning. Over the past two years, enormous effort has gone into releasing a high-resolution composite dataset that can be updated daily from multiple sources. This talk covers three aspects of the project.

Algorithm: Edge-matching resolves gaps and overlaps between hundreds of separate national data sources, which requires an algorithm that performs at global scale. The resulting methodology uses something akin to a Euclidean allocation raster applied to vector space, free of the compromises that other approaches, such as generalization and snapping, make. If you've ever been challenged by topology or data cleaning, you might find insights here for solving your own problems.

Pipeline: The edge-matching algorithm involves multiple complex and computationally intensive steps. Although GeoPandas and GDAL usually come to mind when building multi-step geoprocessing scripts, PostGIS ended up being the fastest and best-scaling tool for transforming gigabytes of vector data. I'll challenge your assumptions about how it can be used to create pipelines both on desktops and in the cloud, and make a case for including it in your next project.

Sources: A composite dataset is only as good as the foundations it builds on, and great care was taken in selecting the sources used in this project. For international boundaries, I'll go into detail about how I used only public-domain sources to create an ISO 3166-compliant dataset. At the subnational level, I'll highlight two projects that each curate updated administrative boundaries: one by the United Nations, another by an academic institution.

Whether you're a remote sensing specialist searching for the best topologically valid boundaries to run zonal statistics with, a Python developer frustrated by pipelines constantly running into memory limits, or someone who just wants to run this tool on your own boundaries, I hope you come away from this talk with a concept you can apply to your own work.

Data: https://fieldmaps.io/data
Tool: https://github.com/fieldmaps/edge-extender
Transcript: English (automatically generated)
Hi everybody, I'm Max Malinowski. I currently work at Space4Good, based in The Hague, Netherlands, but I'm here to talk about a personal project that I've been working on in collaboration with humanitarian organizations for the past couple of years: creating global edge-matched subnational boundaries. Before I dive into that, a bit about me and why I got into this project. I've worked for a number of years with different international organizations across multiple countries.
I've worked in Jordan, DR Congo, South Sudan and Nigeria, with organizations like REACH Initiative, ACTED, the World Food Programme and the UN Development Programme. In these countries, a lot of the mapping I did was thematic mapping; the number of choropleths I've made probably runs into the thousands.
These covered a whole range of topics, like multi-sector needs assessments, where we would assess a population's needs across food, health, shelter, economic and market factors, and then create choropleths showing the severity of those needs across a given area.
I also did a lot of rapid needs assessments: after a conflict event or a natural disaster, a quick assessment of what the target population needs. The map you see here is the Integrated Food Security Phase Classification (IPC), which regularly brings the host government and international organizations together to map the likely severity of famine in a given area.
All of these types of mapping needed good boundary data. In addition to choropleths and thematic maps, I did a lot of reference mapping while out in the field. This is the Za'atari refugee camp in Jordan, where I spent about a year working. I also made a lot of population displacement maps, particularly of cross-border movements across different international borders, as well as infrastructure maps of things like water points or latrines. But the number one request I kept getting, over and over again, was for general-purpose country maps that people could use as a blank canvas to draw on or to add their own layers on top of.
This is an example of what one of those country maps looked like. I made some in print versions and some in web versions, and they ended up being in high demand. Even more requested, and more difficult to make, was something like this: a regional map, super detailed, with a lot of reference names. When an event happens at a country's border, the names of the areas where it occurs are really important for communicating the context, and getting that contextual information usually requires collating data sources from multiple countries.
When data sources come from different places, it's really difficult to put them all together. This map looks really nice, but it took a lot of effort: the data came from six different sources, all of which needed to be cleaned by hand, and every time something updated, it all had to be redone from scratch.
That's when I started looking at what it would take to make a holistic, easy-to-use boundary dataset, and at what was already out there. Nothing I found was quite satisfactory.
The one everybody points to is GADM; I'm sure some of you have heard of it. It covers the entire world, but it has a lot of issues: many polygons are generalized, and the project uses a lot of simplification to achieve clean edges.
Sometimes the results are just very wrong, and somebody working in this country would look at these lines and not know what to do with them. An alternative was GAUL, created by FAO, but it wasn't clear whether it could be used outside UN organizations, so it wasn't really useful for creating public products. Finally, there was Natural Earth, but it only goes down to administrative level 1, and some of its boundaries are a little controversial due to its de facto boundary policy. That's when I started looking at how difficult this would be to put together from scratch. One reason I thought a project like this would be useful is that it makes aesthetically pleasing regional maps much easier to produce.
Also, when you take raster datasets with global coverage, it's much more useful to aggregate them to administrative units than to an arbitrary hexagon or square grid. I started working on this back in 2018, when I was out in the field.
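As a small illustration of that aggregation step (zonal statistics), here is a sketch in plain Python that sums a toy population raster per administrative unit, assuming the units have already been rasterized onto the same grid. The grid values, the labels "A" and "B", and the name zonal_sum are hypothetical; a real workflow would use a raster library rather than nested lists. The gap cell shows why gap-free boundaries matter: whatever falls between units is silently dropped from every total.

```python
from collections import defaultdict


def zonal_sum(values, zones):
    """Sum raster `values` per zone label (both are equally sized 2D lists).

    Cells whose zone is None (gaps between boundary polygons) are skipped,
    so whatever they contain is not counted in any unit.
    """
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is not None:
                totals[zone] += value
    return dict(totals)


# Toy population raster (hypothetical values) and a rasterized admin layer.
population = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [10, 20, 30, 40],
]
admin = [
    ["A", "A", "B", "B"],
    ["A", None, "B", "B"],  # None: a sliver gap left by mismatched sources
    ["A", "A", "B", "B"],
]

totals = zonal_sum(population, admin)
print(totals)  # {'A': 70.0, 'B': 210.0}
print(sum(totals.values()), "of", sum(map(sum, population)))  # 280.0 of 300
```

The 20 people in the gap cell belong to no unit, which is exactly the kind of loss edge-matched boundaries avoid.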
But it ended up taking a lot longer than I thought it would. So here is how I put together a global boundary dataset, in three steps. Step one is the algorithm itself. This is one of the really problematic issues I ran into: a tripoint between three countries. On the left is what the original data sources show when you put them all on top of each other; on the right is what you want. Rather than fixing this manually, I was looking for a way to get the result on the right automatically. Long story short, I wanted an algorithm that would handle gaps, overlaps, internal holes due to water, external holes, lakes and hung boundaries, and handle islands elegantly. The approach I landed on is similar to the Euclidean allocation algorithm. If you've done raster analysis, Euclidean allocation assigns each empty cell to the nearest source value; I implemented something like this, but for vector space. Here's how it works: I feed a boundary into the algorithm, and it places small points all along the edges of the input polygon. These are the points from which the Voronoi polygons are created.
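The core idea can be sketched in plain Python, assuming simple polygons given as lists of (x, y) vertices. This is not the edge-extender code (which, per the talk, runs in PostGIS); the function names, the spacing parameter and the brute-force nearest-neighbour search are illustrative assumptions. It relies on one property of Voronoi diagrams: a seed's Voronoi cell is exactly the set of locations closer to that seed than to any other, so labelling a location with the polygon ID of its nearest edge point gives the same answer as building the Voronoi polygons and dissolving them by polygon ID.

```python
import math


def densify_ring(ring, spacing):
    """Points roughly every `spacing` units along a closed ring.

    `ring` is a list of (x, y) vertices without a repeated closing vertex.
    """
    points = []
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        steps = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / spacing))
        for i in range(steps):  # segment end is the next segment's start
            t = i / steps
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


def build_seeds(polygons, spacing):
    """Tag every densified edge point with the ID of its source polygon."""
    return [
        (point, polygon_id)
        for polygon_id, ring in polygons.items()
        for point in densify_ring(ring, spacing)
    ]


def allocate(location, seeds):
    """Return the polygon ID of the seed nearest to `location`.

    This is Voronoi-cell membership; a brute-force scan stands in for
    building the diagram and is only suitable for small examples.
    """
    return min(seeds, key=lambda seed: math.dist(location, seed[0]))[1]


# Two neighbouring units with a 2-unit gap between them (hypothetical data).
polygons = {
    "west": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "east": [(6, 0), (10, 0), (10, 4), (6, 4)],
}
seeds = build_seeds(polygons, spacing=0.5)
print(allocate((4.8, 2), seeds))  # west: the gap is split at its midline, x = 5
print(allocate((5.3, 2), seeds))  # east
```

Denser spacing makes the allocated boundary follow the true midline more closely, at the cost of more seed points.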
When you create Voronoi polygons from all of these edge points and dissolve them by the input polygon whose edge they came from, you get something nice like this, which traverses the midpoints of rivers and water bodies really nicely. So if you need to change the land boundaries from a low-resolution version to a high-resolution version, the end result stays clean. This is what the output looks like: rather abstract-looking polygons, extended outward until they meet their neighbours.
The benefit is that it doesn't matter what you're clipping to, because clipping comes at the very end; you don't need to know what the end result looks like before you run the algorithm. Here you see what it eventually gets clipped to, using an international boundary layer. This shows the before, middle and after steps. In green is the original dataset, as taken from the source.
In blue is everything that gets extended out, and in red is the end result, which follows an international boundary line.
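The green, blue and red stages, and why clipping last matters, can be illustrated with a grid-based toy: a raster-style Euclidean allocation fills empty cells with the nearest source label (the blue extension), and a mask then clips the result (the red output). The grid representation, the labels, and both land masks are hypothetical stand-ins for the vector layers in the talk. Because the extension does not depend on the mask, the same extended result can be clipped to a coarse or a detailed coastline without rerunning it.

```python
def nearest_label(row, col, sources):
    """Label of the source cell with the smallest squared Euclidean distance."""
    return min(sources, key=lambda s: (s[0] - row) ** 2 + (s[1] - col) ** 2)[2]


def extend(labels):
    """Euclidean allocation: fill every None cell with its nearest label."""
    sources = [
        (r, c, label)
        for r, label_row in enumerate(labels)
        for c, label in enumerate(label_row)
        if label is not None
    ]
    return [
        [label if label is not None else nearest_label(r, c, sources)
         for c, label in enumerate(label_row)]
        for r, label_row in enumerate(labels)
    ]


def clip(labels, mask):
    """Keep labels only where the clipping mask (e.g. land) is True."""
    return [
        [label if keep else None for label, keep in zip(label_row, mask_row)]
        for label_row, mask_row in zip(labels, mask)
    ]


# "Green": source units A and B with a gap between them (hypothetical).
source = [
    ["A", None, None, None, None, "B"],
    ["A", "A", None, None, "B", "B"],
]
extended = extend(source)  # "blue": every cell now belongs to a unit
coarse_land = [[True] * 6, [True] * 5 + [False]]
detailed_land = [[False] + [True] * 5, [True] * 6]

for row in clip(extended, coarse_land):  # "red", clipped to one coastline
    print(row)
for row in clip(extended, detailed_land):  # same extension, another coastline
    print(row)
```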
Originally, this was all written in QGIS, which ended up being really slow: for something like the Canadian boundaries, it could take up to three days to run in total. After I switched everything over to PostGIS, the entire thing ran in about 30 minutes. So maybe this will convert you to PostGIS: it's a little more difficult to work with, but the speed improvements were really worth it.
If you have your own datasets that you want to process with this, the tool is available for download from the GitHub repository. It's wrapped up as a Docker tool, so you just run docker compose up to run it.
That was the first step. Step two was looking at data sources: what are good primary sources that could be combined into a global administrative map? A dataset like this needs three parts. First, a good source of subnational boundaries, each coming from its own national source. Then a clipping layer, which combines a coastline or land area with international boundary lines. Now I'll go through each of the datasets I actually used. Because I come from a humanitarian background, I used a source from the UN Office for the Coordination of Humanitarian Affairs (OCHA): the Common Operational Datasets (CODs), available on the Humanitarian Data Exchange. For the countries OCHA operates in, these are usually developed in direct coordination with the host governments, and they are usually quite authoritative and really high quality.
One limitation is that extraction is a somewhat manual process: there isn't really an API to get the data automatically, so there's a bit of a manual cleaning step. Some countries, like Nigeria here, have layers with different levels of coverage. You'll have a super detailed source for administrative level 3.
But it only covers a corner of the country. So if you're building a global dataset, you need to know which layer actually gives you full national coverage. Additionally, as the green areas at the top right show, the CODs don't cover the whole world, only the countries where OCHA has an operational interest.
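One way to make the "which layer gives full coverage" decision mechanical is sketched below, assuming you have already computed, for each administrative level, the fraction of the country's area that its layer covers. The coverage dictionary, the tolerance and the function name are hypothetical; returning None signals that a fallback source is needed.

```python
def deepest_full_coverage(coverage, tolerance=0.999):
    """Return the deepest admin level whose layer covers the whole country.

    `coverage` maps admin level -> covered fraction of the country's area
    (0.0 to 1.0). The tolerance absorbs tiny slivers from area rounding.
    Returns None when no level qualifies, i.e. another source is needed.
    """
    complete = [level for level, fraction in coverage.items() if fraction >= tolerance]
    return max(complete) if complete else None


# Hypothetical numbers: a detailed level 3 that only covers one corner.
coverage = {0: 1.0, 1: 1.0, 2: 0.9995, 3: 0.12}
print(deepest_full_coverage(coverage))  # 2
```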
Because the CODs lack worldwide coverage, I used an additional source as a fill-in backup: geoBoundaries, a project by the William & Mary geoLab in the US. It attempts to build a complete database from openly licensed datasets, as a repository that researchers, and really anybody, can use to include this kind of data in their own projects. Its strengths are that it has an API and global coverage. But it takes each layer from a separate source: on the right you see administrative level 3 in green and level 2 in red, and the two layers don't quite align with each other. So if you need a strictly hierarchical dataset, it takes some data massaging to get there. But once you overcome those limitations, it's quite a good source.
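One common way to do that massaging (not necessarily the one used here) is to give each lower-level unit the parent that contains a representative point of it. The sketch uses an even-odd ray-casting test and the average of a child's vertices as that point; this choice is an assumption that only works for roughly convex shapes, and PostGIS's ST_PointOnSurface, or picking the parent with the largest overlap area, are more robust. All IDs and coordinates are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd ray casting: is (x, y) inside the ring of (x, y) vertices?"""
    inside = False
    for i, (x0, y0) in enumerate(ring):
        x1, y1 = ring[(i + 1) % len(ring)]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def assign_parents(children, parents):
    """Map each child ID to the first parent containing its vertex average."""
    result = {}
    for child_id, ring in children.items():
        xs, ys = zip(*ring)
        x, y = sum(xs) / len(xs), sum(ys) / len(ys)
        matches = [pid for pid, p_ring in parents.items() if point_in_ring(x, y, p_ring)]
        result[child_id] = matches[0] if matches else None
    return result


# Level-2 parents and level-3 children that overshoot the shared edge (x = 5).
parents = {
    "P1": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "P2": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
children = {
    "C1": [(0, 0), (5.2, 0), (5.2, 5), (0, 5)],
    "C2": [(5.2, 0), (10, 0), (10, 5), (5.2, 5)],
}
print(assign_parents(children, parents))  # {'C1': 'P1', 'C2': 'P2'}
```

Once every child has a parent ID, the parent layer can be rebuilt by dissolving the children, which guarantees the two levels nest exactly.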
Those are the two sources used for subnational boundaries. For coastlines and land areas, OpenStreetMap is a great source: you can download land polygons derived directly from OpenStreetMap, and there are really no caveats to using them.
The last piece of the puzzle is a really high-quality set of international boundaries. After a lot of searching, one source stood out above the rest: the US Department of State's Humanitarian Information Unit, which publishes a dataset called Large Scale International Boundaries (LSIB).
These are super detailed lines that have documented international boundaries over the course of decades: very high resolution and really well documented. The only limitation is that this is a very US-centric point of view: it reflects US policy rather than, say, the UN's view.
So to adapt it for international or humanitarian use, some modifications are needed to reflect that view, which meant finding a source that depicts what the UN suggests.
The UN publishes a map called the Clear Map, which is its recommendation for representing international boundary lines. If you search for "UN Clear Map", this is what you get. It's quite limited: they don't really want you to have a high-resolution version of it, because it isn't supposed to imply any kind of endorsement on their behalf.
But they do offer simplified boundaries, as polygons you can download, and these are a helpful reference for comparison. Finally, you need a good database of names. The UN Statistics Division publishes its M49 standard, a highly standardized list of names and codes that includes ISO alpha-2 and alpha-3 codes. If you go directly to their website, you can download it all as a CSV file, which is quite convenient. The only limitation is that it doesn't include disputed areas.
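Loading the M49 table into a lookup keyed by ISO alpha-3 code takes only the standard library. The sketch assumes the downloaded file is semicolon-delimited with the column headers shown in SAMPLE; check these against the actual download. Because the table has no rows for disputed areas, they are merged in from a hand-maintained supplement. The "XAA" entry is purely hypothetical (ISO 3166-1 reserves the alpha-3 range XAA to XZZ for user-assigned codes).

```python
import csv
import io

# Assumed layout of the UNSD M49 CSV download (verify against the real file).
SAMPLE = """Country or Area;M49 Code;ISO-alpha2 Code;ISO-alpha3 Code
Nigeria;566;NG;NGA
Serbia;688;RS;SRB
"""

# Hypothetical manual additions for areas that M49 does not list.
MANUAL_ADDITIONS = {
    "XAA": {"name": "Example disputed area", "m49": "", "iso2": "XA"},
}


def load_m49(text):
    """Build {ISO alpha-3: {name, m49, iso2}} from semicolon-delimited text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {
        row["ISO-alpha3 Code"]: {
            "name": row["Country or Area"],
            "m49": row["M49 Code"],
            "iso2": row["ISO-alpha2 Code"],
        }
        for row in reader
    }


names = {**load_m49(SAMPLE), **MANUAL_ADDITIONS}
print(names["SRB"]["m49"], names["NGA"]["iso2"], sorted(names))
```

Keeping codes as strings preserves leading zeros in M49 codes such as "004".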
Those had to be added manually. And since we're here in Kosovo: unfortunately, in the UN's view this is a disputed or autonomous region. So the dataset I produced comes in three versions to represent the different permutations.
There's an "all" version, which shows every disputed or autonomous region separately. There's an "international" version, which adheres as closely as possible to the UN Clear Map; on the right you can see what the Clear Map actually shows, which suggests some indication of autonomous zones.
And there's a "US" version, which displays the Department of State boundaries as in the original dataset. Putting this all together: the entire pipeline is run from scratch every time it's generated, so that it picks up all the changes in every dataset on every run.
It starts with land coastlines as a base. On top of these go the Department of State's Large Scale International Boundaries. Since these are lines rather than polygons, they need a supplementary layer to group islands and to extend out into the coastline.
Once these are dissolved together, the lines form actual polygons that serve as a clipping layer. You can then take individual boundaries, put them through the algorithm, have them extended, and then clip them to the clipping layer.
The final result is a super high-resolution, global-coverage set of administrative boundaries, going from level 0 all the way to level 4. The whole thing is quite efficient, running in about three hours, so nothing really prevents it from being run daily.
The only limiting factor is that most of the underlying sources need some manual cleaning or massaging to be workable. So I've been working with the original data producers to help build better APIs and automation, so that their hosted datasets become more machine-readable and accessible to the public at large.
I've also created a web dashboard that takes all this complex data and gives you a URL you can simply click to explore it. It has been a good way to share this with non-technical users.
So that's the project in a nutshell: an effort to improve core foundational datasets that I see in continuous use in a community I've put a lot of work into. This has been my contribution to improving the quality of those foundations, so that other people can build higher-quality, more feature-rich applications on top of them.