
Not too big, not too small: open source geospatial units that are just right


Formal Metadata

Title
Not too big, not too small: open source geospatial units that are just right
Title of Series
Number of Parts
351
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year: 2022

Content Metadata

Subject Area
Genre
Abstract
Publicly available data tends to be spatially aggregated to administrative units, limiting the feasibility of nuanced analyses that reflect the natural state of communities and provide actionable insights for a wide range of stakeholders. While higher resolution data is generally available within government agencies, access for external researchers is limited due to well-established privacy concerns. Inspired by our own use case of developing a regional quality of life metric for neighborhoods in Denmark, our team at Aalborg University’s Department of the Built Environment, in collaboration with data.org’s Growth and Recovery Challenge, and Data Clinic, set out to develop and open source not only foundational granular spatial units and data that adhere to privacy laws, but also the accompanying methodology that has the potential for broad applicability in other countries. In this presentation, we will demonstrate the methodology’s generalizability, particularly across common European land use and geographical features, and show how the resulting high-resolution shape files and community data can become crucial tools for government decision-makers, community organizations, and researchers in their efforts to increase transparency and engage in practical, actionable research. Focused initially on our Denmark use case, we algorithmically create spatial units with minimum household and population counts from country-wide hectare cell level data. Our approach uses data on road networks and administrative boundaries to create socially meaningful component polygons. This is achieved by developing tools based on already existing open source packages available in R and Python. The hectare cells are then mapped onto the polygons and clustered using the max-p regionalization algorithm with constraints on the minimum population and household counts to arrive at the final set of spatial units. 
To improve the accessibility of this data to not just researchers but also administrative decision-makers, community organizations, and the general public, we are developing an online tool to explore and visualize indicators within the resulting fine-grained regions such as disposable income, educational level, housing prices, migration rates, distances to public institutions, and labor market attachment in Denmark. Regional inequality in Denmark has increased over time, and with the help of this tool, we hope to provide the ability to study these key metrics both within and across municipal regions. In the development of the tool, we prioritize user feedback and common use cases to ensure both applicability and longevity. This project has been developed with an open-source mindset by: 1) creating flexible open data resources that can adapt to a wide range of public use cases, 2) open sourcing the methodology for use in other countries/regions, and 3) enabling the use of existing open data and tools such as OpenStreetMap, R and Python in the pipeline. We firmly believe that the project has the potential to improve knowledge sharing and collaboration between GIS experts, decision-makers, researchers and the general public not only in Denmark, but also in Europe and beyond.
Keywords
Transcript: English (auto-generated)
Hi. Thanks for staying so late. It's almost social dinnertime. So I'm presenting a very brief talk in the use case and applications category: "Not too big, not too small: open source geospatial units that are just right." I'm here with my colleague Elise, and Aaron as well.
And this is a project where basically our aim is to build granular open source geospatial units that reflect physical boundaries and adhere to privacy constraints. This is in Denmark; we are from Aalborg University. So basically what we have in Denmark at the moment is that
we have municipalities, which have very good data. And this is released and publicly available. Then we have grid cells, which are very granular, 100 by 100 meter areas, which also have data, but it's not publicly available. And this is due to privacy constraints. And we basically want to go to the sub-municipal level to
provide a more detailed data overview of Denmark on a range of economic indicators and quality of life indexes. So we need to do something to kind of map what we have in the grid cells over to all of Denmark. So also, just an overview of who is actually
collaborating on this project. So it's us, Aalborg University, a research institution. And then we have Data Clinic, which is a pro bono data-for-good initiative of the financial company Two Sigma. Then we have Statistics Denmark, which sits on all the data. And then we have data.org, which is funding the whole
thing. Data.org is also a data-for-good initiative, started by Mastercard and the Rockefeller Foundation. So these are kind of all the parties involved. And then the stakeholders: Danish municipalities, organizations, and different decision makers, and also researchers to some degree. That's us.
So how do we actually do this? How do we arrive at something on the sub-municipal level, meaningful polygons where we can put in some data and show some useful stuff to both municipalities and the general user? Our starting point is basically the OpenStreetMap road networks, here on the left. This is an example of a municipality.
And then for the process of polygonizing all of that, we have used existing packages in both R and Python, as well as custom scripts that we have developed ourselves and put up. And then we basically end up with something like this to start with. And this is good. We had some issues. Obviously, we have coasts and islands.
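To make the polygonizing step concrete, here is a minimal sketch, not the project's actual code. It assumes the road network has already been reduced to straight segments that only meet at shared nodes, and it finds the enclosed blocks by always taking the sharpest left turn at each node. The names `trace_faces` and `signed_area` and the toy two-block network are made up for illustration; a real pipeline would more likely rely on an established geometry library.

```python
import math
from collections import defaultdict


def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings."""
    pairs = zip(ring, ring[1:] + ring[:1])
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)


def trace_faces(edges):
    """Return the enclosed blocks of a planar, already-noded network.

    edges: list of ((x, y), (x, y)) segments that only touch at nodes.
    Each block is returned as a counter-clockwise list of nodes.
    """
    # Neighbours of every node, sorted counter-clockwise by direction.
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    for node, nbrs in neighbours.items():
        nbrs.sort(key=lambda n: math.atan2(n[1] - node[1], n[0] - node[0]))

    visited = set()  # directed edges already assigned to a face
    faces = []
    for a, b in edges:
        for start in ((a, b), (b, a)):
            if start in visited:
                continue
            ring = []
            u, v = start
            while (u, v) not in visited:
                visited.add((u, v))
                ring.append(u)
                nbrs = neighbours[v]
                # Arriving at v from u, leave along the next neighbour
                # clockwise from u, i.e. the sharpest left turn.
                w = nbrs[(nbrs.index(u) - 1) % len(nbrs)]
                u, v = v, w
            # The unbounded outer face comes out clockwise; skip it.
            if signed_area(ring) > 0:
                faces.append(ring)
    return faces


# Toy network: two adjacent unit blocks sharing the road x = 1.
edges = [
    ((0, 0), (1, 0)), ((1, 0), (2, 0)),                    # bottom road
    ((0, 1), (1, 1)), ((1, 1), (2, 1)),                    # top road
    ((0, 0), (0, 1)), ((1, 0), (1, 1)), ((2, 0), (2, 1)),  # cross roads
]
faces = trace_faces(edges)
```

This toy version does not handle curved roads, crossings without a shared node, or disconnected road networks nested inside a block, which is part of why coasts and islands need extra care.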
Denmark is a complicated country geographically, so it's not so easy to work with. But next, we need to kind of match them up with these grid cells, because that's where we have the data. So our next step is to make an overlay. And here we have the grid overlay. And it also looks good, but the problem is that the privacy constraints on the data in Denmark require that
we have a minimum of 50 households and 100 people in each unit. So we need to go through an additional clustering step. And this is basically what we do here on the right-hand side. Among other things, we use the max-p regionalization algorithm and also some custom clustering
scripts that we have developed in R. And basically, in this case, it adheres to the privacy constraints. So our end goal is to actually have this web tool. This is a very, very early prototype of what it would look like. It's set to release in 2023, hopefully to be used by
municipalities. Loads of indicators about household income, housing prices, distances to education, all these things that they might find very interesting. And this is going to go back 30 years in time. So we are now in the process of extending the data window from 1990 to 2020.
And also, from an academic perspective, we really want to do some spatial longitudinal analysis on this, because it will be much more detailed than what we have already. And then, of course, integrate feedback from key stakeholders. And what we really like about this is that it has the potential to be applied in other countries or
elsewhere where OpenStreetMap is available, because this is basically just a custom script that is able to create these neighborhoods, or areas, which are very granular. And also, because at the moment in Denmark, we don't really have a standard unit of analysis for very small areas, similar to, for example, the US census tract.
So we really hope that what we're doing here can help develop this in the future for Denmark. So yeah, you're more than welcome to also visit our GitHub here, where you have a little bit more documentation. And then you also have links to the repos with the relevant stuff in both R and Python.
So yeah, thank you very much. Thank you.
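As an illustration of the clustering step described in the talk, the sketch below merges neighbouring units until every region holds at least 50 households and 100 people. It is a simple greedy stand-in, not the max-p regionalization algorithm and not the project's own scripts. The function name `merge_small_units`, the input layout, and the four toy units are assumptions made for this example.

```python
MIN_HOUSEHOLDS = 50  # privacy thresholds quoted in the talk
MIN_PEOPLE = 100


def merge_small_units(units, adjacency):
    """Greedily merge units that fall below the privacy thresholds.

    units:     {unit_id: (households, people)}
    adjacency: {unit_id: set of neighbouring unit_ids}, assumed symmetric
    Returns    {unit_id: label of the region it ends up in}
    """
    # Every unit starts out as its own region.
    members = {u: {u} for u in units}
    totals = {u: list(units[u]) for u in units}
    nbrs = {u: set(adjacency.get(u, ())) for u in units}

    def too_small(region):
        households, people = totals[region]
        return households < MIN_HOUSEHOLDS or people < MIN_PEOPLE

    while True:
        # Regions that still violate a threshold and can be merged.
        candidates = [r for r in members if too_small(r) and nbrs[r]]
        if not candidates:
            break
        # Take the least populated one and merge it into its least
        # populated neighbour (ties broken by id for determinism).
        small = min(candidates, key=lambda r: (totals[r][1], r))
        target = min(nbrs[small], key=lambda r: (totals[r][1], r))
        members[target] |= members.pop(small)
        totals[target][0] += totals[small][0]
        totals[target][1] += totals[small][1]
        del totals[small]
        # Re-wire the neighbours of the absorbed region to the target.
        for n in nbrs.pop(small):
            nbrs[n].discard(small)
            if n != target:
                nbrs[n].add(target)
                nbrs[target].add(n)
    # Regions with no neighbours (e.g. small islands) may still be too
    # small here and would need separate handling before release.
    return {u: region for region, ms in members.items() for u in ms}


# Toy example: four units in a row, a - b - c - d.
units = {"a": (20, 45), "b": (35, 70), "c": (60, 150), "d": (10, 20)}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
labels = merge_small_units(units, adjacency)
# Units a and b end up in one region, c and d in another.
```

Unlike this greedy pass, max-p treats the problem as an optimization: it looks for the largest number of contiguous regions that each meet the threshold, which is why it suits the goal of keeping units as small as the privacy rules allow.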