3.6 million points to polygons

Formal Metadata

Title
3.6 million points to polygons
Subtitle
lessons learned while generating voting districts with QGIS, PostGIS and OpenJUMP?
Title of Series
Number of Parts
295
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Parliamentary elections were held in Finland in April 2019, and to better visualize the results I went on a quest to generate polygons for each of the 1937 voting districts. Voting district polygons are not open data except for a few major cities, but address points for buildings are open data in Finland, and they also carry information about which voting district each address belongs to. The talk aims to give examples and tips on how to work with bigger datasets with OSGeo tools and how to deal with errors and uncertainties in your data analysis.
Transcript: English (auto-generated)
Hello, people in the crowded room and on the internets. Hello. Really nice that you're here, even though you're maybe heading to the gala and it's a bit small, but
I'm here to talk about points, like I just said. When I drafted the talk, I thought that 3.6 million points sounds fancy. But this morning, when the OmniSci guys were talking about 100 billion points, I thought that,
well, let's consider it billions and it sounds better. Yeah, but on to what I learned. So, who I am: on Twitter I look like that, in reality like this. I work at a company called Gispo. We're nine people based in
Helsinki, Finland, and we do FOSS4G consulting, so all the kinds of stuff that you see here; we do that type of thing. I post stuff on Twitter quite a lot, so you can find me there, and this hobby project that I'm gonna talk about here was also there, and I got a lot of good feedback and good tips
from there, so thank you to the people who contributed to this. But yeah, presentation content: I'm gonna keep it pretty simple. First I'm gonna present what it's about, then I'm gonna present how I
executed all of it, and then I'm gonna draw some fancy conclusions out of it. So, the problem, or I don't know if it was an actual problem, but to start with: there were parliamentary elections in Finland this spring, and
because I make maps and work with GIS, I wanted to make election maps, because election maps are fun to do, informative, and, yeah, election maps. Usually the election results are shown on a municipal level. There are 311 municipalities
in Finland, and some of them are pretty big. But the smallest area is a voting district. They're a bit like postal code areas: they're kind of like areas, but they're not, and they have stupid things in them. They might not be continuous,
but I wanted to make maps with those areas. Those areas aren't available, though: a few cities share them openly, but on a country level they're not there. So I had to create them,
and how I did that was that there is point data: the address points from the whole country are available as open data. So I had to convert those.
The same idea applies in other countries with, for example, address points and postal code areas: converting those points to areas. This is the very nice open data page that we have in Finland, and it's available to all.
It's one of those, as in all countries all over the world. You can never have enough open data portals; you always need a few more. But this is the official one, and it has the address points from the whole country. So I downloaded them. I had been using them before, too.
So this is the slide background, but those are actual points, and this is Finland. So that's, yeah, exactly, that's Russia, that's the Baltic Sea, and as you can see,
the quality of the data is not really optimal. I knew that already beforehand, so you had to do something with it. The funny thing is that it's a data set well known for having certain types of problems, because the data originates from the municipalities,
and some municipalities have higher quality data than others. When I colored and examined them a bit, it seemed like some had really high quality points, and then it might be that one municipality has these.
Some of the data was originally input manually, so the coordinates might be the wrong way around: all kinds of typical bad-quality GIS data problems that you might encounter. But there was an attribute telling which voting area each point belongs to, so I
loaded them into PostGIS, like you already saw before, dragged them into QGIS, and colored them by voting area. Then I was looking at the points and thinking: yeah, this is doable, this looks promising, I can manage. I looked at it on the country level.
You can't even see the Russian points anymore. I was still thinking: yeah, I can distinguish the areas there, I'm gonna manage. I was using my most typical tools, which I think for the majority of you are the ones that you use too: the data was in PostGIS and I
was visualizing it with QGIS. Can you see anything? Not really, okay. It's more about data quality this time, and polygons. So what I did first is that I just drew
convex hulls for all the voting areas, and it looked really bad. This is a slightly more cleaned version. My first iteration of cleaning the data was that I joined it with municipality polygons, compared the attributes, and cleaned out the points which didn't
fall inside the municipality, because I knew that the municipality polygons were good quality. So that was one way to clean the data at first, but it still didn't look quite good. I tried concave hulls; this is Helsinki. It looks,
again, quite okay, but then I thought that I want a continuous area, and it's a snapping operation from hell to get all of those gaps away from there. So I was trying to figure out how I could do that.
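As a minimal sketch of the convex-hull first attempt described above: the talk did this in PostGIS, whereas the plain-Python version below (Andrew's monotone chain) is only an illustration. The function names and the toy coordinates are assumptions, not the speaker's code or data.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def hulls_by_district(rows):
    """Group (x, y, district) rows by district and build one hull each."""
    groups = defaultdict(list)
    for x, y, district in rows:
        groups[district].append((x, y))
    return {d: convex_hull(p) for d, p in groups.items()}


# Toy data (hypothetical): district B has one misattributed point at (1.5, 1.5).
rows = [
    (0, 0, "A"), (2, 0, "A"), (2, 2, "A"), (0, 2, "A"), (1, 1, "A"),
    (1.5, 1.5, "B"), (5, 5, "B"), (5, 6, "B"), (6, 5, "B"),
]
hulls = hulls_by_district(rows)
print(hulls["A"])  # [(0, 0), (2, 0), (2, 2), (0, 2)]
print(hulls["B"])
```

In the toy data, the single stray point drags the hull of B across the square of A, which is the kind of overlap that makes raw convex hulls look bad on dirty data. Hulls also leave gaps between districts, so they cannot give a continuous coverage on their own.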
Then I tried a different approach. I thought that maybe I'll do a grid and then select the most typical voting area in each grid cell. So, for example, here there might be ten points, and nine of them are in one voting area and the others are something else.
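The grid idea above can be sketched as a majority vote per cell. This is only an illustration in plain Python; the function name, the cell size and the sample rows are assumptions, and the talk does not say how the grid was actually built.

```python
import math
from collections import Counter, defaultdict


def grid_majority(rows, cell_size):
    """Assign each grid cell the most common district among its points.

    rows is an iterable of (x, y, district); cells are square, cell_size wide.
    """
    votes = defaultdict(Counter)
    for x, y, district in rows:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        votes[cell][district] += 1
    # most_common(1) returns [(district, count)] for the winning district.
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}


# Ten points in cell (0, 0): nine belong to A, one to B (hypothetical data).
rows = [(i * 0.1, 0.5, "A") for i in range(9)]
rows += [(0.95, 0.5, "B"), (1.5, 0.5, "B")]
cells = grid_majority(rows, cell_size=1.0)
print(cells)
```

The minority point in a cell is silently outvoted, and cells with no points get no district at all, which hints at why this approach is hard to make work on real data.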
Yeah, but it didn't work out either. It was a stupid idea. But then, some of you might already have been thinking at this stage: why didn't you start with Voronois? It might have been the sensible thing from the very start. So, Voronois are these areas that,
when points are here, well, I would describe it in a way that it's the area that each point owns. There's actually an animation I did: that's train traffic in Finland, and I tried an animated Voronoi earlier.
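The "area that each point owns" description can be made concrete: a location belongs to the Voronoi cell of whichever input point is nearest to it. The brute-force sketch below only illustrates that definition; it is not how QGIS, PostGIS or OpenJUMP construct the diagram, and the names and sample sites are assumptions.

```python
def owner(location, sites):
    """Return the label of the site nearest to location.

    sites is a list of (x, y, label). Squared distance is enough for
    comparison, so no square root is taken.
    """
    x, y = location
    nearest = min(sites, key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2)
    return nearest[2]


# Three hypothetical address points, each tagged with a voting district.
sites = [(0, 0, "A"), (10, 0, "B"), (0, 10, "C")]
print(owner((2, 1), sites))
print(owner((8, 3), sites))
print(owner((1, 9), sites))
```

Cell boundaries are the locations equidistant from two sites. Dissolving the cells of all points that share a district label then gives one gap-free area per district, which is why Voronois fit the continuity requirement.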
But it was just a nice thing to put in here; it isn't in any way related to the voting areas, I just wanted to put it in. Okay. Again, it looks messy, but those are Voronois. When I started to experiment with the Voronois,
first I did a quick test with the QGIS Voronoi processing tool. It works pretty okay for a hundred points and a thousand points, and even for ten thousand, yeah, it could be done.
But above that, no: not surprisingly, the execution time is unknown. But then, surprisingly, when I tried it in PostGIS too, I noticed that the execution time grew exponentially when the number of points
grew. So this looks like I have the whole country in Voronois now, but actually it's made in patches, so it might be a hundred here, a hundred there, and it still isn't continuous. Yeah, so I was kind of stuck. I think I once waited like eight hours
for the query to finish, but it didn't, and I didn't find any kind of solution on the internet. But then I gave a talk about this in Finland at an OSGeo meetup, and I got a tip from an older colleague,
a more experienced one, Jukka, who is also at the conference. He said to me: yeah, yeah, I already did that a few years ago. He asked:
have you tried OpenJUMP? I had heard about OpenJUMP. Maybe somebody came to the talk because they saw OpenJUMP in the title, but this was the first time I opened OpenJUMP, so all my OpenJUMP experience is this hobby project. So then I went and loaded OpenJUMP.
Well, basically it's a desktop GIS application. I also noticed that it's on OSGeoLive, so it's worth testing out. I was checking out what you can do with it, and I was thinking: this is the same stuff that I do with QGIS, why would this work? Well, basically,
it's a Java-based application, so I could give it a lot of RAM. I gave it a lot of RAM, and in less than 10 minutes I had the polygons. Okay, so after this I clipped them and I thought: yeah, now we're close. But
actually, it wasn't so clean after all. This was maybe the most painful part, getting rid of these, because there could be some really small areas and some really big areas.
So there couldn't be just a threshold under which I could delete the small ones, because in cities there could be small areas. But I knew that they would have to be single polygons. This was also a part where, like in all of these parts,
I posted it on Twitter and then someone commented that you should try this out, or you should have done it this way. So I wrote some SQL. Because this is a cool tech conference, I have a black background with green SQL, but I'm not gonna bore you with it.
I'm gonna explain what I did in PostGIS. For the messy polygons, I used a window function to sort the pieces, all the mess, by area. Then, if a piece wasn't the biggest piece in its group, I selected the closest
piece that it's inside or close to. I used a lateral join, gave it the ID of that closest big area, and did a union.
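The cleanup steps above (rank pieces by area within each district, keep the biggest, hand every other piece to the nearest big piece, then union) can be sketched in plain Python. This is a simplified illustration, not the speaker's SQL: centroid distance stands in for the geometric distance a PostGIS query would use, summing areas stands in for the union, and all names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def reassign_fragments(pieces):
    """Merge stray fragments into the district of the nearest main piece.

    pieces: list of dicts with keys "district", "area", "centroid".
    Returns the total area per district after reassignment.
    """
    # Step 1: find the biggest piece of every district (the window-function step).
    biggest = {}
    for p in pieces:
        current = biggest.get(p["district"])
        if current is None or p["area"] > current["area"]:
            biggest[p["district"]] = p
    mains = list(biggest.values())

    # Step 2: every non-biggest piece takes the ID of the closest main piece
    # (the lateral-join step), then areas are accumulated (the union step).
    merged_area = defaultdict(float)
    for p in pieces:
        if biggest[p["district"]] is p:
            target = p["district"]
        else:
            nearest = min(
                mains, key=lambda m: math.dist(m["centroid"], p["centroid"])
            )
            target = nearest["district"]
        merged_area[target] += p["area"]
    return dict(merged_area)


pieces = [
    {"district": "A", "area": 100.0, "centroid": (0.0, 0.0)},
    {"district": "A", "area": 2.0, "centroid": (9.0, 0.0)},    # stray bit near B
    {"district": "B", "area": 80.0, "centroid": (10.0, 0.0)},
    {"district": "B", "area": 1.0, "centroid": (10.5, 0.5)},   # sliver next to B
]
result = reassign_fragments(pieces)
print(result)
```

Because the rule is relative (biggest piece per district) rather than an absolute area threshold, small but legitimate city districts survive while stray fragments of any size are absorbed.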
It might sound confusing, but here's what I ended up with. This is a good screenshot; it didn't work out in all areas. But I was pretty pleased with how it turned out, because I feel like this was one of those
issues that sounds like a much easier task than it is, and of course I couldn't clean those manually. So I was pretty happy with this. Now I had the areas, so I did some maps. Those are different political parties based on their popularity.
That's a cartogram with the different parties' popularity. This is a QGIS visualization demo now. Then, with a QGIS Atlas, I went through all the regions in Finland, and
I think this kind of visualization really brought out the use of having really small and detailed areas: you could distinguish stuff inside cities, which previously wasn't possible because I only had the municipalities as the most detailed level.
And I also did a graph, an experimental PostGIS-to-OpenOffice visualization. There are about two thousand voting areas, and it basically shows the popularity
in each. So if it's a really flat graph, it means that the party is pretty popular throughout the country. And if it's really steep, it means that there are a few areas where it's really popular and then a lot of areas where nobody votes for that party.
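One way to read the flat-versus-steep graph above is as a party's vote share per voting area, sorted from highest to lowest. The sketch below assumes that reading; the talk does not spell out how the graph was built. The steepness measure (top share minus median share) and the sample shares are made up for illustration.

```python
def popularity_curve(shares):
    """Vote shares per voting area, sorted from most to least popular."""
    return sorted(shares, reverse=True)


def steepness(shares):
    """Hypothetical summary: best area's share minus the median area's share."""
    curve = popularity_curve(shares)
    return curve[0] - curve[len(curve) // 2]


# Made-up shares for two parties across five voting areas.
even_party = [0.20, 0.22, 0.18, 0.21, 0.19]          # similar everywhere
concentrated_party = [0.70, 0.02, 0.01, 0.03, 0.60]  # strong in two areas only

print(popularity_curve(even_party))
print(popularity_curve(concentrated_party))
print(steepness(concentrated_party) > steepness(even_party))
```

A flat curve corresponds to a small steepness value and a steep curve to a large one, matching the two cases described in the talk.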
I'm not gonna bore you with Finnish politics and what's what. This is another example, comparing a liberal and a conservative politician a bit inside Helsinki. You can't really see the colors that well, but you can distinguish different kinds of things.
So, what I learned in a nutshell. Go beyond your normal stack: I pounded my head against the wall with PostGIS and QGIS for a pretty long time, then I tried something new, and bam. Ask for help: it's pretty useful, and
there hasn't been enough community hype in the conference so far, so I'm gonna say that that's the whole idea of the open community: you ask for help, you share your results, and then somebody else does too. It requires some guts to share your wrong or unfinished results to get the feedback, but it's worth it. And
then, I got this conclusion when I was writing the slides, but I think it's the best conclusion so far: open data, which the address points are, should always have a working upstream. I noticed a lot of mistakes in the data, but so what?
I think there are a lot of organizations and companies in Finland who have reported the same mistakes, but the fixes don't go upstream into the original data. So I think there are a lot of people who will do the same corrections again and again and again. So if somebody's a data owner publishing data:
plan ahead so that there's a way for you to accept corrections to your data. That's really vital. All right, thanks a lot for listening. Okay, we have some time for questions. Please wait for the microphone to pose your question.
Oh, such a large audience. Ah, finally. Just a question. Is it working? Yeah. So did you ever consider doing some clustering, like k-means clustering, on the points directly?
Or moving to rasters and then using, I don't know, r.reclass.area from GRASS to remove the clumps? That's something I didn't consider. No, it's an interesting approach, but I,
no, moving to raster or something like that wasn't on the table, but I tried out quite many things. But actually, if someone has a good idea why the PostGIS Voronoi works the way it works, I'm interested in hearing.
Having run into the same issues as you, I can suggest Euclidean distance and rasters to help with this, also on a large scale. It works and is performant compared to the vector approach. Okay, thanks.
More questions or comments? Take your time, don't be scared. The room is full, but everyone is welcome. There are mostly Finns here, so that's why.
Yeah, yeah, that's fine, that's sort of fun. So the question is about Finnish politics and Finnish political geography in a nutshell.
From these maps I learned, well, some pretty obvious stuff. For example, Finland has two official languages, Finnish and Swedish, and
the Swedish speakers, you can expect them to vote for the Swedish-speaking party. That's not surprising. But more interesting, like in many other countries, especially in Europe, are the right-wing conservative type of parties: where are people voting for those more?
Well, I didn't get an answer for why, but for where, at least, I got some idea. That was interesting to me, at least. Where? All around the country. Yeah, it's not like rural areas or cities, but all around the country, pretty evenly. Yeah.
In Estonia I did kind of a similar thing, but from the opposite side:
an application for the local governments where they can draw the voting districts and things. One consideration was to use the cadastre: from the addresses to cadastral parcels, and from those to the areas. Did you consider that? Yeah, yeah. At some point I already had like 40 gigabytes of
cadastre data on my disk, and I did some experiments with that too. Cadastre data would have produced nicer boundaries than Voronois. It was on my list, but it didn't make it to the finals. We still have some time.
To continue on the cadastre: I've also been doing this kind of stuff for the Netherlands. You have a bit more addresses than we do.
The problem with cadastre parcels is that you get gaps where there are no addresses, and then you need to solve that. So you just move your problem to another area. Yeah. And I wanted to cut them with hard lines,
so you define a hard line from a road or water or whatever. Then I discovered that there are no hard lines which are single lines, like with a multi-lane road. So my problem now is how to create proper hard lines, if anyone knows how to make hard lines. This is a problem that I think quite many people have been struggling with, trying to find an optimal solution.
All right, we'll close it there. The session will resume at five o'clock. Thank you very much, Topi. Thank you.