3.6 million points to polygons
Formal Metadata
Title |
| |
Subtitle |
| |
Title of Series | ||
Number of Parts | 295 | |
Author | ||
Contributors | ||
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/43577 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
| |
Keywords |
Transcript: English (auto-generated)
00:07
Hello, people in the crowded room and on the internets. Hello! It's really nice that you're here, even though you're maybe heading to the gala and it's a bit small, but
00:22
I'm here to talk about points, like I just said. When I drafted the talk, I thought that 3.6 million points sounds fancy. But this morning, when the OmniSci guys were talking about 100 billion points, I thought that,
00:41
well, let's consider it billions and it sounds better. Yeah, but what I learned... So, on Twitter I look like that, and in reality like this. I work at a company called Gispo. We're nine people based in Helsinki,
01:01
Finland, and we do FOSS4G consulting, so all the kinds of stuff that you see here; we do that type of thing. I post stuff on Twitter quite a lot, so you can find me there. This hobby-project type of thing that I'm going to talk about here was also there, and I got a lot of good feedback and good tips
01:25
from there, so thank you to the people who contributed to this. But yeah, the presentation content: I'm gonna keep it pretty simple. First I'm gonna present what it's about, then I'm gonna present how I
01:41
executed all of it, and then I'm gonna draw some fancy conclusions out of it. So, the problem, or I don't know if it was an actual problem, but to start with: there were parliamentary elections in Finland this spring, and
02:00
because I make maps and work with GIS, I wanted to make election maps, because election maps are really fun to do, and informative, and, yeah, election maps. Usually the election results are shown on a municipal level. There are 311 municipalities
02:24
in Finland, and some of them are pretty big. But the smallest area is a voting district. They're a bit like postal code areas: they're kind of like areas, but they're not, and they have stupid things in them. They might not be continuous,
02:48
but I wanted to make maps with those areas. But those areas aren't available. A few cities share them openly, but on a country level they're not there. So I had to create them,
03:06
and how I did that was that there is point data: the address points from the whole country are available as open data. So I had to convert those.
03:22
The same goes in other countries with, for example, address points and postal code areas: so, those points to areas. This is the very nice open data page that we have in Finland, and it's available to all.
03:42
It's one of those, as in all countries all over the world: you can never have enough open data portals; you always need a few more. But this is the official one, and it has the address points from the whole country. So I downloaded them. I have been using them before, too.
04:05
So this is the slide background, but those are actual points, and this is Finland. So that's, yeah, exactly, that's Russia, that's the Baltic Sea, and as you can see,
04:21
the quality of the data is not really... I already knew beforehand that the quality of the data is not optimal, so you had to do something with it. But the funny thing is that it's a data set well known for having certain types of problems, because the data originates from the municipalities, and
04:43
some municipalities have higher-quality data than others. When I colored and examined them a bit, it seemed like some had really high-quality points, and then it might be that one municipality has these, and
05:01
some of the data was originally input manually, so the coordinates might be the wrong way around: all kinds of typical bad-quality GIS data problems that you might encounter. But there was an attribute telling which voting area each point belongs to, so I
05:23
loaded them into PostGIS, like you already saw before, dragged them into QGIS, and colored them by voting area. Then I was looking at the points and thinking: yeah, this is doable, this looks promising, this I can manage. I looked at it on the country level:
05:42
you can't even see the Russian points anymore. I was still thinking: yeah, I can distinguish the areas there, so I'm gonna manage. I was using my most typical tools, which I think for the majority of you are the ones that you use too: the data was in PostGIS and I
06:05
was visualizing it with QGIS. Can you see anything? Not really? Okay. It's more about data quality this time, and polygons. So what I did first is that I just drew
06:22
convex hulls for all the voting areas, and it looked really bad. This is a slightly more cleaned version. My first iteration of cleaning the data was that I joined it with municipality polygons, compared the attributes, and cleaned out the ones which didn't
06:43
drop inside their municipality, because I knew that the municipality polygons were good quality. So that was one way to clean the data at first, but it still didn't look quite good. I tried concave hulls; this is Helsinki. It looks,
07:01
again, quite okay, but then I thought that I want a continuous area, and this would be a snapping operation from hell to get all of those gaps away from there. So I was trying to figure out: how could I do that?
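The first attempt described above, one convex hull per voting area, can be sketched in plain Python. This is only an illustration under stated assumptions: the talk did this in PostGIS and QGIS, the `(x, y, area_id)` row layout and the sample coordinates are hypothetical, and the hull routine is the standard monotone chain algorithm rather than whatever the speaker's tools use internally.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def hulls_by_area(rows):
    """Group (x, y, area_id) rows by area_id and build one hull per group."""
    groups = defaultdict(list)
    for x, y, area_id in rows:
        groups[area_id].append((x, y))
    return {area_id: convex_hull(pts) for area_id, pts in groups.items()}


# Hypothetical address points: two neighbouring voting areas.
rows = [
    (0, 0, "A"), (4, 0, "A"), (4, 4, "A"), (0, 4, "A"), (2, 2, "A"),
    (3, 3, "B"), (7, 3, "B"), (7, 7, "B"), (3, 7, "B"),
]
hulls = hulls_by_area(rows)
print(hulls["A"])
print(hulls["B"])
```

In this toy data the point (3, 3) of area B lies inside the hull of area A, so the two hulls overlap. Hulls of neighbouring areas can also leave gaps between them, which is one reason the result does not form a clean, continuous coverage.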
07:22
Then I tried a different approach. I thought that maybe I'll do a grid and then select the most typical voting area in each grid cell. So, for example, here there might be ten points, and nine of them are in one voting area and the others are something else.
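A minimal sketch of the grid idea just described: bin the points into square cells and keep the most common voting area per cell. The function name, the `(x, y, area_id)` row layout, and the cell size are assumptions made for illustration; the talk does not show how the grid was actually built.

```python
from collections import Counter, defaultdict


def majority_grid(rows, cell_size):
    """Map each occupied grid cell to its most frequent area_id."""
    cells = defaultdict(Counter)
    for x, y, area_id in rows:
        # Integer cell index of the point along each axis.
        key = (int(x // cell_size), int(y // cell_size))
        cells[key][area_id] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in cells.items()}


# Hypothetical cell with ten points: nine in area "A", one stray in "B".
rows = [(i * 0.1, 0.5, "A") for i in range(9)] + [(0.95, 0.5, "B")]
grid = majority_grid(rows, cell_size=1.0)
print(grid)
```

Cells without any address point get no value at all in this sketch, so the result is not a continuous coverage either.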
07:42
Yeah, but it didn't work out either. It was a stupid idea. But then, some of you might already have been thinking at this stage: why didn't you start with Voronois? It might have been the sensible thing from the very start. So Voronois are these areas that,
08:04
when points are here, it's like... I would describe it this way: it's the area that each point owns. There's actually an animation I did; that's train traffic in Finland, and I tried an animated Voronoi earlier.
08:23
But it was just a nice thing to put in here; it isn't in any way related to the voting areas, I just wanted to put it in. Okay. Again it looks messy, but those are Voronois. When I started to experiment with the Voronois,
08:44
first I did a quick test with the QGIS Voronoi processing. It works pretty okay for a hundred points and a thousand points, and even for ten thousand, yeah, it could be done.
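The "area that each point owns" description can be made concrete with a small sketch: a location belongs to the Voronoi cell of whichever point is nearest to it. This brute-force lookup only illustrates the definition; it is not how QGIS or PostGIS construct the polygons, and the `owner` function and the sample points are hypothetical.

```python
import math


def owner(location, points):
    """Return the (x, y, area_id) point nearest to location.

    By definition, location lies in that point's Voronoi cell.
    """
    return min(points, key=lambda p: math.dist(location, (p[0], p[1])))


# Hypothetical address points tagged with a voting area.
points = [(0.0, 0.0, "A"), (10.0, 0.0, "B"), (0.0, 10.0, "A")]

print(owner((2.0, 1.0), points)[2])  # nearest point is (0, 0), area "A"
print(owner((8.0, 1.0), points)[2])  # nearest point is (10, 0), area "B"
```

Because every cell inherits the voting area of its point, neighbouring cells with the same area can afterwards be merged into one polygon per voting area.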
09:01
But then, above that, no: not surprisingly, the execution time is unknown. But then, surprisingly, also when I tried it in PostGIS, I noticed that the execution time grew exponentially when the number of points
09:20
grew. So this looks like I have the whole country in Voronois now, but actually it's made in patches, so it might be a hundred here, a hundred here, so it still isn't continuous. Yeah, so I was kind of stuck, because I think I once waited like eight hours
09:43
for the query to finish, but it didn't. And I didn't find any kind of solution on the internet. But then I gave a talk about this in Finland, at an OSGeo meetup, and I got a tip from an older colleague,
10:03
a more experienced one. Yeah, Jukka, who is also at the conference. He said to me: yeah, yeah, I already did that a few years ago. He asked:
10:21
have you tried OpenJUMP? And I had heard about OpenJUMP. Maybe somebody came to the talk because they saw OpenJUMP in the title. But this was the first time I opened OpenJUMP, so all my OpenJUMP experience is this hobby project. So then I went and loaded OpenJUMP.
10:43
Well, basically it's a desktop GIS application. I also noticed that it's on OSGeoLive, so it's worth testing out. I was checking out what you can do with it, and I was reading: this is the same stuff that I do with QGIS. Why would this work? Well, basically,
11:05
it's a Java-based application. I could give it a lot of RAM. I gave it a lot of RAM, and then in less than 10 minutes I had the polygons. Okay, so after this I clipped them, and I thought: yeah, now we're close, but
11:27
actually it wasn't so clean after all. So this was actually maybe the most painful part: to get rid of these, because there could be some really small areas and some really big areas.
11:41
So there couldn't just be a threshold under which I could delete the small ones, because in cities there could be small areas. But I knew that they would have to be single polygons, and I did some... This was also a part, like all of these parts are,
12:03
something where I post this on Twitter and then someone comments that you should try this out, or you should have done it this way. This was also one of those. So I wrote some SQL. Because this is a cool tech conference, I have a black background with green SQL, but I'm not gonna bore you with it.
12:21
I'm gonna explain what I did in PostGIS. For the messy polygons, I used a window function to sort the pieces, all the mess, by area. And then, if it wasn't the biggest piece in the group, I selected the closest
12:43
piece that it's inside or close to. I used a lateral join, gave it the ID of that closest big area, and did a union.
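The SQL itself is not shown in the talk, so the following is only a sketch of the logic as described: find the biggest piece of each voting area, hand every other piece to the nearest big piece, then group the pieces by their new ID. It is written in plain Python rather than PostGIS SQL, and several things are assumptions for illustration: the dictionary keys (`pid`, `area_id`, `size`, `centroid`), the sample values, and the use of centroid distance as a stand-in for a real geometry distance.

```python
import math
from collections import defaultdict


def reassign_fragments(pieces):
    """Return {area_id: [pid, ...]} after merging fragments into big pieces."""
    # Step 1 (the window-function step): biggest piece per voting area.
    largest = {}
    for piece in pieces:
        best = largest.get(piece["area_id"])
        if best is None or piece["size"] > best["size"]:
            largest[piece["area_id"]] = piece
    mains = list(largest.values())
    main_pids = {m["pid"] for m in mains}

    # Step 2 (the lateral-join step): each fragment takes the ID of the
    # nearest big piece. Centroid distance is a simplification here.
    merged = defaultdict(list)
    for piece in pieces:
        if piece["pid"] in main_pids:
            target = piece["area_id"]
        else:
            nearest = min(
                mains,
                key=lambda m: math.dist(piece["centroid"], m["centroid"]),
            )
            target = nearest["area_id"]
        # Step 3 (the union step): collect pieces under their final ID.
        merged[target].append(piece["pid"])
    return dict(merged)


# Hypothetical pieces: two big areas plus one stray fragment of each.
pieces = [
    {"pid": 1, "area_id": "A", "size": 100.0, "centroid": (0.0, 0.0)},
    {"pid": 2, "area_id": "B", "size": 80.0, "centroid": (10.0, 0.0)},
    {"pid": 3, "area_id": "A", "size": 2.0, "centroid": (9.0, 1.0)},
    {"pid": 4, "area_id": "B", "size": 1.0, "centroid": (1.0, 1.0)},
]
result = reassign_fragments(pieces)
print(result)
```

Ranking by size inside each group avoids a global size threshold, which matches the point made above that small pieces can be legitimate voting areas in cities.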
13:03
It might sound confusing, but here's what I ended up with. This is a good screenshot; it didn't work out in all areas. But I was pretty pleased with how it turned out, because I feel like this was one of those
13:20
issues that sound like a much easier task than they are, and of course I couldn't clean those manually. So I was pretty happy with this. So I had the areas, and I did some maps. Those are different political parties based on their popularity.
13:41
That's a cartogram with the different parties' popularity. This is like a QGIS visualization demo now. Then, with a QGIS Atlas, I went through all the regions in Finland, and
14:01
here, actually, I think this kind of visualization really brought out the use of having really small and detailed areas: you could distinguish stuff inside cities, which previously wasn't possible because I only had the municipalities as the most detailed level.
14:22
And I also did a graph, an experimental PostGIS-to-OpenOffice visualization. There are two thousand voting areas, and it basically shows the popularity
14:42
in each. So basically, if it's a really flat graph, it means that the party is pretty popular throughout the country. And if it's really steep, it means that there are a few areas where it's really popular and then a lot of areas where nobody votes for that party.
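One way to read the flat-versus-steep graph is as a party's vote share per voting area, sorted from highest to lowest. The talk does not say exactly how the graph was built, so the sorting, the function names, and the sample shares below are all assumptions for illustration.

```python
def popularity_curve(shares):
    """Vote shares per voting area, sorted from highest to lowest."""
    return sorted(shares, reverse=True)


def spread(curve):
    """Difference between the best and the worst area on the curve."""
    return curve[0] - curve[-1]


# Hypothetical vote shares in four voting areas for two parties.
evenly_popular = popularity_curve([0.18, 0.20, 0.19, 0.21])
concentrated = popularity_curve([0.60, 0.02, 0.01, 0.03])

print(evenly_popular)  # flat curve: similar share everywhere
print(concentrated)    # steep curve: one stronghold, little elsewhere
print(spread(evenly_popular) < spread(concentrated))
```

A small spread corresponds to the flat graph described above, and a large spread to the steep one.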
15:00
I'm not gonna bore you with Finnish politics and what's what. And this is another example: a liberal and a conservative politician, comparing those a bit inside Helsinki. You can't really see the colors that well, but it distinguishes different kinds of things.
15:22
So, what I learned in a nutshell: go beyond your normal stack. I pounded my head against the wall with PostGIS and QGIS for a pretty long time; I tried something new, and bam. Ask for help: it's pretty useful, and that's like a...
15:42
There hasn't been enough community hype at the conference so far, so I'm gonna say that that's the whole idea of the open community: you ask for help, you share your results, and somebody else shares theirs. It requires some guts to share your wrong or unfinished results to get feedback, but it's worth it. And
16:04
then, I got this conclusion when I was writing the slides, but I think it's the best conclusion so far: open data, which the address points are, should always have a working upstream. I noticed a lot of mistakes in the data, but so what?
16:22
It's impossible for me to... I think there are a lot of organizations and companies in Finland who have reported the same mistakes, but it doesn't go up into the original data. So I think there are a lot of people who will do the same corrections again and again and again. So if somebody is a data owner publishing data:
16:41
plan ahead so that there's a way for you to accept corrections to your data. That's really vital. All right, thanks a lot for listening. Okay, we have some time for questions. Please wait for the microphone to pose your question.
17:02
Oh, such a large audience. Ah, finally. Just a question. Is it working? Yeah. So did you ever consider doing some clustering, like k-means clustering, on the points directly?
17:23
Yeah, or moving to rasters and then using, I don't know, reclass area from GRASS, and removing the clumps? That's something I didn't consider. No, it's an interesting approach, but I...
17:40
No, it wasn't like I moved to raster or anything, but I tried out quite many things. But actually, if someone has a good idea about the PostGIS Voronois, like why it works the way it works, I'm interested in hearing.
18:04
Having run into the same issues as you, I can suggest Euclidean distance and rasters for making this one, also on a large scale. It also performs well compared to the vector approach. Okay. Thanks.
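The raster suggestion from the audience can be sketched as a Euclidean allocation: every raster cell takes the voting area of the nearest address point. The questioner does not name a specific tool, so this is a deliberately naive, brute-force illustration; the `allocate` function, the grid size, and the sample points are hypothetical, and real raster tools use much faster distance transforms.

```python
import math


def allocate(points, width, height):
    """Return a height x width grid of area_ids.

    Each cell gets the area_id of the point nearest to the cell centre.
    """
    grid = []
    for row in range(height):
        line = []
        for col in range(width):
            centre = (col + 0.5, row + 0.5)
            nearest = min(
                points,
                key=lambda p: math.dist(centre, (p[0], p[1])),
            )
            line.append(nearest[2])
        grid.append(line)
    return grid


# Hypothetical address points in raster coordinates.
points = [(0.5, 0.5, "A"), (3.5, 0.5, "B")]
for line in allocate(points, width=4, height=2):
    print(" ".join(line))
```

The output grid is gap-free by construction, because every cell is assigned to some area.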
18:24
Questions or comments? Take your time, don't be scared. The room is full, but everyone is welcome. There are mostly Finns here, so that's why they're so...
18:41
Yeah, yeah, that's fine, that's sort of fun. Yeah, so the question is about Finnish politics and Finnish political geography in a nutshell.
19:09
From these maps I learned, well, some pretty obvious stuff. For example, Finland has two official languages, Finnish and Swedish, and
19:21
Swedish speakers, as you can expect, vote for the Swedish-speaking party. That's not surprising. But more interesting, like in many other countries, especially in Europe, are the right-wing conservative type of parties: where are people voting for those more? And,
19:41
well, I didn't get an answer for why, but for where, at least, I got some idea. That was interesting to me at least. Where? All around the country. Yeah, it's not like rural areas or cities; it's all around the country, pretty evenly. Yeah.
20:13
Did you consider... In Estonia I did kind of a similar thing, but from the opposite side:
20:21
an application for the local governments where they can draw the voting districts and things. One consideration was to use the cadastre areas, so from the addresses to cadastres, and from there to the areas. Yeah, yeah, at some point I already had like 40 gigabytes of
20:41
cadastre data on my disk, and I did some experiments with that too. Cadastre data would have produced nicer boundaries than Voronois. It was on my list, but it didn't make it to the finals. We still have some time
21:05
to continue. On your cadastre point: I've also been doing this kind of stuff, for the Netherlands. You have a bit more addresses than we do.
21:22
The problem with cadastre parcels is that you get gaps where there are no addresses, and then you need to solve that. So you just move your problem to another area. Yeah, and I wanted to cut them with hard lines.
21:42
So you define a hard line of a road or water or whatever. Then I discovered that there are no hard lines which are single, like with a multi-lane road. So my problem now is how to create proper hard lines, if anyone knows how to make hard lines. This is a problem that I think quite many people have been struggling with, trying to find an optimal solution.
22:05
All right, we'll close it there. The session will resume at five o'clock. Thank you very much, Topi. Thank you.