data mountains - turn your data into mountains!
Formal metadata
Number of parts | 542
License | CC Attribution 2.0 Belgium: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/61507 (DOI)
FOSDEM 2023 (26 / 542)
Transcript: English (automatically generated)
00:05
Thank you. Hello, everyone. It's good to be back. It's been a while. This is my first time giving a talk here. I'm really pleased to be here. My name is Joe. I am a coder. I work in London for local government.
00:21
I work a lot with geospatial data, and I am a Python programmer. Have we got any Python coders in today? Anyone using Jupyter? Cool. Right. So, let's go. So, in lockdown in 2021, we had a census in England and Wales, and the data is coming now.
00:45
Most of the data, all of the data, sorry, is spatial data. So, we want to look at this on a map. Why? Most of the data is geospatial. In local government, everything that we do generally happens somewhere, whether it's collecting a bin, looking after young people, looking after old people,
01:04
cleaning the streets. We always have to think about where this is happening. Apparently, 60% of all data is geospatial data. So, I spend a lot of my time making maps of this data.
01:21
Now, I'm going to be focusing on one part of the census data set today, and that's the East End of London in an area called Tower Hamlets. This may be familiar to some people. If you've ever seen places like Columbia Road, Bethnal Green, Canary Wharf,
01:43
these are all parts of the East End of London, and this is the main area I'm going to be talking about. So, where is Tower Hamlets in London? So, what you can see here is a very small area. It's 20 square kilometers, but this is quite a special area
02:01
because in the whole of England and Wales, it has the highest population density. It has the most people packed into a small area. It also has the fastest growing population. So, it's becoming more and more dense. So, in terms of providing services for residents, we need to have a big think about where all the people are
02:24
and how they fit in. Now, when we make maps, the first thing we usually do is we make a choropleth map. However, the data set for population density in our area, and I do apologize, I couldn't fit it all on screen,
02:44
it doesn't come across very well as a choropleth. The reason is that the data set is not very evenly distributed. There are, as we will see, some areas with extremely high population density. So, over here, you've got Whitechapel.
03:02
We have very high population density in Whitechapel. Over here, we have a new development which used to be industrial land. Again, very, very high density developments, big, big towers full of people. And then we also have, just to the south of the financial sector, some areas of very high population density
03:21
with a lot of people packed into a small place. But in terms of the data, this map doesn't really help very much. So, the choropleth map didn't work for us. So, we began to think, what else can we try? And we checked the data distribution, and sure enough, we've got some serious outliers.
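One common way to check a distribution for outliers like these is the interquartile-range rule. The sketch below is an illustration only, not the speaker's code: the function name `find_outliers`, the multiplier `k`, and the density figures are all made up for the example.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR above the upper quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [v for v in values if v > upper_fence]


# Hypothetical densities (people per square kilometre), not real census figures.
densities = [4_000, 6_500, 8_000, 9_500, 12_000, 15_000, 18_000, 250_000]
outliers = find_outliers(densities)
print(outliers)
```

A single extreme value like the last one stretches a linear colour scale so far that every other area ends up in the same colour band, which is the problem described here.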
03:40
This is why the choropleth map didn't work very well for us. So, what did we do next? We tried to log transform the data. And yeah, you can see this area here. You can begin to see the density there. There are quite a few large developments with a lot of people squeezed in.
04:00
Whitechapel, you don't see so much happening there. But you do see, just to the south of the financial sector, a high density of population. The areas with low density, this is where all the banks are. So, obviously, there are no people living in there. This is an old dock near to the Tower of London.
04:20
There are no people living there. There are some very nice pubs though, if you ever find yourself in that area. And the Dickens Inn is excellent. I can recommend that to everybody. And then up here in the north, we have Victoria Park, which is where the East End borders with Hackney. And obviously, there are no people there,
04:41
at least having their address registered there. Log-transformed data looks better on a choropleth map. However, as you can see in the legend, you lose the original data values. So, you can try to fix the legend. But we want to write as little code as we possibly can.
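The log transform, and the legend fix it calls for, can be sketched roughly as follows. This is an assumption-laden illustration rather than the speaker's code: the function names, the choice of log10(value + 1) (used so that empty areas such as parks map to zero), and the sample values are all hypothetical.

```python
import math


def log_transform(values):
    """Compress a skewed range with log10(value + 1); zero stays zero."""
    return [math.log10(v + 1) for v in values]


def legend_label(transformed_value):
    """Convert a transformed value back to the original units for a legend."""
    return 10 ** transformed_value - 1


# Hypothetical densities (people per square kilometre), not real census figures.
densities = [0, 4_000, 12_000, 250_000]
transformed = log_transform(densities)
restored = [legend_label(t) for t in transformed]
print(transformed)
print(restored)
```

The transformed values are what the colour scale sees; the legend needs the back-transformed numbers, and that is the extra code the speaker would rather not maintain.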
05:00
We don't want to keep fixing legends and things like that. So, we began to think about other ways to visualise our data set. So, what did we do? I am a Python coder, but there's a really nice package in R called Cartogram. And this is a technique called a density equalisation algorithm
05:24
that basically turns your data set into a Voronoi first, and then it rescales the polygons from the Voronoi relative to an attribute of the data. And this technique is quite popular.
05:42
There's a wonderful geographer called Danny Dorling, who has an amazing website called Worldmapper, which I strongly recommend you have a look at. And they do things like showing poverty, inequality, food pressure, all around the world. And they size the geographies
06:01
relative to the attributes of the geospatial data. So, this is a great technique. There is one issue here, though: if you want to overlay different layers, it becomes difficult. And also the map does look a little bit unfamiliar as well. But it does show,
06:21
particularly where you have like clustering, where you have a number of census areas, and I'm going to say a little bit more about census areas, where you have a few together that have a high data attribute value. Then they all get bigger together. So, what we can see here is, just to the south of the financial sector,
06:42
you can see there's a lot of worker bees all crammed into this place, and then it increases the volume on the map. So, it's a nice data viz, but still we have a small challenge if we want to add more data over the top. And also it's a bit unfamiliar for people that don't use cartograms.
07:00
So, this is a map made using Datawrapper. It's a very nice website. And they have something called a symbol plot. And what this does is, it just basically shows little mountains, little peaks that show the value of the data attribute
07:22
that you're interested in at the place where that data is happening. And so again, we can see over here, you've got Whitechapel, lots of people packed in there. Just to the south of the financial sector, lots of people packed in there. The new developments here by the river in Blackwall,
07:43
and here by the river in the old industrial zone. So, this is quite interesting. It gives us some context, and it gives us the data. I really like this data viz, but it's Datawrapper, so it's not FOSS, and it's not Python,
08:00
and I like to use Python. So, it was great, and it helped, but it didn't do everything that we needed it to do. The other thing that you will notice, and I'll try to explain this briefly, is that we have one really high value here. And there's a reason for this.
08:22
Actually, it's this value here. It's an outlier because the actual census area is really, really small. And the thing about the people who produce the census data is that
08:40
they have to create census areas using roughly 100 to 600 people. Generally speaking, it's about 300 people, but they have to make it all fit together like a big jigsaw puzzle. So, sometimes it's hard for them to make it work really well. So, in this case, this census area with really high density
09:00
is actually just one building. And so, it's not a particularly big building, but everyone's squeezed in there. So, the data is quite hard to work with, but it is interesting. So, when I was working with Datawrapper, I really liked it, and it did remind me of when I was young and I was reading the Lord of the Rings books.
09:22
I used to really like the map at the front with all these mountains, showing the Misty Mountains in those books. And so, I was thinking, I could probably make a mountain with Python. How hard can it be? It turns out it's really easy.
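The single function at the heart of this approach might look roughly like the sketch below. It is not the speaker's actual code: the names `rescale` and `mountain`, the `half_width` and `max_height` parameters (in degrees), and the sample coordinates and densities are assumptions chosen for illustration. A data value is rescaled linearly to a small latitude offset, and each point becomes a three-vertex line: a foot slightly to the west, a peak above the point, and a foot slightly to the east.

```python
def rescale(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] to [out_min, out_max]."""
    if in_max == in_min:
        return out_min  # avoid dividing by zero when all values are equal
    fraction = (value - in_min) / (in_max - in_min)
    return out_min + fraction * (out_max - out_min)


def mountain(lon, lat, value, value_range, half_width=0.001, max_height=0.01):
    """Return three (lon, lat) vertices: west foot, peak, east foot."""
    height = rescale(value, value_range[0], value_range[1], 0.0, max_height)
    return [
        (lon - half_width, lat),  # a little bit of longitude to the west
        (lon, lat + height),      # peak: the value expressed as extra latitude
        (lon + half_width, lat),  # a little bit of longitude to the east
    ]


# Hypothetical census-area centroids: (longitude, latitude, density).
points = [(-0.06, 51.52, 30_000), (-0.02, 51.50, 10_000), (-0.03, 51.53, 0)]
densities = [p[2] for p in points]
value_range = (min(densities), max(densities))
peaks = [mountain(lon, lat, d, value_range) for lon, lat, d in points]
print(peaks[0])
```

Each list of vertices can then be handed to whatever geometry or plotting library is in use and drawn as an open line over a base map. Fixed degree offsets are a simplification: they ignore map projection, which is tolerable only over an area as small as a single borough.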
09:42
This is the essence of the library. It's just one function. You take a point on a map, you turn that point into a line. The line has a start point, which is just a couple of points of longitude, a tiny little bit of longitude
10:00
to the west of your point. Then you convert your point's value to a latitude, which is kind of like a proxy for the height of the mountain, using some kind of algorithm that you choose. In my case, I'm just using a range. So, I take the minimum and maximum value
10:21
of the input range, which is a separate function here, and range one is essentially the minimum population density and the maximum, and then I convert that to latitude values. And then the third point on the line is just a little bit of longitude
10:43
to the east of my point. And then you use that to create a small triangle, really easy, really easy, and a lot of fun as well. So, this is what I made with Python, and it's very similar to the Datawrapper map, but I was going for a kind of hand-drawn look
11:04
to make it look like something from Lord of the Rings. And it's the same thing, you've got Whitechapel here, you've got the financial sector here, and so on and so on. So, that was fun. But population density, we were just talking about the reasons
11:21
why it's a messy dataset. There's one place in Chelsea which has a population density of two million people per square kilometre. So, this is a very difficult dataset to represent using any tools available. So, it's interesting. The other thing about Kensington and Chelsea
11:42
is this is where Grenfell Tower is, if anybody knows about that story. This is where it happened. So, let's try some other datasets to see if they're really messy. These are people who live in one-bedroom homes. So, these are tiny little flats filled with people.
12:03
And so, you can see all the worker bees for the financial sector, a lot of those are living in one-bedroom flats. And actually, the new build. This is a very new development here and this is a very new development here. So, it looks like people who are building homes now are building a lot of one-bedroom homes.
12:21
Two-bedroom homes. Generally, everything is kind of the same. Nothing really jumps out here. Three-bedroom homes. What you can start to see with three-bedroom homes is that, yeah, it's generally even. But actually, in this area here, which is Bow, which is near the Bow Bells Church,
12:42
which is used to decide if someone's a traditional East End cockney or not. That's kind of this area really. So, the cockneys seem to have three-bedroom homes generally. And then four or more. And what you see here is, in the areas where the financial workers live, there's still quite a lot of four-bedroom homes.
13:02
But in some of these new build areas, there's very, very few relative to the rest of the area. So, let's look at another slightly more famous area. This is Westminster in central London. And so, you can see this is where Hyde Park is. There's no one living there.
13:21
Again, this is the population density data set. And then you've got an OpenStreetMap base map, just to help with orientation. And then in a future version of the module, I think I might do some more stuff with OpenStreetMap. And then if you look at some of the outer London areas,
13:44
and this is where I live, you can see areas of urban density, but you can also see some very suburban areas where the population density is lower. This is where most people are living in houses, basically. And you can also see green space. So, we're nearly finished.
14:02
I just want to give a massive shout out to nbdev. It's really good if you use Jupyter. Just check it out. Number one, if you're trying to do version control on Jupyter notebooks, it helps you with any clashes, any merge conflicts, because it removes the metadata in the JSON
14:22
that sometimes causes conflicts. If you have a team of people working on the same notebook, this is a real lifesaver. And also, it just bakes in good practice. So, it means that your code gets shared on GitHub really easily. It helps you, or encourages you at least,
14:40
to write good documentation for your team and the community. It also encourages you to write good tests and it enables you to publish modules. So, big shout out to them. I'd also like to thank Jarek, who has produced a wonderful PWA for FOSDEM called Sojourner.
15:03
Do check it out. It's a really good way of looking at the schedule for FOSDEM and you can watch the videos with Sojourner. And also Ed, who's going to be giving a really cool talk on OSM and Wikidata. And finally, I'd like to thank all the council coders everywhere.
15:23
Thanks for having me.