Interactive Mapmaking with Python
Formal Metadata
Number of Parts | 130
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/49913 (DOI)
Transcript: English (auto-generated)
00:06
So, for our next talk, we have someone special joining us, you know. So this person who I'm going to introduce is like one of the funniest people I know.
00:21
And you know, I sometimes secretly wish I was him. So let's welcome that person now. Okay, hello everyone. It's my talk right now. Thank you for that humbling introduction.
00:42
My name is Sangarshan. I work at Grofers and it's pronounced Grofers. So I like building imaginary projects in my head, talking about those imaginary projects in my head. And you know, I also love all those imaginary projects. So I'm going to be talking about one such case where that imaginary project actually
01:04
got executed, which was amazing. So this is the story of how I did that. So I'm going to begin by proclaiming my love for pandas. You know, whenever almost all of us want to work with data,
01:22
we move to pandas and you know, rightly so, pandas is amazing. It's like the Tom Hanks of Python libraries. And of course, it rests on top of all these amazing, you know, packages which are present
01:40
in the open source environment. These include, you know, matplotlib, numpy, pandas and then scikit-learn. You know, all these amazing open source packages from the scientific Python community make Python the go-to language when anyone wants to do anything with data.
02:04
So we're going to talk about a special type of data, which is location data. Location data has a location component linked with it. For example, I'm reading a CSV file over here, which has latitude and longitude. So this is essentially a location component.
02:25
So data with a location component can be represented as a geometry. Geometries are basically just X and Y coordinates and the various ways you can arrange them.
02:41
So these geometries can be a pair of lat-longs, which is a point, or they can be a list of lat-longs, which is a polygon. And also I'm grossly oversimplifying what geometries are, and Euclid would be really angry with me right now. But you also have line strings, which are like lines, and we have multi-polygons, which is a fancy plural word for polygons.
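As a rough illustration of the geometry types just listed, here is a minimal sketch that writes them out as GeoJSON-style dictionaries using only the standard library. The coordinates are made up for the example. Note that GeoJSON has no circle type; a circle is usually carried as a point plus a radius.

```python
import json

# GeoJSON stores each position as [longitude, latitude], i.e. x first, then y.
point = {"type": "Point", "coordinates": [77.59, 12.97]}

line_string = {
    "type": "LineString",
    "coordinates": [[77.59, 12.97], [77.60, 12.98], [77.62, 12.98]],
}

# A polygon is a list of rings; each ring is a closed list of positions
# (the first and last position are identical).
square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
polygon = {"type": "Polygon", "coordinates": [square]}

# A multi-polygon is simply a list of polygons.
shifted = [[x + 5.0, y] for x, y in square]
multi_polygon = {"type": "MultiPolygon", "coordinates": [[square], [shifted]]}

for geometry in (point, line_string, polygon, multi_polygon):
    print(geometry["type"], json.dumps(geometry["coordinates"])[:50])
```

Libraries such as Shapely wrap the same ideas in proper objects; the dictionaries here only show how little structure is involved.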
03:04
And we also have circles. Now to work with these geometries, we have this amazing package in the open source community called Geopandas. So with Geopandas, we can work with data frame and in this case, a geodata frame.
03:22
And we can stay with a familiar interface. And Geopandas will let us read and write commonly used geospatial formats, like KML files, shapefiles, and GeoJSON files using another library called Fiona.
03:42
And then we also can perform different spatial operations like merge, overlay, and joins with the help of another amazing package called Shapely. And also of course, the crux of the talk, we can plot them on a map with matplotlib. And also I'm leaving out a whole lot of things which we can do with Geopandas,
04:05
like handling projections and recently added stuff like vectorized operations with PyGEOS and also indexing your data with R-trees or any other data structure. And if you are interested in knowing more, this is a tutorial link which I have attached here.
04:26
I will share these slides on the break room. And this tutorial has been made by the maintainer. So it's like pretty comprehensive and it's pretty amazing. All right, so moving on to our data frame. So we have this data frame, which I read before, and we have LAT and LON.
04:44
So now we convert it into a geodata frame. So how we do that is we import Geopandas and also we convert this LAT and LON column into a point. A point is a geometry. So we set geometry equals combination of these two columns.
05:03
So it creates a column called geometry and it creates a point geometry object using Shapely. So this is that point. And also on the second line, we are setting something called CRS.
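The slide itself is not in the transcript, so the following is only a sketch of the conversion described here, assuming a small made-up table with `name`, `lat` and `lon` columns. It uses `geopandas.points_from_xy` and sets the CRS in the constructor rather than on a separate line.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical stand-in for the CSV that was read earlier.
df = pd.DataFrame(
    {
        "name": ["Bengaluru", "Chennai", "Delhi"],
        "lat": [12.97, 13.08, 28.61],
        "lon": [77.59, 80.27, 77.21],
    }
)

# points_from_xy takes x (longitude) first, then y (latitude).
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",  # WGS 84 latitude/longitude
)

print(gdf.geometry.iloc[0])
print(gdf.crs)
```

The easy mistake is passing latitude first; the x/y naming of the function is a reminder that longitude is the x coordinate.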
05:20
So CRS is the Coordinate Reference System. So this is how we project our Earth in a way that we can understand and interpret. So in this case, we have used EPSG:4326, which is like the most commonly used coordinate reference system,
05:47
which is also used by GPS systems and all the navigation systems. So this is the reason why latitudes and longitudes have spatial meaning in them. If you don't have Coordinate Reference System, they are just arbitrary X and Y coordinates.
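To make the point about coordinate reference systems concrete, here is a small standard-library sketch that expresses the same location in two systems: EPSG:4326 (degrees) and Web Mercator, EPSG:3857 (metres), which is the projection most web tile maps use. The function name is an assumption for illustration; in practice a library such as pyproj or Geopandas would do this conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 metres (spherical formula)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


lon, lat = 77.59, 12.97
x, y = lonlat_to_web_mercator(lon, lat)
print(f"EPSG:4326: ({lon}, {lat}) degrees")
print(f"EPSG:3857: ({x:.0f}, {y:.0f}) metres")
```

The two pairs of numbers describe the same place; without knowing which CRS they belong to, neither pair can be put on a map.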
06:05
They are not latitudes and longitudes. So how do I plot them? So with Geopandas, everything becomes really simple. It's just a .plot(). So you just import matplotlib and then you just, you know, call .plot() on the geodata frame.
06:22
And then you have this map. But this does not look like a map, right? It's missing something, interactivity. So to do interactive maps with Geopandas, we actually have to move away from Geopandas and into other amazing open source libraries, which are there for creating interactive maps.
06:44
So how do we do that? To do that, we employ the most powerful weapon a developer has, which is Googling. So that's what I did. I just Googled for a while and, you know, I actually got to know that folium is actually one such package that actually lets me do this.
07:01
So I take in the geodata frame and then I import folium. And then now I create a folium.Map. So this folium.Map is just an empty canvas, kind of like a canvas that you can draw on top of. So this just creates a dummy map for me where I pass in the location, the
07:23
zoom start, the tiles, and also we have various other parameters you can pass to the map. So after creating that dummy canvas, now we want to plot our data on top of this canvas. So to do that, we create a feature called points. So this points feature is made up of our geodata frame, which we created before.
07:45
And also we add a parameter called tooltip. A tooltip is also a feature, in which the field is name. So when I hover over this map, I can actually see the name of each and every point that I just plotted. Now, after creating this feature, I take the canvas and I add this feature to the canvas.
08:05
A very Pythonic API, very simple and incredibly effective to create beautiful looking maps. Now, onward to polygons. So we again read a geodata frame, which is made up of polygons instead of points.
08:21
As you can see, it's made up of multi polygons and also polygons. And these are all different countries and continents. So to create a map out of them with Geopandas, we just again do just a .plot(). Pretty easy. And we now choose the column which we want to plot on top of. For example, here I've chosen GDP.
08:41
So that means this map is colored based on the GDP. And also we pass the color map so that it goes from this color to that color. So I have also attached a link for the color maps. You know, the different color maps for matplotlib which are present. And also I give legend equals true so that I actually see this widget.
09:04
So now onward to the interactive maps. To do that, first, we actually create this color map. To do that, I'm using another package called branca, from whose branca.colormap module I'm actually importing this color map. And then I'm passing the number of steps I have to split it into.
09:21
So after creating the color map, I define my empty folium canvas. And also I define a style function. So this style function contains the various parameters that I'm going to be passing to my map which I'm going to create right now. So it's made up of a fill color which denotes the color which I have to fill in that particular polygon.
09:43
And also the color of that polygon's border. And also the weight of that border. And also the opacity of the color I have to fill it with. So these are the parameters. And then again, we create a GeoJSON feature.
10:01
We pass the geodata frame. We create the tooltip. And this time I'm passing the name and also the GDP. So when I hover over it, I see both of them. Amazing. So I also pass in the style function and then just add it to the canvas. And also notice that I add the color map to the canvas. So I see this incredibly distorted legend.
10:21
But I still see it. So folium also has several special maps that you can actually create with it. These are called plugins. So one such plugin is called marker cluster.
10:44
So with marker cluster, you just import MarkerCluster from folium.plugins. And now there is one thing that is a little weird, wherein you need to take a geodata frame and convert it into a list of latitudes and longitudes. So we do that over here.
11:02
Now we have a list of lat longs. And now we create an empty canvas. And we again do the drill: add_child, MarkerCluster, locations. We pass the list. And we get it. It's a cluster. So the number two here means that this cluster is made up of two points.
11:23
So the number eight means that this cluster is made up of eight points. And the cluster 76 means that this cluster is made up of 76 points. So I can click on it. It splits into several clusters again. I click on it. It splits into clusters. Still, I'm able to see each and every individual point, which is pretty awesome.
11:43
And now heat map. So heat map is something which, given the current situation, we actually see on a daily basis. So we know how important they are to convey the message. So this is the heat map we're creating. Also, this does not represent anything. This is just an arbitrary heat map.
12:02
To do this, we take the location and just create a heat map out of it with a radius of this particular map. So heat maps actually are pretty interesting because you just have to take one look at
12:21
it to get all the information out of it, essentially. So these are pretty cool. So now more data and more problems. So one issue which I started to notice when going down the folium path is when the number of points in your map starts to increase, folium starts to get a little bit ineffective.
12:44
Folium cannot handle a lot of points. So one solution which you can deploy to make folium actually handle a lot of points is to actually create clusters like this. Create marker clusters so that you can actually plot a lot of points.
13:01
But in some cases, the number of points you have is a lot. So your DOM will just crash. Your browser crashes, and you will not be able to plot all these points. So that's where things get a little hard. But fret not.
13:21
We have a new kid in town, and that's Kepler.gl. So this is the textbook definition of Kepler.gl. So essentially, it's this amazing project which was developed at Uber. So they have built Kepler on top of deck.gl and WebGL. So they actually can render millions of points and perform spatial aggregations on the fly,
13:49
which is pretty amazing. You can create amazing, cool-looking maps with Kepler when you want to scale out your points or scale out your data.
14:02
So to install Kepler, when you are on a Jupyter notebook that's greater than version 5.3, you can just do a pip install keplergl, which installs it, and when you are in JupyterLab, you just do a jupyter labextension install of the Kepler extension.
14:22
And you enable it after installing, and then you are good to go. You'll just be able to import KeplerGl, create a map out of it, and you see this amazing interface. So another amazing thing about Kepler is that it uses configs to customize the maps.
14:42
So a config looks like this. It has version, it has states, filters. I'm not going to go deep into the config because it is amazingly customizable. So I'm not going to go really deep into that.
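For orientation, here is a bare skeleton of such a config as a Python dictionary. The key names follow the v1 saved-config layout as far as I know it, and the values are made up; a real config exported from the Kepler UI carries many more fields, so treat this as illustrative only.

```python
import json

# Skeleton of a Kepler config; values are placeholders, not a working style.
config = {
    "version": "v1",
    "config": {
        "visState": {
            "filters": [],            # e.g. time or value filters
            "layers": [],             # point, hexbin, arc, ... layer definitions
            "interactionConfig": {},  # tooltip, brush, geocoder, coordinate
        },
        "mapState": {"latitude": 12.97, "longitude": 77.59, "zoom": 9},
        "mapStyle": {"styleType": "dark"},
    },
}

print(json.dumps(config, indent=2))

# With the keplergl package installed, a dict like this is typically passed as
# KeplerGl(config=config) when creating the map widget.
```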
15:00
But I'm going to actually give you an overview of all the things that you can control with the config. So Kepler's UX flow is actually made up of five layers. So initially, you have your base map, right? So the base map is made up of tiles that you can acquire from various sources.
15:21
For example, you can acquire tiles from OpenStreetMap, you can acquire map tiles from Google, you can acquire map tiles from Mapbox, several styles, and that makes up your tiles. And on top of these tiles, you overlay your data. And also on this data, you can perform filters, aggregations, and all the lot.
15:49
And to perform filters and aggregations, you can actually represent your data layer as five different types with Kepler. So we have hexagonal layers, we have arc layers, we have point layers,
16:03
we have path layers, and we have grid layers. And the amazing thing about it is that these layers, some of these layers are actually interchangeable. For example, you can have a point layer that you can represent as a grid. And you have H3 grids, which are actually a spatial indexing system,
16:26
which is also developed courtesy of Uber, which can help you plot a lot of points without having to actually show those points on a map. Rather, index them and aggregate them as grids,
16:41
so that the number of points in your map significantly decreases. And also, you have interactions you can perform on the data that has been plotted on a map. And these interactions include stuff like tooltips. For example, the tooltips are the ones which I showed before,
17:03
where you hover over a component in a map and you see the name or something related to that particular geometry. And also, Kepler has these interactions. For example, you can geo-code a particular location by just using an interaction. You have brushes and you have also the coordinate interaction.
17:24
Given the customizability of Kepler and given the amount of layers it can process, you can actually see one in action right now, where an aggregation of points is getting performed on the fly.
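The idea behind grid aggregation can be sketched in a few lines of standard-library Python. This uses a plain square grid in degrees as a stand-in; it is not H3 (which uses hexagons) and not Kepler's implementation, and the function names and cell size are assumptions for illustration.

```python
from collections import Counter


def grid_cell(lon, lat, cell_size_deg):
    """Return the (column, row) index of the square cell containing a point."""
    return (int(lon // cell_size_deg), int(lat // cell_size_deg))


def aggregate_to_grid(points, cell_size_deg=0.5):
    """Count how many (lon, lat) points fall into each grid cell."""
    return Counter(grid_cell(lon, lat, cell_size_deg) for lon, lat in points)


# Hypothetical (lon, lat) points: three close together, two elsewhere.
points = [
    (77.59, 12.97),
    (77.60, 12.98),
    (77.61, 12.96),
    (80.27, 13.08),
    (80.20, 13.10),
]

cells = aggregate_to_grid(points)
for cell, count in sorted(cells.items()):
    print(cell, count)
```

Five points become two cells here; with millions of points the map only has to draw one shape per occupied cell, which is why aggregated layers scale so much better than individual markers.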
17:45
So, we're just choosing a color and we're choosing to color those grids based on a particular field, which is actually getting done pretty fast. And also, this is one of the things I absolutely love about Kepler,
18:03
is how amazingly beautiful the spatiotemporal maps you can create are. So, a quick definition of spatiotemporal maps. Spatiotemporal data is when you have both space and time linked with a particular piece of data, which is pretty cool.
18:22
So, an example of this can be a trip, or a practical example would be drone data. So, you have a drone that you're flying around and when you get the data out of that drone, you actually have two components: where it was and when it was there.
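As a small illustration of spatiotemporal data, here is a made-up drone track where every fix carries both a position and a timestamp, plus a time-window filter of the kind a timeline slider applies. The field names and the helper function are assumptions for the sketch.

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 1, 10, 0, 0)

# Hypothetical drone track: one fix every 10 seconds, drifting north-east.
track = [
    {
        "time": start + timedelta(seconds=10 * i),
        "lon": 77.59 + 0.001 * i,
        "lat": 12.97 + 0.0005 * i,
    }
    for i in range(6)
]


def positions_between(track, t0, t1):
    """Return the fixes recorded in the closed time window [t0, t1]."""
    return [fix for fix in track if t0 <= fix["time"] <= t1]


window = positions_between(
    track,
    start + timedelta(seconds=15),
    start + timedelta(seconds=45),
)
for fix in window:
    print(fix["time"].time(), round(fix["lon"], 4), round(fix["lat"], 4))
```

Sliding the window forward in time and redrawing the selected fixes is, in essence, what an animated trip map does.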
18:42
So, you can use this information to create a timeline chart using Kepler and it looks as beautiful as this. So, this is actually a trip data. So, you can see trips over time. It looks pretty beautiful if you can create them. But, as always, these are not the only libraries that are there
19:05
and given the amazing community that is present in the open source world, you can actually, you know, you have like tons of libraries that you can depend on. You don't have to just, you know, you have countless libraries.
19:24
That's what I'm trying to say. So, I'm going to just explore them. So, one such library is Bokeh. So, with Bokeh, you can actually do something similar. So, here I'm just, you know, importing the figure, the output file, and I'm importing the tiles.
19:43
I'm using CartoDB tiles, which is going to be my base map provider. And, you know, I'm just creating an empty map. Here, I'm actually choosing my coordinate reference system, which is Mercator. And also, I'm providing the range of the X and Y, which is, you know, pretty interesting
20:04
and is not a feature that you have in the other libraries. So, with everything, you get something new out of it. You know, it's not all the same and, you know, it's good to actually explore this. It's actually fun to explore these libraries, you know, and how people are doing the same thing,
20:23
but in really different and really amazing ways. And also, we have Plotly. Plotly and Bokeh are not just for maps, but for, you know, just basic interactive plots. And also, we have Altair. So, over here, with Plotly, I'm actually using Plotly Express so that this code is a little smaller.
20:47
With Plotly, it gets a little bigger. So, with Plotly Express, I'm importing the carshare dataset, and I'm creating a scatter mapbox plot by just passing the data frame, the latitude, the longitude, the color, and the size.
21:02
So, here, actually, this is pretty cool because I'm able to represent more than I would be able to represent normally. So, I'm using color to represent a feature, and I'm using the size to represent another feature. So, I have these two features complementing each other on the same map.
21:20
So, this is really useful in cases where you want to spot correlation between two events happening linked to the same spatial component at the same time. So, it's cool that you can do this with Plotly. So, this talk is about Geopatra, which is something I worked on in my free time,
21:45
which is kind of like a hack function of all the scripts that I wrote to quickly create interactive maps, but without going to other libraries by staying in the realm of geodata frames, essentially.
22:02
So, if you have not cringed yet, Geopatra is named Geopatra because it rhymes with Cleopatra, and Cleo and Geo kind of rhyme, and I thought I was being clever at the time.
22:20
So, to create a folium plot, you just import Geopatra, and you do the dataframe.folium.plot, and you just plot it. So, it's just a single line to plot your geodata frame. Oh, but one thing which even I thought to myself while just writing this is,
22:42
why can't they just be these scripts which I keep for myself, and why does it have to be another library? It can just be a gist of functions that can plot geodata frames. Not a library, right? So, again, it's not a library. It's a wrapper, essentially. A gigantic wrapper. And it's an interface, an interface to interact with the existing libraries without leaving the Geopandas ecosystem.
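The talk does not show how the `dataframe.folium.plot` syntax is wired up. One way a wrapper can offer that kind of call is pandas' accessor registration, sketched below with a hypothetical accessor named `quickmap`. This is not Geopatra's code; it only builds a GeoJSON dictionary that a mapping backend could then draw.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("quickmap")
class QuickMapAccessor:
    """Hypothetical accessor for illustration; not Geopatra's implementation."""

    def __init__(self, frame):
        self._frame = frame

    def plot(self, lat="lat", lon="lon", tooltip="name"):
        """Build a GeoJSON FeatureCollection that a mapping backend could draw."""
        features = [
            {
                "type": "Feature",
                "properties": {tooltip: row[tooltip]},
                "geometry": {
                    "type": "Point",
                    "coordinates": [float(row[lon]), float(row[lat])],
                },
            }
            for _, row in self._frame.iterrows()
        ]
        return {"type": "FeatureCollection", "features": features}


df = pd.DataFrame(
    {"name": ["A", "B"], "lat": [12.97, 13.08], "lon": [77.59, 80.27]}
)

# The accessor makes the wrapper reachable straight from the data frame.
collection = df.quickmap.plot()
print(len(collection["features"]))
```

The appeal of this design is that the user never leaves the data frame: each backend becomes one attribute, and each map becomes one method call.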
23:09
And also, I have Netflix syndrome. So, if you don't know what that means, Netflix gives you a lot of choices when you open the app, and people get really overwhelmed at that choice,
23:22
and it's really hard for you to choose something. So, it's the same thing for me. There are these amazing libraries which complement each other, and also they are different in their own way, and I don't want to choose one of them. I want all of them.
23:40
And also, I actually want to spend more time plotting and spend less time Googling for APIs, because I have a very bad memory, and I tend to forget, you know, how do I create a choropleth map with folium? How do I create a heat map? I just keep Googling for different packages, and that sometimes gets tiring.
24:08
So, alright. So, let me move to the documentation of Geopatra, essentially. So, with Geopatra, I'm actually just taking these geodata frames out, and I'm creating these interactive maps.
24:23
These are the GeoJSON maps. So, this documentation is at geopatra.readthedocs.io, and the GitHub repository is github.com slash sangarshan slash geopatra. So, since my name is really weird, I actually get my name to be the username in every platform that I go to.
24:45
So, on Twitter, on GitHub, on every platform, it's just my name. So, I create a GeoJSON map with just one line. I create a circle map with just one line. I pass the parameters of the tooltip, which is name, so that I hover over it to actually see the name,
25:02
and also I have the radius, which is the radius, and I fill it with a color, and the fill color is red, and the zoom is 100 so that I can zoom out. So, these are the choropleth maps, which is, again, one line. You color by the population instead of the GDP, but they are obviously correlated,
25:26
and the tooltip is the name, so I hover over it to see the name. I set the color. I set the zoom, and I set the style. And marker clusters are, again, one line. I create the marker clusters, and I also have weighted marker clusters, which are kind of in the experimental phase, essentially.
25:47
So, these are the ways where you can cluster with not just points but with value. So, this two means there are two points, right? But here, the clusters mean that when you go here, the sum of the values here,
26:02
you know, amounts to 986. It's not essentially the number of points. And, you know, they look really weird because it's not completely ready yet. I'm just playing around with things. And also, of course, we have the heat maps that are always there. And, you know, this is what I'm actually excited about, which is Kepler.
26:24
So, with Kepler, I also have a way to create an interactive map with just a single line, which is, you know, again, the dataframe.kepler.plot. And, you know, it creates this map for me. So, I'm actually controlling the config of Kepler with, you know, a Jinja template, essentially.
26:43
So, the parameters you pass here just replace the placeholders in those pre-built templates which are there. And, you know, it's pretty easy to plot them, actually. And also, to create a simple polygon, I pass the color, I pass the stroke, which are now a list of RGBs, but that should change soon.
27:04
I have a PR ready to, you know, make them into actual colors rather than passing a list of RGBs. But for now, they are. And also, I pass the stroke thickness, which is a parameter, and I pass the opacity, you know, to create this plot again.
27:20
So, now they are all the same color, right? I can also pass a new parameter called color field, which is the population estimate. And I also pass the color scheme, which is blues. So, right now, because of this, it is actually colored based on the population estimate rather than being colored arbitrarily.
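The template idea can be sketched with the standard library's `string.Template` as a stand-in for Jinja. The template text, its key names and the `render_layer` helper are all hypothetical; they do not reproduce Geopatra's templates or Kepler's exact layer schema, only the mechanism of parameters filling placeholders in a pre-built config.

```python
import json
from string import Template

# Hypothetical pre-built layer template with placeholders for the parameters.
LAYER_TEMPLATE = Template(
    """
{
  "type": "geojson",
  "config": {
    "color": $color,
    "visConfig": {"opacity": $opacity, "thickness": $stroke_thickness},
    "colorField": $color_field
  }
}
"""
)


def render_layer(color=(255, 0, 0), opacity=0.8, stroke_thickness=0.5, color_field=None):
    """Fill the template with JSON-encoded parameters and parse the result."""
    text = LAYER_TEMPLATE.substitute(
        color=json.dumps(list(color)),          # RGB list, e.g. [255, 0, 0]
        opacity=json.dumps(opacity),
        stroke_thickness=json.dumps(stroke_thickness),
        color_field=json.dumps(color_field),    # None becomes JSON null
    )
    return json.loads(text)


# Same colour for every polygon.
plain = render_layer()

# Colour driven by a data column instead (hypothetical column name).
by_population = render_layer(color=(0, 92, 255), color_field="pop_est")
print(by_population["config"])
```

Encoding each parameter with `json.dumps` before substitution keeps the rendered text valid JSON whatever the parameter type is.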
27:43
So, yeah, that's it for Geopatra. And thank you for coming.
28:00
Okay, I'm just going to look at the questions. Okay, I have two questions. Can you talk about how Kepler is better than Mapbox since it's built on top of it? Okay, so Kepler and Mapbox are actually complementing. Like, one is not better than the other, essentially.
28:22
So, actually, Kepler uses Mapbox internally, to be honest. So, Mapbox is the tile provider for Kepler. So, the base maps which you see when you want to create plots on Kepler, they are all Mapbox tiles. So, it's not kind of a competition, but kind of complementing each other, essentially.
28:46
So, yeah, that's that. Also, another question. Do either folium or Kepler support non-geographic map, for example, mapping a star map? I'm not sure about folium, but with Kepler, you can actually provide your own custom base map.
29:08
So, if you actually Google for it, you can actually find a map where people have plotted a baseball stadium. And where they have actually modeled the whole stadium on the base map and plotted how different shots have flown,
29:26
you know, the shots which were hit and the different trajectories they had gone into. So, like, that was pretty cool. So, I guess it is possible, you know, where you can pass your own custom map, you know, custom tiles and, you know, create maps out of it.
29:43
But it is actually pretty hard to do, but it is possible. But they are mainly present for geographic maps. That's their main use case, but it is possible to extend them and customize them. And, of course, the code is all open source.
30:01
So, all the projects I talked about are open source. Are there any map packages that also integrate AR functionality? So, I assume you mean augmented reality. And to be honest, I don't know.
30:20
I have not worked with AR maps. So, I just have done stuff with Unity and Vuforia, which is pretty basic. So, I'm not really sure. I can actually maybe Google for it and, you know, get back to you, or we can discuss it.
30:43
So, okay. I think that is it. We are out of questions. So, thank you very much, everyone. Thank you for joining. I hope you liked my introduction. That was something I was planning for. And, you know, thank you for being here.
31:03
Thank you all.