Data Science with OpenStreetMap
Formal Metadata
Title 
Data Science with OpenStreetMap

Title of Series  
Author 

Contributors 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2018

Language 
English

Production Year 
2018

Production Place 
Salzburg

Content Metadata
Subject Area  
Keywords  OpenStreetMap, Open Data, Data Science
00:03
So, hello guys. My talk is going to be about
00:07
data science with OpenStreetMap. I'll continue with the things Jakob told you and add some other flavors to it. First, as Jakob said, there are different tags that you can use. For example, one tag that is common is the
00:25
amenity tag. In OpenStreetMap, as you saw before, you have key-value pairs: for each key you can have some value. One very common key is the amenity key, and as you can see there are a lot of values, like beer garden, cafe, barbecue, bar, parking spot, whatever. You can explore these tags on a site called taginfo. There you have statistics over the whole OpenStreetMap data: how many buildings there are, how many highways, how many walls. So how do you load this data? You can use taginfo as an API. In our case we download the values of the amenity key with the API, and you can see that the most used amenity value is parking, then place of worship, school, bench and so on. If we visualize these tags, we can see that most amenities are parking spots, then you have place of worship, school, bench, restaurant, and then it drops off. And as Jakob said, apparently I'm a guru
01:58
of the Overpass API, so I'm going to show you how to download data with Overpass. You have an API which you can access with its own query language, and of course you can do that with Python as well. This is a fairly simple example of a query; it is the code block you saw before in Overpass turbo, which is the GUI interface. You can see here that we want JSON output and we want to search inside Austria; admin_level is the definition of the boundary. Then you use this area for searching all amenities that are restaurants, and then we want to count all of them. And you can see, where is it, we have 11,910 restaurants in Austria in OpenStreetMap.
So first, with taginfo, we know what kind of tags we want; with the Overpass API we know how to download the data; and now we want to store the data. For this I'm going to use PostGIS. PostGIS is a spatial database extension for PostgreSQL. It offers various spatial types like geometry and geography, you have various spatial indexes like R-tree, k-d tree and quadtree for faster search, and you also have spatial functions. This is the most interesting part about PostGIS: you have many different functions which you can apply in your search. So you write an SQL query like you always would, and you add something like: I want the length of this line, or the area of this polygon, or the x value of this point. You can do much more complicated things, like testing whether a point is inside an area, or taking the intersection of areas; you can even do things like Voronoi and Delaunay. Creating a table in PostGIS works like it does in normal SQL; the only difference is that you have this new type, as we saw before, the geography type. You insert rows in the same way, but in this case you would use some of these spatial functions, which take something like the well-known text format.
Now that we have the data stored somewhere, how do we process it? There's a neat library called GeoPandas, which is basically just pandas with another column for geometry. Everything that you can use in pandas you can use there as well. It uses Shapely for the geometric operations, Fiona for file access, and for visualization it needs Descartes and matplotlib. This is only important if you install it on Windows, because most of these things don't work properly there, so you would need to install each of them by hand. Okay, so how do we load this data? You have again the same thing for Python: you have a great
05:55
library called psycopg2. You can connect to the database and get the connection, and then you can simply use this function from PostGIS to load all the data. In this case we have a data set of all amenities in Austria with their amenity tag, their state, the geometry and some other attributes. We can take this data and visualize it: these are all the amenities in Austria. You can see the Alps somewhere here, and most of the data is collected in the north. You also have very easy functions for projection, for when you want to project from the lat/long projection to a special projection for Austria only. In this case we have a projection which is in meters, and you can see the scale is in meters there as well. So what are the
07:10
most common amenities in Salzburg? First we have to transform our data. You saw the data before: we have just each tag and each state, and we want to have the tag counts for each state. So we do a group by, as we would do in SQL, and then we count and reset the index just to make it work. Then you do a pivot, transforming it in such a way that you have the states and amenities as columns and rows, that is, as columns and indices. When we visualize it, the most common amenity is bench. If you remember, the most common amenity in the world is parking, and apparently in Salzburg there's not enough of it. Okay, so now that we have all our
08:18
data and we know how to work with it, what can we do with it? Let's ask some interesting questions, like: what is the most French city? We want to use simple data science techniques to determine whether a city is French. How would we do that? First I prepared a data set of all amenities in France, Austria, Switzerland and Germany, and I'm going to use Germany and France to determine which city is more French. So I took the opposite of France to be Germany, and I use this to train a classifier, which we're going to use later on to determine whether a city is French or not. Okay, so first we're
09:17
going to take only Germany and France, and then we need feature vectors for all the amenities: we have the counts of each amenity in each city. We want the label to be zero for Germany and one for France. This is our target vector and these are our feature vectors. Now we separate our data into testing and training sets, so we can see whether our model works on other data as well. The model we're going to use is logistic regression, which is commonly used for classification, but it's basically a regression between two classes. In our case, with France and Germany, this gives us a continuous value between these classes. So if we give it the feature vector from Salzburg, it can calculate the probability that the city is French or German, and this is going to be useful later. This is fairly simple: you can use scikit-learn, again with Python. You train the model and then we receive our scores: for training we have almost 90% accuracy, for testing it's 82%. That's not good, but it should be enough for our case.
Let's see what's the most French Austrian city. Okay, any guesses so far? [inaudible] It's one of the most; there are not many cities. But okay, I'll show you. No, Salzburg is actually the least French city. The most French city is Linz; I will get to why this might be the case. And Vienna is apparently somewhat French; it might be the accent there.
Okay, let's do a map visualization of the most French cities. We have our data set of Swiss cities, French cities, Austrian cities and German cities, and we want to use that data to visualize how French they are, so we're going to use our Frenchness score. What we're going to do is use the logistic regression we trained before. We have our classifier, and it has this function called predict_proba, where you can ask: what is the probability that it's class A, or what is the probability that it's class B? Class B is French, so we take one for French, and we calculate this for each feature vector in our whole data set. Then we have to normalize the Frenchness, and then we're going to use a very great library called Folium. I actually learned it today, so it didn't work immediately, but it worked. It's using Leaflet.js, so it's basically a web visualization library, and it's very easy to set up: you just import it, then you have a map, and you can already display it. The only thing which is different now is that we're going to use a color map from matplotlib, which we convert to hex color format, and we use this for a circle marker for each point. We have a name for each city that shows when we click on it, and then we have the fill color, which we get from the color map.
And this is our map: the blue, or dark blue, is German and the purple is French. You can see here that Linz is the most French of all cities, even more French than the French ones. The reason, I think, is that when I was looking at Linz, Linz has the highest density of parking spots of all cities. As Jakob said before, OpenStreetMap is made by public users, so if some user is very ambitious and wants to dedicate all their time to marking every single parking spot in Linz, which they did, they had squares for each parking spot in Linz, then you might have these outliers. That's my suspicion about the reason for this high Frenchness here. And you can interactively take a look at it, and you have popups and everything. So this was it, if the visualization doesn't work. And yes, this was it. [Applause]
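Editor's note on the Overpass query: the talk describes a query that asks for JSON output, restricts the search to Austria via the boundary's admin_level, selects restaurant amenities and counts them. The slide itself is not part of the transcript, so the following is only a minimal sketch of what such a query might look like. The public endpoint URL, the ISO country-code filter and the restriction to nodes are assumptions, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the public Overpass endpoint; the talk does not say which server was used.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_count_query(country_iso: str, amenity: str) -> str:
    """Build an Overpass QL query that counts nodes carrying the given
    amenity value inside a country boundary (admin_level=2)."""
    return (
        "[out:json];\n"
        f'area["ISO3166-1"="{country_iso}"]["admin_level"="2"]->.country;\n'
        f'node["amenity"="{amenity}"](area.country);\n'
        "out count;"
    )


def build_request(query: str) -> urllib.request.Request:
    """Wrap the query in a POST request. Nothing is sent until the
    request is passed to urllib.request.urlopen()."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    return urllib.request.Request(OVERPASS_URL, data=body)


query = build_count_query("AT", "restaurant")
request = build_request(query)
print(query)
```

Passing `request` to `urllib.request.urlopen` would send the query; the sketch stops before that step so it runs without network access. Counting ways and relations as well would need additional statements.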
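Editor's note on storing data in PostGIS: the talk mentions a table with a geography column and inserts that pass well-known text (WKT) through a spatial function. A rough sketch of that idea follows. The table name, the column names and the sample coordinates are made up for illustration; `ST_GeogFromText` is a PostGIS function, and the `%s` placeholders follow the psycopg2 parameter style.

```python
def point_wkt(lon: float, lat: float) -> str:
    """Well-known text for a point. WKT puts x (longitude) before y (latitude)."""
    return f"POINT({lon} {lat})"


# Hypothetical table layout: one row per amenity with a geography point.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS amenities (
    id      BIGINT PRIMARY KEY,
    amenity TEXT,
    state   TEXT,
    geom    GEOGRAPHY(POINT, 4326)
);
"""

# The WKT string is passed as a parameter and converted by PostGIS.
INSERT_SQL = (
    "INSERT INTO amenities (id, amenity, state, geom) "
    "VALUES (%s, %s, %s, ST_GeogFromText(%s));"
)

# Made-up example row (a bench somewhere in Salzburg).
params = (1, "bench", "Salzburg", point_wkt(13.055, 47.8095))
print(INSERT_SQL)
print(params)
```

With an open psycopg2 connection, `cursor.execute(INSERT_SQL, params)` would run the insert; the sketch only builds the statement and its parameters.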
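Editor's note on the group-by and pivot step: the talk does this with pandas (group by, count, reset the index, pivot). To show the logic without any dependency, the sketch below swaps in the standard library: a `Counter` plays the role of the group-by count and a nested dictionary plays the role of the pivoted table. The sample rows are made up.

```python
from collections import Counter

# Made-up sample rows of (state, amenity), one per mapped object.
rows = [
    ("Salzburg", "bench"),
    ("Salzburg", "bench"),
    ("Salzburg", "bench"),
    ("Salzburg", "parking"),
    ("Wien", "parking"),
    ("Wien", "parking"),
    ("Wien", "bench"),
]

# Group by (state, amenity) and count, like GROUP BY ... COUNT(*) in SQL.
counts = Counter(rows)

# Pivot: one entry per state, holding the count of each amenity.
pivot = {}
for (state, amenity), n in counts.items():
    pivot.setdefault(state, {})[amenity] = n

# Most common amenity per state.
most_common = {
    state: max(per_amenity, key=per_amenity.get)
    for state, per_amenity in pivot.items()
}
print(most_common)
```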
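Editor's note on the Frenchness score: the talk trains a logistic regression with scikit-learn and reads the probability of class 1 (France) from `predict_proba`. The sketch below only illustrates how a fitted logistic model turns a feature vector of amenity counts into such a probability. The weights, the bias and the two feature vectors are made-up values, not results from the talk, and `frenchness` is a hypothetical helper name.

```python
import math


def frenchness(features, weights, bias):
    """Probability of class 1 (France) under a logistic model:
    sigmoid of the weighted sum of the amenity counts."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up model parameters for two features: parking count, bench count.
weights = [0.8, -0.5]
bias = 0.0

# Made-up feature vectors for two cities.
city_with_many_parking_spots = [3.0, 1.0]
city_with_many_benches = [1.0, 3.0]

print(frenchness(city_with_many_parking_spots, weights, bias))
print(frenchness(city_with_many_benches, weights, bias))
```

With these illustrative weights, a city dominated by parking spots lands above 0.5 and a city dominated by benches lands below it, which mirrors the talk's suspicion about why Linz scores so high.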
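Editor's note on coloring the map markers: the talk normalizes the Frenchness values and converts a matplotlib color map to hex colors for the Folium circle markers. As a dependency-free sketch of the same idea, the code below uses min-max normalization and a simple linear blend from blue (German) to purple (French) in place of the matplotlib color map. The probabilities are made-up values.

```python
def normalize(scores):
    """Min-max normalization to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def to_hex(t, start=(0, 0, 255), end=(128, 0, 128)):
    """Blend linearly from start (blue) to end (purple) and format as #rrggbb."""
    t = min(1.0, max(0.0, t))
    rgb = [round(a + (b - a) * t) for a, b in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Made-up Frenchness probabilities per city.
probabilities = {"Linz": 0.9, "Wien": 0.6, "Salzburg": 0.1}

normalized = normalize(list(probabilities.values()))
colors = {city: to_hex(t) for city, t in zip(probabilities, normalized)}
print(colors)
```

Each hex string could then be used as the fill color of a circle marker for the corresponding city.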