
Data Science with OpenStreetMap

Video in TIB AV-Portal: Data Science with OpenStreetMap

Formal Metadata

Data Science with OpenStreetMap
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
Keywords OpenStreetMap Open Data Data Science
So, hello guys. My talk is going to be about data science with OpenStreetMap. I'll continue with the things Jakob told you and add some other flavors to it. As Jakob said, there are different tags that you can use, and one common tag is the amenity tag. In OpenStreetMap, as you saw before, you have key-value pairs: for each key you can have some value, and one very common key is the amenity key. As you can see, there are a lot of values, like beer garden, cafe, barbecue, bar, parking spot, whatever. You can explore these tags on a site called taginfo. There you have statistics for the whole OpenStreetMap data set: how many buildings there are, how many highways, how many walls, and so on.

So how do you load this data? You can use taginfo as an API. In our case you can download the values of the amenity key with the API, and you can see that the most used amenity value is parking, then place of worship, school, bench and so on. If we visualize these tags, we can see that most amenities are parking spots, then you have place of worship, school, bench, restaurant, and then it drops off.
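A minimal sketch of the taginfo download described above, using only the Python standard library. It assumes the public taginfo endpoint `api/4/key/values` and a response whose `data` list holds objects with `value` and `count` fields; the function names are made up for illustration, and the talk does not show which client code was actually used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public taginfo instance.
TAGINFO_VALUES = "https://taginfo.openstreetmap.org/api/4/key/values"


def values_url(key, limit=10):
    """Build the request URL for the most used values of one key."""
    params = {
        "key": key,
        "sortname": "count",
        "sortorder": "desc",
        "page": 1,
        "rp": limit,  # results per page
    }
    return f"{TAGINFO_VALUES}?{urlencode(params)}"


def top_values(payload):
    """Extract (value, count) pairs from a decoded taginfo response."""
    return [(row["value"], row["count"]) for row in payload["data"]]


def fetch_top_values(key, limit=10):
    """Download and parse the most used values (performs a network call)."""
    with urlopen(values_url(key, limit), timeout=30) as response:
        return top_values(json.load(response))
```

Calling `fetch_top_values("amenity")` would return the ranked list that the talk visualizes; the parsing is kept separate from the download so it can be tried on a saved response.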
As Jakob said, apparently I'm a guru of the Overpass API, so I'm going to show you how to download data with it. Overpass has an API that you access with its own query language, and of course you can also do that with Python. This is a fairly simple example of a query; it is the code block you saw before in Overpass Turbo, the GUI. You can see here that we want JSON output and we want to search inside Austria; the admin level is the definition of the boundary. Then you use this area for searching all amenities that are restaurants, and then we want to count all of them. You can see that we have 11,910 restaurants in Austria in OpenStreetMap.

So first, we know what kind of tags we want; with the Overpass API we know how to download the data; and now we want to store the data. For this I'm going to use PostGIS. PostGIS is a spatial database extension for PostgreSQL. It offers various spatial types like geometry and geography, various spatial indexes like R-tree, k-d tree and quadtree for faster search, and also spatial functions. This is the most interesting part about PostGIS: you have many different functions that you can apply in your search. You write an SQL query like you always would and add something like: I want the length of this line, or the area of this area, or the x value of this point. You can do much more complicated things, like finding a point inside an area or the intersection of areas; you can even do things like Voronoi and Delaunay. Creating a table in PostGIS is like in normal SQL; the only difference is that you have this new type we saw before, the geography type. You insert in the same way, but in this case you use some of the spatial functions we had.
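The table creation and insert described above might look roughly like this. This is a sketch under assumptions: the `amenities` table and its columns are hypothetical, the statements use `%s` placeholders as a Python driver such as psycopg2 expects, and the point is handed to the PostGIS function `ST_GeogFromText` as well-known text.

```python
# Hypothetical table: one row per OSM amenity, stored as a geography point.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS amenities (
    id      bigint PRIMARY KEY,
    amenity text NOT NULL,
    geog    geography(Point, 4326)
);
"""

# Parameterized insert; the last parameter is a WKT string.
INSERT_SQL = (
    "INSERT INTO amenities (id, amenity, geog) "
    "VALUES (%s, %s, ST_GeogFromText(%s));"
)


def point_wkt(lon, lat):
    """Format a point as well-known text (longitude first, then latitude)."""
    return f"POINT({lon} {lat})"


def insert_params(osm_id, amenity, lon, lat):
    """Build the parameter tuple that matches INSERT_SQL."""
    return (osm_id, amenity, point_wkt(lon, lat))
```

With an open database cursor, one would run `cursor.execute(CREATE_TABLE_SQL)` once and then `cursor.execute(INSERT_SQL, insert_params(...))` per amenity.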
These functions take something like the well-known text (WKT) format. Now that we have the data stored somewhere, how do we process it? There's a neat library called GeoPandas, which is basically just pandas with another column for geometry; everything you can use in pandas you can use there as well. It uses Shapely for the geometric operations and Fiona for file access, and for visualization it needs descartes and matplotlib. This is only important if you install it on Windows, because most of these things don't work properly there, so you would need to install each of them by hand. So how do we load this data?
For Python you again have a great library, called psycopg2. You connect to the database, get the connection, and then you can simply use this PostGIS function to load all the data. In this case we have a data set of all amenities in Austria with their amenity tag, their state, the geometry and some other attributes. We can take this data and visualize it: these are all the amenities in Austria. You can see the Alps somewhere here, and most of the data is collected in the north. You also have very easy functions for projection, for when you want to project from the lat/long projection to some special projection for Austria only. In this case we have a projection in metres, and you can see the scale is in metres as well.
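A hedged sketch of the loading and reprojection step, using GeoPandas' `read_postgis` and `to_crs`. The table and column names (`amenities`, `geom`) are hypothetical, and EPSG:31287 (MGI / Austria Lambert, in metres) is an assumption; the talk does not say which Austrian projection was used. The heavy imports sit inside the function so the constants can be inspected without a database or GeoPandas installed.

```python
# Hypothetical query: one row per amenity with its tag, state and geometry.
AMENITY_SQL = "SELECT amenity, state, geom FROM amenities;"

# Assumed target projection: MGI / Austria Lambert, units in metres.
AUSTRIA_LAMBERT_EPSG = 31287


def load_amenities(dsn, epsg=AUSTRIA_LAMBERT_EPSG):
    """Load amenities from PostGIS and reproject them to a metric CRS."""
    import geopandas as gpd  # imported lazily; needs geopandas installed
    import psycopg2          # imported lazily; needs a reachable database

    conn = psycopg2.connect(dsn)
    try:
        # The geometry column must carry an SRID so the source CRS is known.
        gdf = gpd.read_postgis(AMENITY_SQL, conn, geom_col="geom")
    finally:
        conn.close()
    return gdf.to_crs(epsg=epsg)
```

Calling `load_amenities("dbname=osm")` (a made-up connection string) and then `.plot()` on the result would give a map like the one shown in the talk, with axes in metres. Recent GeoPandas versions prefer an SQLAlchemy engine over a raw psycopg2 connection.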
So what are the most common amenities in Salzburg? First we have to transform the data. You saw the data before: we just have each tag and each state, and we want to have the tag counts for each state. So we do a group by, as we would in SQL, then we count and reset the index just to make it work, and then we do a pivot, which transforms the data in such a way that you have the states and amenities as rows and columns, that is, as index and columns. When we visualize it, the most common amenity is bench. If you remember, the most common amenity in the world is parking, and apparently in Salzburg there's not enough of it.
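The group-by, count, reset-index and pivot steps described above can be sketched with pandas as follows. The five rows are hand-made toy data for illustration, not the Austrian data set from the talk.

```python
import pandas as pd

# Toy input: one row per mapped amenity with its state (made-up rows).
rows = pd.DataFrame({
    "state": ["Salzburg", "Salzburg", "Salzburg", "Wien", "Wien"],
    "amenity": ["bench", "bench", "parking", "parking", "restaurant"],
})

# Group by state and tag, count the rows, and flatten the index again.
counts = rows.groupby(["state", "amenity"]).size().reset_index(name="count")

# Pivot: states become the index, amenity values become the columns.
table = (
    counts.pivot(index="state", columns="amenity", values="count")
    .fillna(0)
    .astype(int)
)

# Most common amenity in one state.
top_salzburg = table.loc["Salzburg"].idxmax()
```

Combinations that never occur come out of the pivot as missing values, which is why the sketch fills them with zero before converting to integers.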
So we have all our data and we know how to work with it; what can we do with it? Let's ask some interesting questions, like: what is the most French city? We want to use simple data science techniques to determine whether a city is French. How would we do that? First I prepared a data set of all amenities in France, Austria, Switzerland and Germany, and I'm going to use Germany and France to determine which one is more French. I took Germany to be the opposite of France, and I use this for a classifier, which we're going to use later on to determine whether a city is French or not.
First we take only Germany and France, and then we need feature vectors for all the amenities, so we have the counts of each amenity in each city. We want the label to be zero for Germany and one for France; this is our target vector, and those are our feature vectors. Now we separate our data into training and testing sets, so we can see if our model works on other data as well. The model we're going to use is logistic regression, which is commonly used for classification, but it's basically a regression between two classes. In our case, with France and Germany, it gives us a continuous value between these classes. So if we give it the feature vector from Salzburg, it can calculate the probability that it's French or German, and this is going to be useful later. This is fairly simple: you can use scikit-learn, again with Python. You train the model and then we receive our scores. For training we have almost 90% accuracy, for testing it's 82%. That's not great, but it should be enough for our case.

Let's see what the most French Austrian city is. Any guesses so far? [Audience guess, inaudible.] Yes, it's one of the most; there are not many cities. But okay, I'll show you now. Salzburg is actually the least French city, and the most French city is Linz; I will get to why this might be the case. And Vienna is apparently somehow French; might be the accent there.

Okay, let's do a visualization, a map of the most French cities. We have our data set of Swiss cities, French cities, Austrian cities and German cities, and we want to use that data to visualize how French they are, so we're going to use a Frenchness score. What we're going to do is use the logistic regression we trained before, so we have our classifier.
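The training step described above (0/1 labels, a train/test split, logistic regression) can be sketched with scikit-learn as follows. The feature matrix here is synthetic: random amenity counts generated for illustration, not the real city data, so the resulting accuracies say nothing about the 90% and 82% reported in the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cities, n_amenities = 200, 5

# Target vector: 0 = Germany, 1 = France (synthetic labels).
labels = rng.integers(0, 2, size=n_cities)

# Synthetic amenity counts per city; each country gets its own mean rates.
french_rates = [8, 2, 5, 1, 3]
german_rates = [3, 6, 5, 2, 1]
rates = np.where(labels[:, None] == 1, french_rates, german_rates)
features = rng.poisson(rates)

# Hold out part of the cities to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
```

Comparing `train_acc` with `test_acc` is the same check the talk performs: a large gap between the two would suggest the model does not generalize to other cities.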
The classifier has a function called predict_proba, and there you can ask how high the probability is that it's class A or that it's class B. For us, class B is French, so we take one for French and calculate this for each feature vector in our whole data set. Then we have to normalize the Frenchness, and then we're going to use a very, very great library called folium. I actually learned it today, so it didn't work immediately, but it worked. It uses Leaflet.js, so it's basically a web visualization library, and it's very easy to set up: you just have to import it, then you have a map and you can already display it. The only thing that is different now is that we use a color map from matplotlib, which we convert to hex color format. We use this for a circle marker at each point: we have a name for each city when we click on it, and the fill color, which we get from the color map.

And this is our map. The blue is German and the purple is French, and you can see here that Linz is the most French of all cities, even more French than the French ones. The reason, I think, is this: when I was looking at Linz, Linz has the highest density of parking spots of all cities. As Jakob said before, OpenStreetMap is made by public users, so if some user is very ambitious and wants to dedicate all their time to marking every single parking spot in Linz, which they did, they had squares for each parking spot in Linz, then you might get these outliers. That's my suspicion for the reason behind this high Frenchness here. You can interactively take a look at the map; it has pop-ups and everything. And yes, this was in case the visualization doesn't work. And yes, this was it. [Applause]
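A rough sketch of the Frenchness map described in the talk. Two things are swapped in and should be read as assumptions: instead of a matplotlib color map, the colors come from a plain linear blend between blue and purple written with the standard library, and the map center and marker radius are arbitrary. The folium calls (`Map`, `CircleMarker`, `add_to`) are the library's own; folium is imported inside the function so the helpers work without it.

```python
GERMAN_RGB = (0, 0, 255)    # blue end of the scale
FRENCH_RGB = (128, 0, 128)  # purple end of the scale


def normalize(scores):
    """Rescale scores linearly so the smallest is 0.0 and the largest 1.0."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(score - low) / (high - low) for score in scores]


def frenchness_to_hex(t):
    """Blend from German blue (t=0) to French purple (t=1) as a hex color."""
    red, green, blue = (
        round(start + (end - start) * t)
        for start, end in zip(GERMAN_RGB, FRENCH_RGB)
    )
    return f"#{red:02x}{green:02x}{blue:02x}"


def build_map(cities, center=(47.5, 10.0), zoom=5):
    """Draw one circle marker per (name, lat, lon, probability_french)."""
    import folium  # imported lazily; needs folium installed

    cities = list(cities)
    scaled = normalize([prob for _, _, _, prob in cities])
    fmap = folium.Map(location=list(center), zoom_start=zoom)
    for (name, lat, lon, _), t in zip(cities, scaled):
        color = frenchness_to_hex(t)
        folium.CircleMarker(
            location=[lat, lon],
            radius=6,
            popup=name,       # city name shown when the marker is clicked
            color=color,
            fill=True,
            fill_color=color,
        ).add_to(fmap)
    return fmap
```

With a trained scikit-learn classifier, the probabilities for the French class (label 1) would come from `clf.predict_proba(features)[:, 1]`, and `build_map(...).save("frenchness.html")` would write the interactive map to a file with a made-up name.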