Data Science with OpenStreetMap

Speech transcript
So hello guys, my talk is gonna be about
data science with OpenStreetMap. I'll continue the things Jakob told you and give you some other flavors to it. First, as Jakob said, there are different tags that you can use. For example, one tag that is common is the
amenity tag. In OpenStreetMap, as you saw before, you have key-value pairs: for each key you can have some value, and one very common key is the amenity key. As you can see, there are a lot of values like beer garden, cafe, barbecue, bar, parking spot, whatever. You can explore these tags on a site called taginfo. There you have statistics for the whole of the OpenStreetMap data: how many buildings there are, how many highways, how many walls, and so on.

So how do you load this data? You can use taginfo as an API. In our case you can download the amenity key with the API, and you can see that the most used amenity value is parking, then place of worship, school, bench and so on. If we visualize these tags, we can see that most amenities are parking spots, then you have place of worship, school, bench, restaurant, and then it drops off.

And as Jakob said, apparently I'm a guru
of the Overpass API, so I'm going to show you how to download data with Overpass. You have an API which you can access with its own query language, and of course you can do that with Python as well. This is a fairly simple example of a query; it is the code block you saw before in Overpass turbo, the GUI interface. You can see here that we want JSON output and we want to search inside Austria; admin level is the definition of the boundary. Then you use this area for searching all amenities that are restaurants, and then we want to count all of them. You can see we have eleven thousand nine hundred and ten restaurants in Austria in OpenStreetMap.

So first, we know what kind of tags we want, and with the Overpass API we know how to download the data. Now we want to store the data, and for this I'm going to use PostGIS. PostGIS is a spatial database extension for PostgreSQL. It offers various spatial types like geometry and geography, you have various spatial indexes like R-tree, k-d tree and quadtree for faster search, and you also have spatial functions. This is the most interesting part about PostGIS: you have various functions which you can apply in your search. You can write an SQL query like you always would and add something like: I want the length of this line, or the area of this area, or the x value of this point. And you can do much more complicated things, like testing whether a point is inside an area or computing the intersection of areas; you can even do things like Voronoi and Delaunay.

Creating a table in PostGIS is like you would do in normal SQL. The only difference is that you have this new type we saw before, the geography type. You can insert rows in the same way, but in this case you would use some of the spatial functions we had, and these would take something like the
well-known text format.

Now that we have the data stored somewhere, how do we process it? There's a neat library called GeoPandas, which is basically just pandas with another column for geometry. Everything that you can use in pandas you can use there as well. It uses Shapely for the geometric operations and Fiona for file access, and for visualization it needs Descartes and matplotlib. This is only important if you install it on Windows, because most of these things don't work properly there, so you would need to install each of them by hand.

OK, so how do we load this data? Again the same thing: for Python you have a great
library called psycopg2. You can connect to the database, get the connection, and then simply use this function, from_postgis, to load all the data. In this case we have a data set of all amenities in Austria with their amenity tag, their state, the geometry and some other attributes. We can take this data and visualize it; this would be all the amenities in Austria. You can see the Alps somewhere here, and most of the data is collected in the north.

You also have very easy functions for projection, for when you want to project from the lat/long projection to some special projection for Austria only. In this case we would have a projection which is in meters, and you can see the scale is in meters there as well.

So what are the
most common amenities in Salzburg? First we have to transform our data. You saw the data before: we have just each tag and each state, and we want to have the tags for each state. So we do a group by, as we would do in SQL, then we count and reset the index just to make it work. Then you do a pivot and transform it in such a way that the states and amenities become the index and the columns. When we visualize it, the most common amenity is bench. If you remember, the most common amenity in the world is parking, and apparently in Salzburg there's not enough of it.

OK, so when we have all our
data and we know how to work with it, what can we do with it? Let's ask some interesting questions, like: what is the most French city? We want to use simple data science techniques to determine whether a city is French. So how would we do that? First I prepare a data set of all amenities in France, Austria, Switzerland and Germany, and I'm gonna use Germany and France to determine which one is more French. So I took the opposite of France to be Germany, and I use this to train a classifier which we're gonna use later on to determine whether a city is French or not.

So first we're
gonna take only Germany and France, and then we need feature vectors for all the amenities, so we have the counts of each amenity in each city. We want the label to be zero for Germany and one for France; this is our target vector, and those are our feature vectors. Now we separate our data into testing and training sets, so we can see if our model is working on other data as well. The model we're gonna use is logistic regression, which is commonly used for classification, but it's basically a regression between two classes. In our case, with France and Germany, this gives us a continuous value between these classes. So if we give it the feature vector from Salzburg, it can calculate the probability that it's French or German, and this is gonna be useful later. This is fairly simple: you can use scikit-learn, again with Python. You train the model and then we receive our scores. For training we have almost 90% accuracy, for testing it's 82%. That's not good, but it should be enough for our case.

Let's see what's the most French Austrian city. Any guesses so far? [inaudible exchange with the audience] There are not many cities, but OK, I'll show you. No, Salzburg is actually the least French city. The most French city is Linz; I will get to why this might be the case. And Vienna is apparently somehow French, might be the accent there.

OK, let's do a visualization of the map of the most French cities. We have our data set of Swiss cities, French cities, Austrian cities and German cities, and we want to use that data to visualize how French they are, so we're gonna use our Frenchness score. What we're gonna do is use the logistic regression we trained before, so we have our classifier, and there we have this function
called predict_proba, and there you can ask what the probability is that it's class A or that it's class B. Class B is French, so we take one for French and calculate this for each feature vector in our whole data set. Then we have to normalize the Frenchness, and then we're gonna use a very great library called Folium. I actually learned it today, so it didn't work immediately, but it worked. It's using Leaflet.js, so it's basically a web visualization library, and it's very easy to set up: you just have to import it, then you have a map and you can already display it. The only thing which is different now is that we're gonna use a color map from matplotlib, which we convert to hex color format. We use this for a circle marker for each point, we have a name for each city that shows when we click on it, and we have the fill color which we get from the color map.

And this is our map. The blue is German and the purple is French, and you can see here that Linz is the most French of all cities, even more French than the French. The reason why I think this might be is that, when I was looking at Linz, Linz has the highest density of parking spots of all cities. As Jakob said before, OpenStreetMap is made by public users, so if some user is very ambitious and wants to dedicate all their time to marking every single parking spot in Linz, which they did, they had squares for each parking spot in Linz, then you might have these outliers. That's my suspicion for the reason behind this high Frenchness here. You can interactively take a look at it, there are pop-ups and everything. So this was the visualization, and yes, this was it. [Applause]
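
Editorial sketch for the taginfo step described in the talk. This is not the speaker's code. It assumes the public taginfo endpoint `/api/4/key/values` with the query parameters `key`, `sortname`, `sortorder`, `page` and `rp`, and assumes each returned row carries a `value` and a `count` field; check these names against the taginfo API documentation before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public taginfo instance.
TAGINFO_URL = "https://taginfo.openstreetmap.org/api/4/key/values"


def build_taginfo_url(key, limit=10):
    """Build the request URL for the most used values of one key."""
    params = {
        "key": key,
        "sortname": "count_all",  # assumed sort field name
        "sortorder": "desc",
        "page": 1,
        "rp": limit,              # results per page
    }
    return TAGINFO_URL + "?" + urlencode(params)


def parse_values(payload):
    """Turn a decoded response into (value, count) pairs."""
    return [(row["value"], row["count"]) for row in payload["data"]]


def fetch_top_values(key, limit=10):
    """Download and parse the top values for a key (needs network access)."""
    with urlopen(build_taginfo_url(key, limit)) as response:
        return parse_values(json.load(response))
```

Calling `fetch_top_values("amenity")` would, under these assumptions, return the ranking the speaker reads out (parking, place of worship, school, bench and so on).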
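
Editorial sketch for the Overpass query described in the talk (JSON output, search area Austria selected via admin level, all restaurants, counted). The exact query on the slide is not in the transcript, so selecting Austria by the `ISO3166-1` tag, the shape of the count element in the response, and the function names are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_count_query(country_code, amenity):
    """Overpass QL that counts all objects with the given amenity in a country."""
    return f"""
[out:json];
area["ISO3166-1"="{country_code}"][admin_level=2]->.searchArea;
(
  node["amenity"="{amenity}"](area.searchArea);
  way["amenity"="{amenity}"](area.searchArea);
  relation["amenity"="{amenity}"](area.searchArea);
);
out count;
""".strip()


def parse_total(payload):
    """Read the total from the count element (assumed response layout)."""
    return int(payload["elements"][0]["tags"]["total"])


def count_amenities(country_code, amenity):
    """Send the query to the Overpass API (needs network access)."""
    body = urlencode({"data": build_count_query(country_code, amenity)}).encode()
    with urlopen(OVERPASS_URL, data=body) as response:
        return parse_total(json.load(response))
```

`count_amenities("AT", "restaurant")` corresponds to the restaurant count mentioned in the talk; the number returned today will differ from the 11,910 quoted there, since OpenStreetMap data changes continuously.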
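
Editorial sketch for the group-by and pivot step. The talk does this with pandas (group by, count, reset index, pivot); the version below reproduces the same reshaping with the Python standard library only, so it is a stand-in rather than the speaker's code. The rows are made-up placeholders, not real OpenStreetMap counts.

```python
from collections import Counter


def pivot_counts(rows):
    """Count (state, amenity) pairs and reshape them into a state x amenity table."""
    counts = Counter(rows)  # equivalent of group by + count
    states = sorted({state for state, _ in counts})
    amenities = sorted({amenity for _, amenity in counts})
    # Equivalent of the pivot: one row per state, one column per amenity.
    return {
        state: {amenity: counts[(state, amenity)] for amenity in amenities}
        for state in states
    }


def most_common_amenity(table, state):
    """Return the amenity with the highest count in one state."""
    row = table[state]
    return max(row, key=row.get)


# Made-up example rows: one (state, amenity) pair per mapped object.
rows = [
    ("Salzburg", "bench"),
    ("Salzburg", "bench"),
    ("Salzburg", "parking"),
    ("Wien", "parking"),
]
table = pivot_counts(rows)
print(most_common_amenity(table, "Salzburg"))
```

Missing combinations are filled with zero, which matches what a pivot followed by a fill of missing values would give.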
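
Editorial sketch for the classifier. The talk uses scikit-learn's logistic regression with a train/test split; the version below is a small from-scratch logistic regression trained by stochastic gradient descent, shown only to illustrate how amenity counts become a probability of being "French". The feature vectors, the learning rate and the function names are made up for illustration, and the train/test split is omitted.

```python
import math


def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability that a feature vector belongs to class 1 (France)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples, labels, learning_rate=0.1, epochs=200):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Made-up counts of two hypothetical amenities per city.
samples = [[5, 1], [4, 0], [1, 4], [0, 5]]
labels = [0, 0, 1, 1]  # 0 = Germany, 1 = France

weights, bias = train(samples, labels)
frenchness = predict_proba(weights, bias, [0, 5])
```

The value returned by `predict_proba` plays the role of the Frenchness score: it is computed for every city, including those in Austria and Switzerland that were never part of the training data.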
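
Editorial sketch for the coloring step. In the talk the normalized Frenchness is passed through a matplotlib color map and converted to a hex color for each Folium circle marker; the version below replaces the color map with a plain linear blend between two colors so it runs without dependencies. The two endpoint colors and the function names are assumptions chosen to match the description "blue is German, purple is French".

```python
GERMAN_BLUE = (0.0, 0.0, 1.0)    # assumed RGB for the "German" end
FRENCH_PURPLE = (0.5, 0.0, 0.5)  # assumed RGB for the "French" end


def normalize(values):
    """Min-max scale values to the range 0..1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def blend(start, end, t):
    """Linear interpolation between two RGB colors, t in 0..1."""
    return tuple(s + (e - s) * t for s, e in zip(start, end))


def to_hex(rgb):
    """Convert an RGB triple with components in 0..1 to a '#rrggbb' string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(255 * c) for c in rgb))


def frenchness_colors(scores):
    """One hex fill color per city, from blue (least French) to purple (most)."""
    return [to_hex(blend(GERMAN_BLUE, FRENCH_PURPLE, t)) for t in normalize(scores)]
```

Each resulting string can be used as the fill color of a marker; the city with the lowest score gets pure blue and the one with the highest score gets purple.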

Metadata

Formal metadata

Title Data Science with OpenStreetMap
Series title Maptime Salzburg
Author Janakiev, Nikolai
Contributors Miksch, Jakob
License CC Attribution 3.0 Germany:
You may use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
DOI 10.5446/38510
Publisher Maptime Salzburg
Release year 2018
Language English
Production year 2018
Production place Salzburg

Content metadata

Subject area Earth sciences / Geography, Computer science
Keywords OpenStreetMap
Open Data
Data Science
