Data Science mit OpenStreetMap
Formal Metadata
Title 
Data Science mit OpenStreetMap

Title of Series  
Author 

License 
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2019

Language 
German

Content Metadata
Subject Area  
Abstract 
Data science is a popular buzzword that has already spread into many fields, not least the world of geoinformatics. This talk is about how to apply common data science methods to OpenStreetMap data using open-source tools and how to derive new insights from it.

00:09
Welcome. The next talk is about data science, in the broadest sense the science of working with data, here with geodata. There are established methods for that, and Nikolai Janakiev will now explain how these methods can be applied to OpenStreetMap data.

Hello, I'm Nikolai from Salzburg, and
00:34
you are now going to hear something about OpenStreetMap. I hope you can understand my dialect. They say you should always start with a joke when you give a talk.
00:53
With that done, on to the overview.
00:56
As the first step I'll show how to collect the data. A quick question: who of you uses Python or at least knows it? OK, good, because there will be a bit of code in here. The next step is how to transform the data, then how to analyze it with exploratory data analysis, and at the end how to visualize the results.

On to the data. OpenStreetMap is a huge database, and there are all sorts of things in it that you could analyze. In my case I looked at the amenities. As you probably know from OpenStreetMap, the data is stored as key-value pairs in the tags, and the key I looked at was amenity. It covers all sorts of things: bar, beer garden, restaurant and so on.
02:00
Then I picked out which data to analyze, namely which tags.
02:10
For that I looked at which amenities are the most popular in Europe overall,
02:18
so that I know which ones I should use. First comes parking, then place of worship, bench and so on. Next, the regions I work with. In the end I want to count, for certain regions, how many restaurants there are, how many bars, how many beer gardens, whatever. First I chose city regions, taken from OpenStreetMap,
02:54
and within those I counted the amenities. I also used the NUTS regions from Eurostat, the statistical office of the
03:06
European Union. This is a way of classifying regions at finer and finer levels. They are not necessarily federal states or special regions; the idea is rather that the regions of one level fall within a certain population range.

The first way I collected data was the Overpass API. It is a very specialized API that lets you send queries to OpenStreetMap, and it works very well. There is an endpoint that you access, and you can send the query directly from Python with the requests library. In this case the output is JSON, the region is Germany, I want everything with amenity equal to restaurant in that region, and the result is returned as a count. So in Germany we currently have approximately 74,000 restaurants mapped as nodes.

Another way I used is to download the data from Geofabrik, or directly from OpenStreetMap, and then analyze it on your own machine, for example with PostGIS. Another tool I used for this was osmium, or rather pyosmium. It is exactly the same application, this time for Liechtenstein. You write a handler class that goes through the whole OpenStreetMap file, is called for every node, and counts the nodes where the amenity is restaurant. In Liechtenstein there are a lot fewer restaurants: 45.

How do we transform the data? I did that with GeoPandas. It is basically an extension of pandas, or rather of the pandas DataFrame, and it works like this: you have an extra column with the geometry in it. It uses shapely for the geometric operations, fiona for data access, and descartes and matplotlib for visualization. At least on Windows it was often a problem that you had to install every single one of the four manually,
05:38
but now it is relatively easy to install with conda.
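The Overpass count query described above can be sketched roughly as follows. This is a minimal sketch, not the speaker's code: it uses the standard library's `urllib` where the talk mentions requests, and the function names `build_count_query`, `parse_count` and `count_amenity` are made up for illustration. It assumes the public endpoint at overpass-api.de and the usual shape of an `out count` response, where the counts arrive as strings in the `tags` of a single `count` element.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass endpoint (an assumption; other instances exist).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_count_query(iso_country_code, amenity):
    """Return an Overpass QL query that counts nodes with one amenity value."""
    return (
        "[out:json];\n"
        f'area["ISO3166-1"="{iso_country_code}"][admin_level=2]->.searchArea;\n'
        f'node["amenity"="{amenity}"](area.searchArea);\n'
        "out count;"
    )


def parse_count(response):
    """Extract the node count from the decoded JSON of an `out count` query."""
    count_element = next(e for e in response["elements"] if e["type"] == "count")
    return int(count_element["tags"]["nodes"])


def count_amenity(iso_country_code, amenity, timeout=180):
    """Send the query to the Overpass endpoint (needs network access)."""
    payload = urllib.parse.urlencode(
        {"data": build_count_query(iso_country_code, amenity)}
    ).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=payload, timeout=timeout) as resp:
        return parse_count(json.load(resp))
```

Calling `count_amenity("DE", "restaurant")` would send the query for Germany; the number it returns changes as the map is edited, so it will not match the figure quoted in the talk exactly.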
05:43
With conda it works relatively well for most geo packages. A quick example: you can read geodata quite easily from PostGIS with GeoPandas. It works like this: you need a connector for the database, here psycopg2, of course with the most secure of passwords. Then you can pass an SQL query directly and get the result back as a GeoDataFrame with the geometry in it. Here that is all amenities in Austria. You can then reproject and visualize in one step: you adjust the CRS and plot directly with matplotlib. matplotlib has the most features and can do pretty much everything, but for very special things I would recommend a dedicated mapping library. Looking at the amenities in Europe, you get the idea
06:55
that the population density correlates very strongly with the amenities. That will be useful to know later on.
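Reading amenities from PostGIS into GeoPandas and reprojecting them, as described above, might look roughly like this. It is a sketch under assumptions, not the code from the talk: the table `planet_osm_point`, its geometry column `way` and the source projection EPSG:3857 are the osm2pgsql defaults and may differ in another database, and the function names are hypothetical. The third-party imports sit inside the function so that the snippet can be loaded even where GeoPandas and psycopg2 are not installed.

```python
# Assumed schema: default osm2pgsql point table with geometry column `way`.
AMENITY_SQL = """
    SELECT osm_id, name, amenity, way AS geom
    FROM planet_osm_point
    WHERE amenity IS NOT NULL
"""


def load_amenities(dbname, user, password, host="localhost", source_epsg=3857):
    """Read amenity points from PostGIS and return them in WGS84 (EPSG:4326)."""
    # Imported lazily: only needed when a database is actually queried.
    import geopandas as gpd
    import psycopg2

    connection = psycopg2.connect(
        dbname=dbname, user=user, password=password, host=host
    )
    try:
        gdf = gpd.read_postgis(
            AMENITY_SQL, connection, geom_col="geom", crs=f"EPSG:{source_epsg}"
        )
    finally:
        connection.close()
    # Reproject in one step by changing the coordinate reference system.
    return gdf.to_crs(epsg=4326)


def plot_amenities(gdf):
    """Plot the points with GeoPandas' matplotlib-based plot method."""
    return gdf.plot(markersize=1, figsize=(10, 6))
```

With a database that holds an Austria extract, `plot_amenities(load_amenities("osm", "user", "secret"))` would draw one dot per amenity; the database name and credentials here are placeholders.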
07:06
As an example, here are the amenities in Saxony. You can see Leipzig quite clearly, Dresden as well, and Chemnitz is somewhat tucked away among the surrounding towns.

Now we come to the data analysis. At this point about half, or by other estimates about 80 percent, of the work is already behind us: cleaning up all the data, filtering it and everything else. Now comes the exciting
07:30
part, where you really find things out from the data. In this case these are the
07:35
most common amenities in Saxony. The list starts with restaurant, fast food, doctors, cafe and so on. As a comparison I then looked at the amenities in Salzburg alone, and there parking is in first place. I would not have
07:51
thought that there is that much of it, because people still complain every day, but apparently parking is the most common. Only then come benches, the kind you can sit on, and then restaurants. Now, in data science there should always be some question.
08:09
You should not just analyze the data and look at whatever correlations you can pull out. That is nice at the beginning, but it is better to commit to one question and then answer it as well as possible. A fun question I had was: what distinguishes a French city from a German city, or European cities in general, and how could you classify that? You have a number of features, such as the number of restaurants or anything else, and together they form something like the signature of a city. Can you use that signature to assign the city to a country, if that is possible at all? The basic method you use for this is statistical classification. It works roughly like this: you have features for every city. In this case
09:11
you have only two features, for example the number of restaurants and the number of schools. Then you say, OK, this is class 1 and this is class 2, and we are looking for the function that best separates the two classes. That is classical machine learning, and there are of course many methods for it, including the classic neural networks or deep learning. In this case, though, I took
09:44
just a small part of all that, logistic regression, which is one of the building blocks of
09:49
neural networks. It is called regression, but it is actually a classification algorithm. The reason is that you code class 1
09:59
as zero and class 2 as one, and then try to find the function that best splits these two classes. What you get is a probability function that tells you how likely it is that something is class 1 or class 2, and what sits behind it is basically a regression.
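For reference, the standard logistic regression model behind this description can be written as follows. The symbols are notation chosen here, not taken from the talk: x is the feature vector of a city, w the learned weights, b the bias, and y the class label coded as 0 or 1.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
P(y = 1 \mid \mathbf{x}) = \sigma\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right), \qquad
P(y = 0 \mid \mathbf{x}) = 1 - \sigma\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right)
```

The regression part is the linear term w^T x + b, which models the log-odds of class 1. The classifier draws its boundary where that term is zero, that is, where both classes have probability 0.5.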
10:27
Now back to the data and its preparation, which works just as we have seen before. In this case I cleaned the data directly and, as already mentioned, normalized it per capita: parking per capita and so on. Every feature vector, that is every row, is then assigned a label, the class that we want to classify. Since I want to train the model on Germany and France, I took out just those two countries. The feature vectors then have to be prepared for the library, in this case scikit-learn, which needs a certain format: a matrix for the feature vectors and a target vector that contains zero and one, as logistic regression requires in order to separate the two classes. The next thing, which is very important, is to split the data set into training and testing. That matters because once you have trained the model on the training set, you want to know whether it really generalizes, whether it has really learned something. That is why we have the test set: you check whether the model generalizes to held-out data it has never seen, and whether it classifies that data correctly.

This is very simple, two lines of code. The classifier is the logistic regression, and you put the training data in. In scikit-learn it is the same for all classifiers, neural networks or anything else: you always have this fit function, and you pass the training data to it. Then you look at the accuracy: on the training set it is just under 90 percent, and on the test set it is 83 percent.

The next step is to see how other cities are classified. As we saw, logistic regression goes from zero to one, so you can read the output as a percentage. If you now apply this model to other cities, for example in Austria or Switzerland, which cities are more German and which are more French? To apply the model there is predict_proba, which gives you the probability from the logistic regression directly.
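The workflow just described (feature matrix, 0/1 target vector, train/test split, fit, accuracy, probability for unseen cities) can be illustrated without any dependencies. The talk uses a library for this step; the sketch below is instead a plain-Python stand-in that trains logistic regression by batch gradient descent. The data is synthetic and purely illustrative, and the two features, the class centers and all function names are assumptions, not values from the talk.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit(X, y, learning_rate=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(weights, bias, X, y):
    """Share of samples whose thresholded probability matches the label."""
    hits = sum(
        (predict_proba(weights, bias, x) >= 0.5) == (t == 1) for x, t in zip(X, y)
    )
    return hits / len(X)


# Synthetic stand-in data: two per-capita features per city, label 0 or 1.
rng = random.Random(0)
centers = {0: (1.0, 2.0), 1: (2.0, 1.0)}
samples = []
for i in range(100):
    label = i % 2
    cx, cy = centers[label]
    samples.append(([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)], label))
rng.shuffle(samples)

# Hold out 20 percent of the cities for testing.
split = int(0.8 * len(samples))
X_train = [x for x, _ in samples[:split]]
y_train = [t for _, t in samples[:split]]
X_test = [x for x, _ in samples[split:]]
y_test = [t for _, t in samples[split:]]

weights, bias = fit(X_train, y_train)
train_accuracy = accuracy(weights, bias, X_train, y_train)
test_accuracy = accuracy(weights, bias, X_test, y_test)

# Score a city the model has not seen; the result lies between 0 and 1.
unseen_city = [2.0, 1.0]
unseen_probability = predict_proba(weights, bias, unseen_city)
```

Comparing `train_accuracy` with `test_accuracy` is the generalization check from the talk: a large gap between the two would suggest the model has memorized the training cities rather than learned something transferable.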
13:22
The next step is the data visualization. For that I used folium, which is a wrapper around the Leaflet library and lets you create
13:43
a web map. It works like this. In this case I want a color map, so that I have a color for every city. Then I create the map quite classically, with the location where it is centered and the zoom level. Then I go through all the rows, and for every row I take the probability of being French that was computed before, map it to an RGB value with the color map, and convert that value to hex. With that I can add a marker for every city, along with a popup for the city, and the whole thing can then be saved as an HTML page.
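The loop described above, probability to color to marker, could look roughly like this. It is a sketch under assumptions: the model output is taken to be the probability of "French", the two endpoint colors and the map center are picked here for illustration, and a plain linear interpolation stands in for whatever color map the talk used. `probability_to_hex` and `build_map` are hypothetical names; folium is imported inside the function so the color helper works without it.

```python
GERMAN_RGB = (64, 224, 208)   # turquoise, used for probability 0.0
FRENCH_RGB = (255, 105, 180)  # pink, used for probability 1.0


def probability_to_hex(p_french):
    """Interpolate linearly between the two colors and return '#rrggbb'."""
    p = min(max(p_french, 0.0), 1.0)  # clamp to the valid range
    rgb = [round(g + p * (f - g)) for g, f in zip(GERMAN_RGB, FRENCH_RGB)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def build_map(cities, center=(48.5, 9.0), zoom_start=5):
    """Create a folium map with one colored circle and popup per city.

    `cities` is an iterable of (name, lat, lon, p_french) tuples.
    """
    import folium  # imported lazily; only needed when a map is built

    city_map = folium.Map(location=list(center), zoom_start=zoom_start)
    for name, lat, lon, p_french in cities:
        color = probability_to_hex(p_french)
        folium.CircleMarker(
            location=[lat, lon],
            radius=6,
            color=color,
            fill=True,
            fill_color=color,
            popup=f"{name}: {p_french:.0%} French",
        ).add_to(city_map)
    return city_map
```

Calling `build_map(cities).save("cities.html")` would write the standalone HTML page; the file name is a placeholder.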
14:39
And this is what it looks like. You can embed it directly in Jupyter or in a website, and then you can see directly how the individual cities are classified. In this case the turquoise blue
14:56
is German and the pink is more French. Funnily enough, according to the model the most French city is Linz in Austria. The reason lies more or less in the model, because in OpenStreetMap certain things are mapped differently from place to place. For example, I discovered that in Linz
15:24
many parking lots are mapped, and in other cities not nearly as many. That is why I think certain features have more influence on the logistic regression than others. And the most German
15:43
city is apparently Kassel. As I mentioned at the beginning, besides the city regions I did the same with the NUTS regions, and for those I made a small map.
16:16
The first thing you can see is
16:23
where the most parking per capita is, where the most cafés per capita are, and where the most benches are. From these amenities I then again computed a logistic regression.
16:41
It is exactly the same as what I had before: I classified Germany and France, then France and England, and then England and
16:50
Germany. Then you can see how that plays out for the other regions, and you can set up other comparisons in the same way.
17:09
Thank you very much for this
17:23
interesting talk on what you can do with OpenStreetMap data. I think everyone has realized that there is a lot more that could be done here and analyzed. I think there are also questions.

The first question that came to my mind: there are actually two things mixed together here. When you say, for example, that there are this many benches per capita, then on the one hand it is about how many benches there really are in the country or the city, and on the other hand about how many of them have been mapped. All these statements mix the two. On the one hand you are saying something about the OSM community: maybe in France there is someone who likes to map parking garages, and in Spain someone who likes benches. On the other hand, things may really be different in the countries. Have you looked for other data sources that would allow you to separate the two? If I had, say, a really authoritative list of all parking garages or all benches, I could work out what percentage of them have been captured in OSM, how complete the data is. Only once I have an indicator that the data has been captured fairly completely can I really make statements about the differences between the countries, and not just about the differences between the mappers.

That is something I left out, because time was short and there was so much to fit in. There is a quote, Wittgenstein's ruler, which says: you can measure a table with a ruler, but you can also measure the ruler with the table. In this case you can see how many benches were measured, but what you can really say is how many benches the OpenStreetMap mappers have recorded, and that is exactly the problem. As for the data specifically, I looked at Eurostat, and in general I found no data set that is as detailed as OpenStreetMap.

Are there more questions? About the analysis: you used logistic regression. Is there a reason why you used it? Did you try other classification algorithms, for example deep learning, or is there a special reason for your choice?

The main reason is that there is rather little data. And one big warning about the logistic regression the way I did it: you usually should not do it like this. The problem is that many of the features are very strongly correlated with each other, which you should not have for a logistic regression. That would be the first point. The second is that I was looking for an algorithm that, first, is easy to explain and, second, outputs a percentage, so that you can use the model to rank other cities and not only say "this is French" or "this is German".

One more question about the analysis: if I saw that correctly, you computed three different logistic models. Would it not make more sense to put everything into one multivariate model and compute it all together? So is it three separate regressions or one big one?

You could throw them all together, or you do it the simple way, as I did here, with just two classes. It is usually used with two classes.

Thank you for this first talk in the OpenStreetMap block, and thank you very much for this introduction to evaluating OpenStreetMap data scientifically. [Applause]