An open risk index with learning indicators from OSM-tags, developed by machine learning and trained with the WorldRiskIndex
Formal Metadata
Number of Parts | 295
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/43567 (DOI)
FOSS4G Bucharest 2019, 17 / 295
Transcript: English (auto-generated)
00:09
Okay, good afternoon again. So, time for the second talk, and yes, feel free to start. The second talk is again about OpenStreetMap: An open risk index with learning indicators
00:24
from OSM tags, developed by machine learning and trained with the World Risk Index. Thank you very much, Marco, and thank you very much for the opportunity to give this short talk here today. My name is Daniel. I'm working at the University of Stuttgart at the Institute of Spatial
00:42
and Regional Planning as a research assistant and lecturer. At the institute we hadn't used any FOSS tools before. My colleague actually convinced me last year to come to FOSS4G for the first time,
01:01
and now we're trying to include these tools in our teaching and get them more established at the university and our institute. So basically that's why we are here: to get new input and also to present something. And the idea behind this little side project was
01:21
to have a justification for coming here, so that our professor would pay our travel expenses, and for us to get some new input. So we had to come up with something, and yeah, that's what you're going to see today. Looking at the title now, I think it is not very comprehensible,
01:40
to be honest. I hope I can clear it up in the next 20 minutes, at least a little bit. So what is the idea in a nutshell? Basically, vulnerability is a spatial concept with spatial attributes, and looking
02:04
at it from different angles, the idea was that somehow it should be included in OpenStreetMap. I had no clue how, and I definitely had no clue how to get there, but this idea came up last year during several inputs and so on,
02:21
and that was basically the starting question, the starting idea. We thought it would be fun to do in some free time and didn't realize how much work it could be, and actually was. Okay, so just a brief outline of my talk. As today we are
02:44
not at a disaster risk conference, I will first give some background on the World Risk Index, its idea and development, to explain the idea of socioeconomic vulnerability, because I assume you're not all very familiar
03:00
with this concept. Then, in a second step, I'll show how to derive a vulnerability index on a global scale from OpenStreetMap. So what is disaster risk? Basically, you can say it consists of three components.
03:22
We have the hazard component, the exposure and the vulnerability. If we look at global climate models, it's clear that hazards will increase in magnitude and frequency, and of course exposure is going to increase too. The vulnerability is a little bit more interesting, and just looking
03:46
at it on a global scale is not enough, because globally we could say it stays more or less the same or slightly decreases. But the interesting question is: how does it change in space and time?
04:00
So we need to measure it somehow. We have to look at it on different scales and levels, in time and space. This is an analysis our institute did with the World Risk Index, looking at vulnerability and its development from 2012 to 2017, and it clearly shows that regions worldwide are developing differently. In light
04:26
of climate change adaptation, the question is whether we can detect decision points, or how we can counteract these developments, as we clearly have rising exposure
04:40
and rising hazards. We don't want to stay at the same vulnerability level, because that just means losses will increase. So the question here is how to detect those patterns, and maybe explain them, to find measures to counteract them on different scales, administrative scales and levels. So there is a huge need for scalable,
05:06
understandable indicators or composite measures. One quite accepted approach, let's say a little bit different
05:20
from, but similar to, the INFORM risk index from the JRC, is the World Risk Index, developed by Professor Birkmann. It focuses on natural hazards, in this case floods, sea level rise, storms, droughts and earthquakes, combined into an index
05:40
of how many people in each country are affected annually, and the vulnerability component. So what is socioeconomic vulnerability? In this concept it is described by three main dimensions: first the susceptibility, so in the short term,
06:02
if there is an impact, what is the likelihood of suffering harm; then the coping capacity, so in a short-term perspective the ability to avoid or reduce negative consequences; and also the adaptive capacity, so whether the society is able in the long term
06:24
to include new measures to adapt to different stressors at different levels, like research, building codes and their implementation. So it is not only looking at the phenomena, because basically if a flood happens where nobody is, it doesn't really matter,
06:44
we don't mind. So the question here is really the human perspective of explaining it. Now, zooming in on this vulnerability in more detail, the World Risk Index
07:01
consists of 23 indicators across susceptibility, coping and adaptive capacity, as I said, again with some subtopics. I was involved in updating this risk index from different governmental sources, and it was a lot of fun realizing that, yeah, for some indicators the calculation had changed,
07:27
countries had dropped out in different years, some years were simply missing, so combining different sources, different country names, yeah, you name it. You know data; you know what different data sources look like.
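The index structure just described, exposure combined with a vulnerability made of susceptibility, coping capacity and adaptive capacity, can be sketched as a small aggregation. This is a minimal sketch assuming the equal-weight scheme we understand the WorldRiskReport used before its 2022 revision: vulnerability as the mean of susceptibility, lack of coping capacity and lack of adaptive capacity, and the index as exposure × vulnerability, with the hazard entering through exposure. The class and function names, the 0-1 scale and the example numbers are hypothetical, and the 23 underlying indicators are not modelled.

```python
from dataclasses import dataclass


@dataclass
class CountryScores:
    """Hypothetical component scores on a 0-1 scale (invented, not real index data)."""
    exposure: float           # share of the population exposed to natural hazards
    susceptibility: float     # short-term likelihood of suffering harm
    coping_capacity: float    # short-term ability to avoid or reduce negative consequences
    adaptive_capacity: float  # long-term ability to adapt to new stressors


def vulnerability(s: CountryScores) -> float:
    """Assumed equal-weight mean of susceptibility and the *lack* of both capacities."""
    lack_of_coping = 1.0 - s.coping_capacity
    lack_of_adaptive = 1.0 - s.adaptive_capacity
    return (s.susceptibility + lack_of_coping + lack_of_adaptive) / 3.0


def world_risk_index(s: CountryScores) -> float:
    """Assumed aggregation: risk as the product of exposure and vulnerability."""
    return s.exposure * vulnerability(s)


example = CountryScores(exposure=0.2, susceptibility=0.5,
                        coping_capacity=0.4, adaptive_capacity=0.7)
print(round(vulnerability(example), 3))     # 0.467
print(round(world_risk_index(example), 3))  # 0.093
```

The multiplication means a highly vulnerable country with no exposure scores zero risk, which matches the remark that a flood where nobody lives does not matter.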
07:42
Moreover, on a global scale these global data sets are interesting to go through. Doing that, the idea came up: basically, I thought, there is a global data set
08:05
combining a lot of the information we have seen, infrastructure and all the socioeconomic components, and that's the OpenStreetMap database. It has global coverage, of course not the same coverage in each country, but it has a lot of social and cultural attributes, even ones we were missing in the World Risk Index,
08:27
like hospitals and schools. So I thought, looking at these themes in both data sources, they should somehow be connected, or it should be possible to interchange them and derive a robust calculation of this vulnerability based on OpenStreetMap.
08:44
But I really had no clue how to do it. So the final outline was four basic steps, each posing a challenge of its own, that we wanted, or rather had, to go through.
09:03
First of all, how to derive indicators from the OpenStreetMap database. We have just seen the keys, tags and key-value pairs and, yeah, different spellings of the same topic. How do you cope with that, how do you cope with this anarchy, and how can you combine it into valid indicators
09:26
on a global scale? Secondly, of course, working with data you have to clean it up and structure it for completeness and robustness. In a third step the idea was to use machine learning
09:44
to lift the curtain and understand the linkage from OpenStreetMap to vulnerability, once in a supervised way and also in an unsupervised way, which I'm going to explain briefly, so that finally we could have a global vulnerability map based entirely on OpenStreetMap data and indicators.
10:14
Coming to the first step, the indicators: we downloaded the full planet file, which we just heard about,
10:23
reduced the tags with a whitelist so that we could work with the computational power we had available at the institute, converted all nodes [unclear] and imported them into PostGIS.
10:41
In the first step we removed tags that occur in fewer than 20 countries, so finally we ended up with two data sets, both simply counts per country. In the first, the counts were summed up for the keys, so not key-value pairs, just the keys: how often each key occurs per country.
11:04
The second data set, which we worked with from then on, was the tag data set, so key-value pairs as indicators, also counted per country. For lines and polygons we did not use the areas, to simplify it, because that would have exceeded the computational effort we could afford.
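The counting step can be sketched in a few lines. This minimal sketch assumes each OSM object has already been assigned to a country (for example by a spatial join in PostGIS) and arrives as a `(country_code, tags)` pair; the function names, the data layout and the choice to apply the whitelist to keys are hypothetical. What comes from the talk is the two indicator sets (keys and key=value tags counted per country, each line or polygon counted once without length or area) and the rule of dropping indicators found in fewer than 20 countries.

```python
from collections import Counter, defaultdict
from typing import Iterable, Mapping

MIN_COUNTRIES = 20  # the talk drops tags that occur in fewer than 20 countries


def count_indicators(
    features: Iterable[tuple[str, Mapping[str, str]]], whitelist: set[str]
) -> tuple[dict[str, Counter], dict[str, Counter]]:
    """Count whitelisted OSM keys and key=value tags per country.

    Each feature is a (country_code, tags) pair; lines and polygons count
    once each, with no length or area weighting.
    """
    key_counts: dict[str, Counter] = defaultdict(Counter)
    tag_counts: dict[str, Counter] = defaultdict(Counter)
    for country, tags in features:
        for key, value in tags.items():
            if key not in whitelist:
                continue  # hypothetical: the whitelist is applied to keys
            key_counts[country][key] += 1
            tag_counts[country][f"{key}={value}"] += 1
    return dict(key_counts), dict(tag_counts)


def drop_rare(
    counts: Mapping[str, Counter], min_countries: int = MIN_COUNTRIES
) -> dict[str, Counter]:
    """Keep only indicators that occur in at least `min_countries` countries."""
    coverage = Counter(ind for per_country in counts.values() for ind in per_country)
    keep = {ind for ind, n in coverage.items() if n >= min_countries}
    return {
        country: Counter({ind: n for ind, n in per_country.items() if ind in keep})
        for country, per_country in counts.items()
    }


features = [
    ("DE", {"amenity": "hospital", "name": "Klinikum"}),
    ("DE", {"amenity": "school"}),
    ("FR", {"amenity": "hospital"}),
    ("FR", {"emergency": "fire_hydrant"}),
]
keys, tags = count_indicators(features, whitelist={"amenity", "emergency"})
print(keys["DE"])                        # Counter({'amenity': 2})
print(drop_rare(tags, min_countries=2))  # only 'amenity=hospital' occurs in both countries
```

With the real threshold of 20 this toy input would lose every indicator, which is why the example lowers `min_countries`.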
11:25
So it was a huge simplification in this first approach: we counted per country and normalized by population, which is not a perfect approximation, but we tried to work with it. In the next step, the data set we now had for the keys was
11:53
basically 170 countries with 110 columns, each representing one key as an indicator,
12:02
and for the tags it was 1,342 columns, also representing indicators with counts per country. From the statistical perspective, we said that to have a global meaning
12:21
an indicator should cover at least 50 percent of the countries. Then indicators with zero and near-zero variance were removed, and as a last step we did a pairwise correlation, because for the statistical analysis
12:40
we wanted to remove highly correlated indicator pairs. So we ended up with two reduced data sets, with around 30 keys and roughly 100 tags for each country, and we could get started with the analysis.
13:08
The idea was to use a linear model and a non-linear model, because we really didn't know how vulnerability behaves, whether it is a linear or a non-linear relation.
13:22
So the idea was to take lasso regression as a dimension reduction method. Compared to ridge regression, which does not reduce the dimensions, we chose lasso because we wanted to get a subset of indicators which is understandable and usable. In between, there was a minor step to better understand the data with a regression tree,
13:45
and then we continued to the random forest. And, as I briefly mentioned, on an exploratory level we just looked at the 20 countries with the highest vulnerability to understand the underlying structure of the OpenStreetMap data:
14:07
we sub-selected the 20 countries and analyzed with principal component analysis what the major components within those highly vulnerable countries are. So I started first with the lasso regression.
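The exploratory PCA on the 20 most vulnerable countries can be sketched with a plain SVD-based PCA that reports how many components are needed to reach a variance threshold. The 70% threshold is the one the talk reports for this analysis. Standardizing the columns is an assumption (the talk does not say whether indicators were scaled), the function name is hypothetical, and the random matrix is placeholder data, not OpenStreetMap indicators.

```python
import numpy as np


def pca_components_for_variance(X: np.ndarray, target: float = 0.70):
    """Number of principal components needed to explain `target` of the variance.

    Rows are countries, columns are (already filtered) indicators. Columns are
    standardized first, so no column may be constant.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the centred matrix are proportional to the
    # variances along the principal components.
    s = np.linalg.svd(Z, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    # First index where the cumulative share reaches the target, plus one.
    n_components = int(np.searchsorted(np.cumsum(explained), target)) + 1
    return n_components, explained


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))  # placeholder: 20 countries x 6 indicators
n, explained = pca_components_for_variance(X)
print(n, np.round(np.cumsum(explained), 2))
```

The constant-column restriction is consistent with the earlier preparation step, which already removes zero-variance indicators.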
14:24
It was quite a struggle to get the model running, and I was quite happy on a Saturday evening when finally, after a day, the model ran through, and the result was basically dead.
14:40
Instead of having a nice map and analyzing it in detail, my model kicked out all predictors. That was not what I had expected, and not what I wanted to present here, to say: okay, sorry guys, it was a nice idea, but basically I failed. Yeah, all predictors were deleted, and I needed a beer,
15:05
and I definitely took at least the Sunday off; I couldn't look at it anymore. Then I tried to understand these results and where they come from, and I came to the understanding, at least for the data at this point,
15:21
that the assumption of linearity just does not hold at all. So even using a nice machine learning lasso regression algorithm, with test and training data and bootstrapping and everything, you can still get something like that. It just does not work, because the assumption is not right.
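Why a lasso can delete every predictor: its L1 penalty sets a coefficient to exactly zero whenever the predictor's linear association with the target is weaker than the penalty, so a strong but non-linear dependence can be invisible to it. The toy example uses synthetic data and a from-scratch coordinate-descent lasso in NumPy. It illustrates the mechanism only and does not reproduce the study's model, which according to the talk also used train/test splits and bootstrapping. The function names and the `alpha` value are hypothetical.

```python
import numpy as np


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho towards zero by lam; values inside [-lam, lam] become exactly 0."""
    return float(np.sign(rho) * max(abs(rho) - lam, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, alpha: float, n_iter: int = 100) -> np.ndarray:
    """Lasso via cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y_c - Z b||^2 + alpha * ||b||_1, where Z is X with
    standardized columns and y_c is y centred.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # unit variance: Z[:, j] @ Z[:, j] / n == 1
    yc = y - y.mean()
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            partial_residual = yc - Z @ b + Z[:, j] * b[j]  # residual ignoring feature j
            rho = Z[:, j] @ partial_residual / n
            b[j] = soft_threshold(rho, alpha)
    return b


x = np.linspace(-1.0, 1.0, 201)
y = x**2  # U-shaped dependence: strong, but not linear in x

# Odd functions of x have zero linear correlation with the even target,
# so the L1 penalty removes every predictor.
print(lasso_cd(np.column_stack([x, x**3]), y, alpha=0.05))

# Once a predictor that captures the curvature is available, it survives.
print(lasso_cd(np.column_stack([x, x**3, x**2]), y, alpha=0.05))
```

A tree ensemble such as the random forest the talk turns to next does not need the curvature to be supplied as an explicit feature.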
15:46
So I tried my luck with random forest, and luckily for me I got some results, which were also interesting. I have put them side by side on this slide:
16:02
that's the World Risk Index with its components and subcomponents, and those are the most important OpenStreetMap keys in the model for modeling vulnerability. I think it's quite interesting to see that some topics are related quite directly
16:24
at first view. If you look, yeah, land use: we have the environment with a lot of global environmental indicators in it, infrastructure and all the investments, so here we have a lot of traffic signals and traffic calming.
16:42
Also interesting was that we got the fire hydrant diameter in it, something you wouldn't get out of official global data sets but clearly related to vulnerability. And from the others you could assume some understanding
17:02
of, for example, economic capacity, which is measured with GDP. You could assume, but I don't want to do that now because it would need more statistical justification, that traffic calming is somehow related to GDP. So the question here is really to go into more detail now
17:23
and explain which facets are really explained by each type. I did the same with the keys, and then I mapped both models on a global scale, and actually they were quite similar:
17:40
roughly 70% of the countries were correctly classified, and most of the misclassified ones were only one class off. The United States basically voted themselves out during the last election, somehow. No, seriously, I don't know what happened; we lost it somewhere along the way during our modeling.
18:03
It should be there, it was there, it got lost; it happens sometimes. So now we see the same map based on the tags, and again, the prediction quality of the random forest model
18:20
for the category of the country is roughly 70%, and including one class up or down it's even higher. Looking at the countries with the highest error, Malaysia, Yemen, Seychelles and Cuba:
18:40
the World Risk Index says category 3 and the random forest model says 1, so it's an underestimation of the vulnerability; for Yemen it's the same, and for the Seychelles it's the other way around. I looked it up and, basically comparing
19:03
the OpenStreetMap model with the World Risk Index, I concluded that the model actually reproduces the susceptibility component more closely than the adaptive or the coping capacity. I thought that's quite interesting to see: the immediate effect of vulnerability is
19:22
better represented in the OpenStreetMap data, and that made some sense to me. Whether this really explains the difference is still open for analysis. Going really briefly to the analysis of the
19:42
20 countries with the highest vulnerability with PCA: the analysis was set to account for 70% of the variance within the data set, which gave seven components. And again, just
20:03
looking briefly at the topics, there is a similarity to the components of the World Risk Index, which is quite interesting; I basically mentioned that already at the beginning. As I'm running a little bit out of time, I briefly want to say:
20:23
okay, why do we need it? We want to monitor and measure it, to detect trends and to avoid or decrease vulnerability. What is our conclusion in three words? The first results showed that, even with the simplification of counts per country,
20:43
there is a way of modeling and monitoring vulnerability based on OpenStreetMap. On the other hand, where to go now? We have first results, and of course the second step is to increase the transparency,
21:01
robustness and validity of the model. For that I think we would need to look not only at the vulnerability but also at the subcomponents, to better explain which linkages exist between certain aspects, like adaptive capacity or infrastructure, and OpenStreetMap.
21:22
So from here, with this approach, the scalability was reduced and the robustness of the model increased. So thank you very much, and I'm opening the floor for questions now. I didn't prepare any questions because I didn't want to guide you;
21:41
you have heard from me for a while now, and I want to be open to anything that comes up. Thank you. Thank you, Daniel, for the very interesting presentation. Any questions from the audience?
22:06
I have one. At the beginning you showed that in your analysis you left out some tags, some keys and values actually, that did not appear in all the countries. So my question is: did you check whether you may have left out
22:25
something important for the computation of the index? For example, I didn't see, maybe I missed it, the highway tag, which might of course be useful for computing the index. Yeah, thanks for the comment.
22:43
I think that's quite an important remark. To comment on it: the idea was to do it entirely computationally, without interfering and without bringing in my own assumptions, so that I might get surprising results.
23:04
I also found ways of selecting tags and keys first, but I felt that if I did that I would bring my own assumptions into it, and I wanted to keep the model free of my own ideas of what is related to vulnerability.
23:22
So that was the reason not to do it in the first step. On the other hand, as you said, the tag reduction by country and also the correlation analysis would need to be checked now in a second step, to say: okay, maybe we should include in more detail some tags
23:43
which were left out so far. So combine it, yeah. Thanks, that's quite correct. Thank you. The most vulnerable countries are generally the countries with less data,
24:03
so how did you consider that in your computation? Another excellent question. We considered it, but didn't really account for it, to be honest. We normalized by the population,
24:23
but of course we know that OpenStreetMap coverage does not match the number of people in a country. So we thought about using some other way of normalization, like the coverage of the country or so,
24:42
but we couldn't; we tried several things but couldn't come up with a good solution, actually. That's the reason we still have to figure out how to adjust for that better. Yes, one short question.
25:04
So now you have kind of proven that this can be done, since it was kind of correlated with the other data set, so it kind of works. But if you wanted to go forward and derive new data, do you think it would be possible to run it on regions
25:22
inside the countries, like states, to get new data? Two good points, and I think I would answer both. For the part about future new data, it would be really interesting to now look into some empirical validation approaches,
25:44
like global extreme event data sets, and combine them and look at how you could explain the vulnerability, or not, with real data, so real events. That's the question of how to include or train the index
26:01
to adapt to new events in the future; I think that's an interesting approach. Secondly, we also thought about how it could be made more scalable and work on a sub-regional, sub-national level, and if you're interested in that, just stay for the next talk, please.
26:22
Thanks again, Daniel, for this interesting talk and this very promising work. We now have some minutes for the audience to move between rooms, and in the meantime I ask the next speaker to prepare. Thank you.