
An open risk index with learning indicators from OSM-tags, developed by machine learning and trained with the WorldRiskIndex


Formal Metadata

Title
An open risk index with learning indicators from OSM-tags, developed by machine learning and trained with the WorldRiskIndex
Title of Series
Number of Parts
295
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Developing learning, crowd-based spatio-temporal indicators to model the components of the WorldRiskIndex based on OSM tags and machine learning. Climate change is already a reality in many parts of the world and threatens our future well-being even more. SDG 1.5 explicitly aims to reduce vulnerability and exposure to climate-related hazards by 2030. The World Risk Index (WRI) is one well-respected approach to profiling countries' risk from natural hazards. To effectively monitor development and detect decision knots on the climate resilience pathway (IPCC 2014), data about the world's countries at high resolution in space and time is urgently needed. Hence, the core of this work is the development of learning indicators: learning in the sense of a methodological approach that combines PostGIS for data management, R for statistical learning and QGIS for spatial analysis of crowd-based information, assessing the OSM database and addressing the need for societal learning in the face of severe climate change. The World Risk Index (Birkmann et al. 2015) will guide the supervised learning part, resulting in an indicator set derived from OSM tags, which on the one hand establishes an open risk index and on the other adds deep explanatory power to its components through a qualitative discussion of the OSM themes. The second part uses unsupervised algorithms to explore the inherent characteristics of country groups classified by the open risk index and deduces common patterns of socio-economic vulnerability, but also of societal resilience. Hence, the inherent challenge of this work is to substitute existing static indicators with new dynamic indicators, and not only to substitute them but also to paint a more detailed picture. Moreover, new data sources are still often questioned for their reliability compared to World Bank or census data, and their opportunities are therefore neglected instead of their potential being critically explored.
Therefore, this thorough statistical approach to quantifying uncertainty contributes to the acceptance, and hence the use, of crowd-based information, adding the reliability necessary for policy and planning. This unique combination has not yet been done and bears huge potential, moreover united with the open source geo community, to contribute a little piece of the puzzle for achieving SDG 1.5.
Transcript: English (auto-generated)
Okay, good afternoon again. Time for the second talk, and yes, feel free to start. The second talk is again about OpenStreetMap: an open risk index with learning indicators
from OSM tags, developed by machine learning and trained with the WorldRiskIndex. Thank you very much, Marco, and also thank you very much for the opportunity to give this short speech here today. My name is Daniel, I'm working at the University of Stuttgart at the Institute of Spatial
and Regional Planning as a research assistant and also lecturer. At the institute we hadn't used any FOSS tools before, so my colleague convinced me last year to come for the first time to FOSS4G,
and so now we're trying to include it in our teaching and get it more established at the university and our institute. So basically that's the reason we are here: to get new inputs and also to present something. And the idea behind this little side project was
to get a justification to come here, so that our professor would pay for our travel expenses and we could get some new inputs. So we had to come up with something and, yeah, that's what you're going to see today. I think, now looking at the title, it seems not so very comprehensible,
to be honest, even in my own words. I hope that I can clear it up in the next 20 minutes, at least a little bit. So what is the idea in a nutshell? Basically, vulnerability is a spatial concept with spatial attributes, and looking
at it from different angles, the idea was that somehow it should be included in OpenStreetMap. I had no clue how, and I definitely had no clue how to get there, but somehow this idea came up last year during several inputs and so on,
and so that was basically the starting question or the starting idea where we got started. And we thought it would be fun to do that in some free time, and didn't realize how much work it could be, and it actually was. Okay, so just a brief outline of my talk. As we are today
not at a disaster risk conference, I first start with some background on the World Risk Index and the development of the idea, to understand what socioeconomic vulnerability is, because I assume you're all not very familiar
with this concept, and then in the second step how to derive a vulnerability index on a global scale with OpenStreetMap. So what is disaster risk? You can basically say it consists of three components.
We have the hazard component, the exposure and the vulnerability. If we now look at global climate models, it's clear that the hazard side increases in magnitude and frequency, and the exposure is of course going to increase as well. Looking at the vulnerability, it's a little bit more interesting, and just looking
on a global scale is not enough, because globally we could say it stays more or less the same, with a slight reduction. But the interesting question is: how is it in space and time?
So somehow we need to measure it; we have to look at it on different scales and different levels, in time and space. This is just some analysis done by our institute of the World Risk Index, looking from 2012 to 2017 at the vulnerability and its development, and it's clearly showing that the regions worldwide are developing differently. And in light
of climate change adaptation, the question is whether we can detect decision points, or how we can counteract those developments. As we are clearly having rising exposure,
rising hazard, we don't want to stick at the same vulnerability level, because that just means we're going to increase losses. So the question here is how to detect those patterns and maybe explain them, to find measures to counteract on different scales, administrative scales and different levels. So there is a huge need to have scalable,
understandable indicators or composite measurements. One approach that is quite accepted, or let's say a little bit different
but similar to the INFORM risk index from the JRC, is the World Risk Index, developed by Professor Birkmann. It tries to focus on explaining natural hazards, in this case floods, sea level rise, storms, earthquakes; that's combined into an index
of how many people in each country are annually affected, plus the vulnerability component. So the question is what socioeconomic vulnerability is. In this concept it's basically described by three main dimensions: the susceptibility, so in the short term,
if you have an impact, what is the likelihood of suffering; then the coping capacity, so the ability, in a short-term perspective, to avoid or reduce negative consequences; but it also looks at the adaptive capacity, so whether the society is able in the long term
to include new measures, to adapt to different stressors on different levels, like research, building codes, implementation. So it's looking not only at the phenomenon, because basically, if a flood happens where nobody is, it doesn't really matter,
so we don't mind. The question here is really the human perspective of explaining it. Now, looking more into detail, like zooming into this vulnerability, the World Risk Index
consists of 23 indicators for, like I said, susceptibility, coping and adaptive capacity, and again some subtopics. I was involved in updating this risk index from different governmental sources, and it was a lot of fun, realizing that, yeah, for the indicators sometimes the calculation was changed,
countries were dropped, different years, missing years, so combining different sources, different country names, yeah, you name it. You know data, you know how different data sources look,
and moreover, on a global scale, these global data sets are interesting to go through. And so, doing that, the idea came up: okay, basically, I thought, there is a global data set
combining a lot of the information we have seen, so infrastructure and all the socio-economic components, and that's the OpenStreetMap database. It has global coverage, of course not the same coverage in each country, but it has a lot of social attributes, cultural attributes, even what we were missing in the World Risk Index,
hospitals, schools. So I thought, just looking at these themes in both data sources, somehow they should be connected, or somehow it should be possible to interchange them and derive a robust calculation based on OpenStreetMap to also develop this vulnerability,
but I really had no clue how to do it. So four basic steps, each step posing some challenge of its own, were the final outline we wanted to go through, or had to go through.
So first of all, how to derive indicators from the OpenStreetMap database. We have just seen the keys and tags and key-value pairs and, yeah, different spellings of the same topic: how do you cope with that, how do you cope with this anarchy, how can you combine it into valid indicators,
and on a global scale? Secondly, of course, working with data, you somehow have to clean them up and structure them for completeness and robustness. And in a third step the idea was to use machine learning
to understand, to lift the curtain, to get this linkage from OpenStreetMap to vulnerability, once in a supervised way but also in an unsupervised way, which I'm briefly going to explain, so that we finally could have a global map of vulnerability entirely based on OpenStreetMap data and indicators.
Coming to the first step, the indicators: we downloaded the full planet file, we just heard about it,
reduced the tags with a whitelist to make it possible to work with the computational power we had available at the institute, converted all notes to clips, imported them into PostGIS,
and in the first step we removed tags that occur in fewer than 20 countries. So finally we ended up with two data sets, and they were just the counts per country: once summed up for the keys, so not key-value, just the keys, how often such a key is counted per country,
and the second data set we worked with from now on was the tag data set, so key-value pairs as indicators, also as counts per country. So in the case of lines and polygons we counted them and did not use the areas, to simplify it, because it just would have exceeded the computational effort we could afford.
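The counting step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the talk used PostGIS and R, the `records` list is made-up toy data, and the function names `count_per_country` and `drop_rare` are hypothetical. The talk's threshold was 20 countries; the toy example uses 2 so that the effect is visible.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: one (country, key, value) record per tagged OSM object.
records = [
    ("DE", "amenity", "hospital"),
    ("DE", "amenity", "school"),
    ("DE", "highway", "traffic_signals"),
    ("YE", "amenity", "school"),
    ("YE", "highway", "traffic_signals"),
]

def count_per_country(records):
    """Build the two data sets: counts per key and counts per key=value tag."""
    key_counts = defaultdict(Counter)  # country -> key -> count
    tag_counts = defaultdict(Counter)  # country -> "key=value" -> count
    for country, key, value in records:
        key_counts[country][key] += 1
        tag_counts[country][f"{key}={value}"] += 1
    return key_counts, tag_counts

def drop_rare(counts, min_countries):
    """Keep only indicators that occur in at least min_countries countries."""
    presence = Counter()
    for per_country in counts.values():
        presence.update(per_country.keys())  # each country counts once per indicator
    keep = {name for name, n in presence.items() if n >= min_countries}
    return {
        country: {name: n for name, n in per_country.items() if name in keep}
        for country, per_country in counts.items()
    }

key_counts, tag_counts = count_per_country(records)
common_tags = drop_rare(tag_counts, min_countries=2)
```

Counting objects instead of measuring lengths or areas keeps every indicator a plain integer per country, which is the simplification the talk mentions.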
So it was a huge simplification in a first approach to say, okay, we count per country and normalize with the population, which is more or less not a perfect approximation, so we tried to adapt to it. And the next step: this data set we now had was, for the keys,
basically 170 countries with 110 columns, each representing one key as an indicator,
and for the tags it was 1342 columns, also representing indicators with counts per country. So from the statistical perspective we said, okay, to have a global meaning
we want each indicator to cover at least 50 percent of the countries. Then indicators with zero and near-zero variance were removed, and as a last step we did a pairwise correlation analysis to say, okay, for the statistical analysis
we want to remove too highly correlated indicator pairs. So we ended up with two reduced data sets, having around 30 keys and roughly 100 tags for each country, so we could get started with the analysis.
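As a rough sketch of this reduction (population normalization, 50 percent coverage, variance filter, pairwise correlation filter), assuming made-up numbers: only the 50 percent coverage comes from the talk. The variance threshold `min_var`, the correlation cut-off `max_corr`, the per-100,000 scaling and the function name `filter_indicators` are assumptions for illustration, and the simple variance threshold stands in for whatever near-zero-variance test was actually used in R.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def filter_indicators(counts, population, min_coverage=0.5, min_var=1e-9, max_corr=0.9):
    """counts: indicator -> list of counts, aligned with the population list."""
    kept = {}
    for name, column in counts.items():
        # 1. coverage: share of countries where the indicator occurs at all
        coverage = sum(1 for c in column if c > 0) / len(column)
        if coverage < min_coverage:
            continue
        # 2. normalize to objects per 100,000 inhabitants
        rate = [c * 100_000 / p for c, p in zip(column, population)]
        # 3. drop (near-)constant indicators
        if pvariance(rate) <= min_var:
            continue
        # 4. drop an indicator that is too correlated with one already kept
        if any(abs(pearson(rate, other)) > max_corr for other in kept.values()):
            continue
        kept[name] = rate
    return kept

# Toy data for four hypothetical countries.
population = [100_000, 200_000, 400_000, 100_000]
counts = {
    "amenity=school": [10, 30, 20, 5],
    "amenity=kindergarten": [20, 60, 40, 10],   # exactly twice the school counts
    "man_made=lighthouse": [0, 0, 0, 3],        # present in 25% of countries
    "place=country": [1, 2, 4, 1],              # constant once normalized
    "highway=traffic_signals": [5, 30, 40, 2],
}
kept = filter_indicators(counts, population)
```

The correlation filter here is greedy, so which member of a correlated pair survives depends on column order; a real analysis would want a more deliberate rule.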
The idea was to use a linear model and a non-linear model, because we really didn't know what vulnerability is like, whether it's some linear relation or non-linear.
So the idea was to take lasso regression as a dimension reduction method, compared to ridge regression, which does not reduce the dimension. We chose lasso because we wanted to get a subset which is understandable and usable. In between, it was a minor step to better understand the data with a regression tree
and continue to the random forest. And, like I briefly mentioned, on an explanatory level, we just look at the 20 countries with the highest vulnerability to understand the underlying structure of the OpenStreetMap data,
so sub-select the 20 countries and analyze with principal component analysis what the major components are within those highly vulnerable countries. So I started first with the lasso regression.
It was quite some struggle to get the model running, and I was quite happy on a Saturday evening that finally, after a day, the model ran through, and the result was basically dead.
So instead of having a nice map and going into detail with the analysis, my model kicked out all predictors, and that was not what I had expected. And I really didn't want to present here and say, okay, sorry guys, it was a nice idea but basically I failed. And yeah, all predictors were deleted and I needed a beer,
and I definitely took at least the Sunday off, and I couldn't look at it anymore. So somehow I tried to understand these results and where they come from, and it came to my understanding, at least for the data at this point,
that the assumption of linearity does not reflect the data at all. So even using a nice machine learning lasso regression algorithm with test and training data and bootstrapping and everything in it, yeah, you still can get something like that. It just does not work because the assumption is not right.
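How a lasso can end up deleting every predictor can be shown on a toy example. The sketch below is a minimal coordinate-descent lasso in plain Python, not the R model from the talk; the data, the penalty `lam=0.1` and the function names are assumptions chosen so that the arithmetic is easy to follow. When the target depends linearly on the predictors, the coefficients survive (slightly shrunk); when it depends on them only through a curve, the linear signal is zero and the penalty sets every coefficient to exactly zero.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # covariance of column j with the residual of the other columns
            z = 0.0    # squared norm of column j
            for i in range(n):
                others = sum(Xc[i][k] * beta[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - others)
                z += Xc[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta

# Two toy predictors that are uncorrelated with each other.
x1 = [-2, -1, 0, 1, 2]
x2 = [1, -2, 0, 2, -1]
X = [[a, b] for a, b in zip(x1, x2)]

y_linear = [3 * a - b for a, b in zip(x1, x2)]  # linear in both predictors
y_curved = [a * a for a in x1]                  # depends on x1 only through a curve

beta_linear = lasso(X, y_linear, lam=0.1)
beta_curved = lasso(X, y_curved, lam=0.1)
```

In the curved case a tree-based model could still pick up the relationship, which is consistent with the move to a random forest described next.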
So I tried my luck with random forest and I got some results, luckily for me, and they were also interesting. I put them here on this slide side by side:
that's the World Risk Index with its components and subcomponents, and those are the most important keys of OpenStreetMap in the model for modeling vulnerability. I think it's quite interesting to see that some topics are related quite directly
at first view. If you look, yeah, land use: we have the environment with a lot of environmental global indicators in it; infrastructure and all the investments, so we have here a lot of traffic signals, traffic calming.
And interesting was also that we got something like fire hydrant diameter in it, as something you wouldn't get out of official global data sets but which is clearly related to vulnerability. And for the others you could assume some understanding:
economic capacity, for example, is measured with GDP, and you could assume, but I don't want to do it now because it would need more statistical justification, that traffic calming could be related to GDP somehow. So the question here is really to now go more into detail
and explain which facets are really explained by each type. The same I did with the keys, and then I mapped both models on a global scale, and actually they were quite similar. Roughly
around 70% of the countries were correctly classified, and the majority of the rest was wrong by just one class. Basically, the United States voted themselves out during the last election somehow. No, seriously, I don't know what happened; we lost it during our modeling somewhere on the way.
It should be there, it was there, it got lost; it happens sometimes. So we now see the same map based on the tags, and again, the prediction quality of the random forest model
for the category of the country is roughly 70%, and including one class up or down it's even higher. So, looking at the countries with the highest error, looking at Malaysia, Yemen, Seychelles and Cuba:
the World Risk Index says category 3, the random forest model says 1, so it's an underestimation of the vulnerability; for Yemen the same, and for Seychelles it is turned around. I looked it up and I concluded, basically comparing
the OpenStreetMap model with the World Risk Index, that the model actually models the susceptibility component more closely than the adaptive or the coping capacity. And I thought that's quite interesting to see, that the immediate effect of vulnerability is
better represented in the OpenStreetMap data, and that somehow made sense to me, so that I could somehow explain this difference; it is still open to analyze. Just going really briefly to the analysis of the
20 countries with the highest vulnerability with PCA: the analysis was set to account for 70% of the variance within the data set; we had seven components, and again, just briefly
looking at the topics, there is a similarity to the components of the World Risk Index, which is quite interesting. I basically mentioned that already at the beginning, and as I'm running a little bit out of time, I want to briefly say:
okay, why do we need it? We want to monitor it, we want to measure, to detect trends and avoid or decrease vulnerability. What is our conclusion in three words? The first results showed that, with the simplification of counts per country,
there is some way of modeling vulnerability and monitoring vulnerability based on OpenStreetMap. But on the other hand, where to go now? We have first results, and of course the second step is to increase the transparency,
robustness and validity of the model. And I think for that we would need to not only look at the vulnerability but also look at the subcomponents, to better explain which linkages exist between certain aspects, like adaptive capacity or infrastructure, and OpenStreetMap.
So from here, the scalability was, with this approach, reduced, and increased the robustness of the model now. So thank you very much, and I'm opening the questions now. I didn't prepare any questions because I didn't want to guide you;
you have heard me now for a while, and I want to be open for anything that comes up. Thank you. Thank you, Daniel, for the very interesting presentation. Any questions from the audience?
I have one. At the beginning you showed that in your analysis you left out some tags, some keys and values, that were not appearing in all the countries. So my question is: did you ask whether you may have left out
something important for the computation of the index? Because, for example, I didn't see, maybe I missed it, but I didn't see the highway tag, which might of course be useful for the computation of the index. Yeah, thanks for the comment.
I think that's quite an important remark. The idea was to do it entirely computationally, so without interfering and without bringing in my own assumptions, so that I get surprising results.
I also found ways of selecting tags and keys first, but somehow I felt that if I did that, I would bring my assumptions into it, and I wanted to keep the model free of my own suggestions about what is related to vulnerability,
and so that was, in the first step, the reason not to do it. On the other hand, exactly like you said, the tag reduction by country and also the correlation analysis would, I think, need to be checked now in the second step, to say, okay, maybe we should include some tags in further detail
which were now left out, so combine it. Yeah, thanks, that's quite correct. Thank you. The most vulnerable countries are generally the countries with less data,
so how did you consider that in your computing? Another excellent question. We considered it, but did not really consider it, to be honest. Here we normalized by the population,
but of course we know that OpenStreetMap coverage does not match the number of people in a country. So we thought about using some other way of normalization, like the coverage of the country or so,
but we tried several things and couldn't come up with a good solution, actually. So that's the reason we still have to figure out how to adjust for that better. Yes, one short question.
So now you have kind of proven that this can be done, but it was correlated with the other data set, just to prove that it kind of works. But if you wanted to go forward to get new data, do you think it would be possible to run that on regions
inside the countries, like states, to get new data? Two good points, and I would like to answer both. For the part about future new data, I think it would be really interesting to now look into some empirical validation approaches,
like global extreme event data sets, and combine them and look at how you could explain the vulnerability, or not, with real data, so real events. So that's the case of how to include new events or train the index
to adapt to new events in the future; I think that's an interesting approach. And secondly, we also thought about how it could be made more scalable, to work on a sub-regional, sub-national level, and if you're interested in that, then please just stay for the next talk.
Thanks again, Daniel, for this interesting talk and this very promising work. So we now have some minutes for the audience to move between the rooms, and in the meantime I ask the next speaker to prepare. Thank you.