
Visualizing uncertainty in data

Video in TIB AV-Portal: Visualizing uncertainty in data

Formal Metadata

Visualizing uncertainty in data
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
A talk about data quality, how it is understood, and whether visualization can improve the understanding of data quality. A lot of focus has been put on data quality and methods of accuracy assessment. Most of these methods are, however, statistical. The focus here is on how users and producers view uncertainty, with a look at the current reality, especially relating to the statistics that are presented. A research-based section deals with uncertainty perceptions specifically in South Africa, but also related to international literature. A tool (a QGIS plugin) for uncertainty visualization in continuous raster datasets is also shown. Finally, there is a brief demonstration of how visualization can aid in showing the results of uncertainty in data that is put through a model, thus giving a visual example of the power of visualization.
The third talk is not exactly on education and is a bit different, but I think it's a topic that everyone knows about, that few people know how to deal with, and that, I think, no one mentions when they present results to the people who pay them. So this scientist is going to speak about uncertainty in data.
Good afternoon, everyone. My name is [name unclear], and I'm currently doing my Masters. Firstly, I'd like to thank my supervisor, Mrs. [name unclear], as well as my university, and lastly the South African National Research Foundation.
So, cogito ergo sum, "I think, therefore I am", is a statement most of us know. But do we know the full statement? Dubito, ergo cogito, ergo sum: "I doubt, therefore I think, therefore I am." Basically, what this is saying is that within doubt we actually find what we know. We can only know how much we know, and how well we know it, when we start doubting what we know. So how is this applied to
spatial data? Well, basically we all use spatial data in some form or other, but how much do we know about the data? Do we doubt our data enough to know what we can do with our data? And even if we don't doubt it, we should all know that inaccuracy is always a part of spatial data; we just have to know about it and learn how to deal with it.
So what is uncertainty? I'm going to use uncertainty and data quality slightly interchangeably. [Name unclear]'s definition is that when inaccuracy is known it is error, and when it is not known it becomes uncertainty. However, Longley et al., also in 2005, defined it as the difference between a dataset and the phenomenon that it represents. Finally, [name unclear] stated that uncertainty is really a fuzzy concept: we are still uncertain about what we mean when we say uncertainty. Carrying on: how is data quality normally measured? Normally we have our statistics, such as kappa, the confusion matrix, RMSE or mean error. Basically these are global statistics for a dataset. When you produce something, you produce a statistical quality assessment, you put it into an accuracy report, and no one ever reads that report. A little bit of work has been done about this: how much of these statistics do users understand, especially those users referred to in the previous chapter? [Name unclear] found in his study that experience plays a big role in how uncertainty is perceived. More experienced users might be more cognizant of, and, shall I say, more nervous about, using that data. Also, providers often don't really provide that good an analysis of the data, because sometimes they just don't want you to know how bad a dataset is. Finally, [name unclear], in a similar study, also found that 20.5% of his study respondents paid no attention to statistics at all, and some companies just decided that if a supervisor looks over the data and it looks good, it must be good. My study was done with South African users, but it was a reasonably small study group of only 63 participants. What I found was that nearly everyone is cognizant of uncertainty in data, and that datasets aren't perfect. However, only 60% of these users, who are sometimes producers of datasets as well, look at the quality reports. Then only 36% of those that produce value-added data (so they take a piece of data, add extra value to it and then pass it on) look at the quality of the data. So if they don't know how good the input data is, how can they really say much about the output data? So when I put the
broad statement out as a question, "How do you feel about 80% certainty in a dataset?", just a general question, about 53% said it depends on the data and on the purpose of the data; we'll get to that again. About 29% said they feel comfortable using this data, and another 18% said they completely reject the data, which also suggests they may not really understand data well enough. So, to put
it further: when asked how uncertainty is managed in their normal workflow, about 59% said they try to improve the dataset and communicate uncertainty. However, only 47% of these people actually looked at the accuracy reports. So how can you comment about accuracy if you don't know how good the dataset you're using is, or how can you improve it if you don't know what you're improving? Another 18% said they look at whether it's fit for purpose; however, 36% of this 18% also don't look at an accuracy assessment. So how do you know if it's fit for your purpose if you don't know how good the dataset is? So do South African users
and producers understand statistics? I don't really know, but I don't think so. Many do not ask for an accuracy assessment, so they assume the data is good enough: "It must be good enough, I got it from someone." Some even assume 90%+ accuracy on a dataset just because they got it from someone else. An example is someone who said that the national cadastral data is 100% accurate. But it's not really, because no matter how good your dataset is, you're going to miss something, or some projection is going to move something. No dataset is 100% accurate, because we just cannot model the world in a perfect way.
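As an illustration of why a single global figure can mislead, here is a minimal sketch of two of the global statistics named earlier in the talk: overall accuracy and kappa, both computed from a confusion matrix. The matrix values and function names are invented for this example and do not come from the study.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohens_kappa(matrix):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class matrix (rows: mapped class, columns: reference class).
# One class dominates, so the overall accuracy looks high.
imbalanced = [[88, 2],
              [8, 2]]

print(f"overall accuracy: {overall_accuracy(imbalanced):.2f}")
print(f"kappa:            {cohens_kappa(imbalanced):.2f}")
```

With these made-up counts the overall accuracy is 90%, yet kappa is far lower because most of the agreement is what chance alone would produce. That is the kind of detail that goes unnoticed when the accuracy report is never opened.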
How should you use quality reports? Well, quality reports are meant to inform you how good the dataset really is, so you can use them to check if it's fit for purpose. The ideal would be that each dataset is evaluated and subsequently checked for fitness for purpose. Reports should be used in such a way that all involved are cognizant of what datasets can and cannot be used for, and where improvements are required. Only if you know how good or how bad your dataset is can you really improve it. So then my question was: can visualization aid in communicating data quality? Well, visualization has been found to trump text in appeal and communication power by previous researchers [names unclear], and more than 70% of those respondents who said they don't look at the quality report said a visualization would actually help, and that they might actually look at that. Therefore visualization can aid in bringing the understanding of data quality to a whole new audience. So, some
existing visualization tools: one is by [name unclear] and MacEachren, which was basically made for testing soil nitrogen content. As you can see in the first image, the blue represents areas of high certainty, and the red represents areas that were measured as nitrogen-containing but with less certainty. The one over here is just the whole area measured, and this one over here is just the area of high certainty. So that's one method, and most users said that it is one of the best methods and is easily understood. Other
solutions, such as Aguila. I must add that the previous solution is not available anymore; it's very outdated. This one, however, is available: Aguila is part of PCRaster. A couple of problems I've come across with it: if you're using a Windows machine it's really hard to install, with a lot of dependencies, and the learning curve is quite steep. However, it provides very good statistics and is a very powerful tool. And the other
one is UncertWeb, which was supposed to be a web program and was funded by the European Commission from February 2010 to January 2013. A lot of work was done on this; however, the project seems to have fallen flat when the funding ended. This again shows that visualization and data quality are really important aspects, if the European Commission would fund that. Two other tools to mention: one is by [name unclear], however that isn't openly available, as well as one by [name unclear], but again that's not openly available to everyone. So the problems with these
solutions are that they are outdated, most of them aren't openly available, and they are complex to install and not really user-friendly. So I developed a QGIS plugin to visualize uncertainty. It's only for continuous raster data at this point. The purpose of it is to bring a visual aspect to data quality, to convey data quality easily. Users said they would prefer a QGIS plugin, or one for ArcGIS, according to [name unclear]'s study, as this would easily integrate with the current workflow. So why QGIS? Well, it's easy to use, no expert knowledge is needed to install it, there are no dependencies, and it's freely available to everyone. So this
is a bit of a framework for how my tool works. You have the raster that you want to evaluate, and you have the shapefile with points. The shapefile with the points can either contain reference values or just be flat points where you want to test. If you don't have reference values in your shapefile, you'd have another reference raster. The statistics get calculated (I'll chat again about the statistics), the shapefile is created and loaded into QGIS, and then you can choose either a color-vision-impaired style or a regular-vision one. This is
basically what the user interface looks like. Up top you have your dataset to be tested. On the side you can see that if you tick that the data is discrete and not continuous, you get this little warning at the bottom saying that this tool isn't meant for that. What it will do is give you a binary response saying yes, that's correct at this point, or no, it's not, so you can't use the statistics. Carrying on, you have a shapefile of points. You can select "use the shapefile for reference points", and you get a little drop-down to select the field with your reference data. If you don't tick that, you'll get your reference raster. Then you browse to where you want to save the file, and then you get these two boxes, which are basically the same thing. The first option is just an overall view, which is all the statistics put together and visualized with Jenks breaks. The second is a z-score, which is a standard deviation from the mean, as well as the modified z-score, which just takes the median and the median absolute deviation to account for outliers in the data. Then the ideal z-score is sort of the dataset's standard deviation taken, but modeled against an ideal which has a deviation of 0. So basically the first two, the z-score and the modified z-score, are sort of testing a dataset against its own quality. So this is
basically what you'll get out when you press OK. I'll just take
you through a couple of options. You will get the statistics in your attribute table, so you can open it and investigate the actual difference and all the statistics that get calculated while doing the visualization. These all get put into that attribute table so you can investigate them. And this
is just a test dataset that I'm going to show you, run through the tool. The mean absolute error is 2.46 meters (this is a 5-meter-resolution digital elevation model, by the way), and the standard deviation is 2.29 meters. The 90th percentile is 5.25 meters, which means that 90% of the data will fall below that; that is, the difference will be below that. That is a statistic not often mentioned, which is interesting, because you will get a statistic such as a mean absolute error of 2.46 and you'll think that is good, but as you see, 5.25 is quite a jump from that.
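The statistics mentioned for the test dataset, and the three z-score options described for the tool, can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's actual code: the function names and sample differences are invented, the population standard deviation is used, the 0.6745 scaling in the modified z-score is the common convention rather than something stated in the talk, and the "ideal" z-score is interpreted here as the difference divided by the standard deviation with the mean fixed at 0.

```python
import statistics


def percentile(values, q):
    """Percentile (q in 0..100) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def summarize(differences):
    """Mean absolute error, standard deviation and 90th percentile of |difference|."""
    absolute = [abs(d) for d in differences]
    return {
        "mae": statistics.fmean(absolute),
        "sd": statistics.pstdev(differences),
        "p90": percentile(absolute, 90),
    }


def z_scores(differences):
    """Deviation of each difference from the dataset's own mean, in standard deviations."""
    mean = statistics.fmean(differences)
    sd = statistics.pstdev(differences)
    return [(d - mean) / sd for d in differences]


def modified_z_scores(differences):
    """Outlier-resistant variant based on the median and median absolute deviation."""
    median = statistics.median(differences)
    mad = statistics.median([abs(d - median) for d in differences])
    return [0.6745 * (d - median) / mad for d in differences]  # assumes mad > 0


def ideal_z_scores(differences):
    """Assumed reading of the 'ideal' option: score against a mean difference of 0."""
    sd = statistics.pstdev(differences)
    return [d / sd for d in differences]


# Hypothetical differences (tested raster minus reference value) in meters.
sample = [1.0, -1.0, 2.0, -2.0, 10.0]
print(summarize(sample))
print([round(z, 2) for z in modified_z_scores(sample)])
```

In this made-up sample the 90th percentile sits well above the mean absolute error, which mirrors the jump from 2.46 m to 5.25 m pointed out in the talk: one outlying point dominates the tail even though the average looks acceptable.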
Finally, the overall visualization, which is what I put that through. The overall visualization showed the areas up top in the right corner; those are outliers. You also have some over here and over there, but that is an area where most of it will be above the 2.46, and some even above that. When
using the other options available, the difference option also showed the same area. Interestingly, the z-score, which is measured against the dataset itself only, without accounting for outliers, showed the fewest areas with outliers; however, it also highlighted the same little cluster. All of these actually highlighted the same cluster as being problematic. So basically what this is doing is giving you the statistics (you should have your statistics), but it is also trying to deal with the problem with statistics, which is that spatial data is spatial in nature, and you can't really have one statistic for the whole dataset that represents the quality of the data at every point. So what does
the user get from it? Well, it's simple to install, since it's a QGIS plugin. No expert knowledge is needed to install or run it. It's freely available to all. And it is a visual tool to understand data quality, with the statistics and the visualization as well. I've also asked some users about this tool and how good it is, and basically the response was overwhelmingly yes, it does provide a better understanding of the statistics. On the question of whether the visualization would degrade the perceived quality of the data, the answer was mostly: to GIS professionals, no, it won't; to the layman, yes, it might. This goes together with what [name unclear] found in his study, that producers stated it might lower the perceived value of the product. Finally, can visualization help in the management of uncertainty and the communication thereof? The answer to that was also largely yes. Some shortcomings of the tool: more statistical options are required; documentation on the methods and the statistics is needed; there are some scale issues; it can't do an internal validation of a dataset, so it needs another, secondary dataset with reference points; and more contrast is needed between some of the
categories. This is one of the shortcomings that I fixed: it's basically just an About tab, which gives users a whole lot of information about how the tool works and about the statistics. And then,
finally, to conclude: uncertainty is present in all datasets, whether we like it or not. Statistics are used to communicate this uncertainty; however, they are not always very well understood. So basically we often don't doubt our data enough to know how good our data is. And finally, this tool is not complete, but it's a step in that direction. I'd say that for future tools perhaps a web-based or cloud-based solution would be better, since GIS is heading a lot in the cloud direction. Thank you.
Thank you for this
interesting presentation. Do we have any questions, reactions, remarks?
[Question partly unclear] Would you mind explaining the user feedback that the visualization degraded the perceived value of the data?
What I mean is, when you produce a dataset and you have the overlay of the uncertainty, with these areas that might be better and these areas that might be worse, the end user might look at this and say, "Well, this dataset doesn't look as good as I thought it was, so I don't like your data. Can you do it over?" Basically they just don't perceive your data as well as if you just gave a statistic and the statistic looked good or sounded good.
Any other questions, reactions?
Hello. When you are applying this shapefile with points, is there a lower limit to how many points you can use? I guess the uncertainty relies on how many points you have.
Currently there is no lower limit, which is really bad. It is one of the limitations, which I mentioned as the scale issue, because the amount of points and the distribution between the points also affect the scale at which you can actually say anything about the data. So I was thinking of adding maybe something that checks the distance between the furthest points and just gives another pop-up saying, "This is the best we can say about this." But I should probably apply a limit too, so maybe per kilometer you need x amount of points, depending on the quality or resolution of your dataset.
A question linked to that: how do you go from the points to a raster with uncertainty everywhere?
From each point you have a polygon. Those are the Voronoi polygons, which basically means taking every point, and the area that is closest to that point will be linked to that point. That also relates to some of the scale problems.
Any other questions or remarks? No? Then thank you very much.
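The two ideas from the question round, spreading point values over an area by nearest point (Voronoi) and checking how many points there are per square kilometer, can be sketched as below. This is a hedged illustration only. It rasterizes the nearest-point rule instead of building polygons as the plugin is said to do, it assumes a grid whose origin is at (0, 0) with coordinates in meters, and every name and value in it is hypothetical.

```python
import math


def nearest_point_index(x, y, points):
    """Index of the reference point closest to the location (x, y)."""
    return min(
        range(len(points)),
        key=lambda i: math.hypot(points[i][0] - x, points[i][1] - y),
    )


def voronoi_raster(points, values, n_cols, n_rows, cell_size):
    """Give each cell the value of its nearest point (a rasterized Voronoi diagram)."""
    grid = []
    for row in range(n_rows):
        y = (row + 0.5) * cell_size  # cell center, assuming the origin is at (0, 0)
        grid.append([
            values[nearest_point_index((col + 0.5) * cell_size, y, points)]
            for col in range(n_cols)
        ])
    return grid


def points_per_square_km(points, n_cols, n_rows, cell_size):
    """Point density over the raster extent, with cell_size given in meters."""
    area_km2 = (n_cols * cell_size) * (n_rows * cell_size) / 1_000_000
    return len(points) / area_km2


# Hypothetical reference points (x, y in meters) with their absolute differences.
points = [(1.0, 1.0), (9.0, 8.0)]
abs_differences = [0.5, 4.0]

for line in voronoi_raster(points, abs_differences, n_cols=2, n_rows=2, cell_size=5.0):
    print(line)
print(points_per_square_km(points, n_cols=2, n_rows=2, cell_size=5.0), "points per km2")
```

A density figure like this could feed the kind of pop-up warning the speaker describes, but the threshold for "too few points" is not given in the talk and would have to be chosen for the dataset's resolution.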