
Uncertainties in neural networks


Formal Metadata

Title: Uncertainties in neural networks
Number of Parts: 6
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Place: Jena

Content Metadata

Abstract: German Aerospace Center (DLR); Neural Networks; Uncertainties in neural networks; Structuring Uncertainty in Stochastic Segmentation Networks; Interpretation as a Factor Model; Rotation on Flow-Probabilities; Factor-Wise Prediction Manipulation
Transcript: English (auto-generated)
I'd like to introduce our next speaker, who comes not from the EAH but from the DLR Institute of Data Science, a very important institute. Jakob has worked with Frank Nussbaum for the last four years, and he will bring us an industry perspective. Please welcome Jakob Gawlikowski.
My name is Jakob Gawlikowski, and I work on uncertainties in neural networks. Sorry, the talk is in English; it was announced in English, so it will be in English.
I will try to accommodate the whole variety of the audience, so there will also be a very basic introduction to uncertainties in neural networks in general. But first, let's talk about the German Aerospace Center. The German Aerospace Center (DLR) is Germany's research center for aeronautics and space.
It conducts research in aeronautics, space, energy, transport, security, and digitalization, and here in Jena also across the whole data science pipeline. It employs approximately 10,000 people at 30 locations all over Germany. At the Institute of Data Science here in Jena, the main tasks, or departments, are divided into data acquisition and mobilization, data management and enrichment, and finally data analysis and intelligence, where I am located.
Now, neural networks. Neural networks are becoming more and more important in our daily lives. For example, we have manufacturing and assistance robots; we have sensor fusion and analysis, where a lot of information is combined automatically; and we have the whole healthcare sector, where a lot of deep learning is involved. All of this makes it essential that the predictions are trustworthy: you have to be able to trust what your neural network tells you. That raises the question of how certain these predictions actually are, and this question leads directly to uncertainties in neural networks.
When we talk about uncertainties in neural networks, we have to distinguish two types: model uncertainty, also called epistemic, and data uncertainty, also called aleatoric. Epistemic uncertainty arises from the model and problems in the modeling; aleatoric uncertainty arises from problems in the data itself that cannot be resolved. For epistemic uncertainty, one obvious source is errors in the model structure itself. In the example on the right you see training data given by the green dots, the true model given by the blue curve, and a very simple model, just a linear one, given by the red line. This model is obviously too simple to capture the data distribution, so there will be uncertainty in the prediction, because you cannot match the original model.
There can also be errors in the training procedure. Again you see your training data and the true model, but now the red curve is fitted far too closely. This is called overfitting: you fit your training data too well, so that again you fail to match the true model you actually want to estimate or approximate. The third point, and for practical applications probably the most important one, is shifts in the data distribution. Whenever you train on data, you have to make sure the data does not change significantly when the model is applied somewhere.
Here you see an example where the green dots are again the training data, and the learned model fits them quite well. At training time everything seems fine and you get very good predictions. But at test time the data shifts completely: you get different inputs and the model does not fit at all. In practice this can happen very easily. As an example from my own work, consider clouds in satellite images. On the left, the task is to predict a single class label for the satellite image, and it works: the water class has a very high score, which makes sense, since most of the image is covered by water.
But at test time you might see the same scene covered by clouds. The most reasonable behavior would be to output roughly the same probability for all classes, indicating uncertainty. But since the network was never trained on cloudy data, only on clear data, it does not know how to handle this and says, with very high confidence, that the scene is urban. The good news about epistemic uncertainty is that it is theoretically reducible: you can fix or adjust your model, measure more data, and try to cover your data distribution. Why only theoretically? Because of shifts in the data distribution: you can never really cover the whole data distribution in your training data. So you should always be careful about where your model is applied.
The next type of uncertainty is aleatoric uncertainty. One obvious part is noise or ambiguity in the data itself. Here is a classification problem with class one and class two, given as green circles and yellow crosses, and the task is simply to separate them: given a point, decide whether it belongs to class one or class two. In this region the two data distributions overlap, so points there cannot really be assigned to either class.
This is aleatoric uncertainty: the problem as stated simply cannot give you a certain prediction for this region. On the other hand, you can also have label noise, which is very common in practice. The classes may be clearly separable, but some points on this side should actually be labeled as class one, and some points on the other side as class two. If this happens only a few times, or not in a significant amount, everything is fine; but if it happens too often, the network learns this uncertainty and its predictions also become uncertain.
Last but not least, we often have weak labels or multiple labels for individual instances. Here again we have remote sensing images, and the task is to predict a land cover class for each pixel individually: each pixel should be assigned one of these colors, which indicate different classes of land cover. The labels, however, are a very weak annotation: their resolution is clearly lower than the resolution of the images. So the labeling itself introduces uncertainty that the network has to handle.
This uncertainty will also be represented in the final prediction, and it is obviously not reducible: you have to deal with the data type and data source at hand. If the labels are noisy you could fix them, but that would mean changing the data. So aleatoric uncertainty is not reducible given the data and the data-generating process you have; to really reduce it, you would need to improve the quality of your data.
So much for the types of uncertainty. Now I come to a recent work; we just talked about him, and here he is again. It is joint work with Frank Nussbaum, and it is about structuring uncertainty in stochastic segmentation networks. What is this about? For applications it is really useful to have uncertainty quantification, but at the same time it would be even more useful to be able to explain these uncertainties, so that you can explore the distributions and see what kinds of scenarios arise from them. Moreover, segmentation tasks give pixel-wise predictions, and these pixels obviously correlate with each other: if one pixel is class A, a neighboring pixel has a very high probability of also being class A, and the same holds for the aleatoric uncertainty. Combining all of this, structuring uncertainty in stochastic segmentation networks means that we want to represent correlated aleatoric uncertainty in a more accessible way.
We want a way to really evaluate what uncertainty we have, and also to visualize and explore it. For this we worked with stochastic segmentation networks; here is a rough sketch of how they work in general. We do not have to go into the details: we have an input image, and the final output should be a segmentation. The input image goes into a neural network with a network head, but the head does not give you a single class map as the prediction; it gives you a distribution over class maps.
This distribution is a low-rank normal distribution. What does this mean? As for any normal distribution, you have a mean, which is the prediction of your neural network. In addition you have a matrix Γ, called the loading matrix, which has rank R, meaning it basically has R columns, and a diagonal matrix Ψ. The prediction is then given by the normal distribution with this mean and the covariance formed by the loading matrix multiplied with its own transpose plus the diagonal, Σ = Γ Γᵀ + Ψ.
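As a minimal sketch of this construction (assuming PyTorch; the sizes and tensor names are illustrative placeholders, not the authors' code), such a distribution over flattened class-logit maps can be built with torch.distributions.LowRankMultivariateNormal, which stores exactly a mean, a low-rank factor, and a diagonal:

```python
import torch
from torch.distributions import LowRankMultivariateNormal

# Illustrative sizes: C classes on an H x W image, rank-R loading matrix.
C, H, W, R = 4, 64, 64, 8
D = C * H * W  # dimensionality of the flattened logit map

# In the real model these three come out of the network head; here random.
mean = torch.randn(D)         # mu: mean logits per pixel and class
Gamma = torch.randn(D, R)     # loading matrix with R columns
psi = torch.rand(D) + 0.1     # diagonal of Psi (must stay positive)

# Covariance Sigma = Gamma @ Gamma.T + diag(psi) is never formed explicitly.
dist = LowRankMultivariateNormal(loc=mean, cov_factor=Gamma, cov_diag=psi)

logits = dist.sample()                             # one spatially correlated draw
segmentation = logits.view(C, H, W).argmax(dim=0)  # a sampled class map
```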
As you can see in the examples, you now get structured predictions. If you sampled each pixel individually, without the correlation among pixels introduced by this low-rank construction, you would not get such smooth predictions, where a whole region is coherently yellow; you would get very cluttered predictions. We then extended this model, because we wanted to be able to explore the distribution in an easier way.
We wanted to evaluate which kinds of scenarios are contained in this distribution, and for this we interpreted the normal distribution as a factor model. What is a factor model? Basically, we interpret the individual columns of the Γ from the original representation of the normal distribution as factors, or loadings. For each of these columns we introduce a factor control variable z_i, which is standard normally distributed. Additionally, for the diagonal Ψ you have seen before, we introduce the unstructured uncertainty ε, distributed exactly according to the predicted Ψ. The nice thing is that the normal distribution can then be represented exactly in this form: a sample is the mean plus Γz plus ε.
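A minimal sketch of this factor-model view, reusing the illustrative tensors from the previous snippet (again my sketch, not the authors' code): drawing z and ε separately produces exactly the same distribution and exposes the control variables:

```python
# Factor-model reparameterization of the same distribution:
#   logits = mu + Gamma @ z + eps,  z ~ N(0, I_R),  eps ~ N(0, Psi)
z = torch.randn(R)                 # factor control variables z_1 .. z_R
eps = psi.sqrt() * torch.randn(D)  # unstructured per-pixel noise
logits = mean + Gamma @ z + eps
segmentation = logits.view(C, H, W).argmax(dim=0)

# With all z_i = 0 you get the mean prediction plus unstructured noise;
# varying a single z_i moves along one factor of correlated uncertainty.
```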
Represented this way, it already looks more structured. In this example Γ has four columns, so we need four control variables z_1 to z_4, and the name control variables already indicates that we can control them. One of the contributions here is that we introduce flow probabilities; the images below show flow-probability visualizations. Flow probabilities basically state the likelihood of a change in the prediction caused by an individual factor; you will see later what this means in detail. Here you see the mean prediction, with an edge between the green class prediction and the yellow class prediction, and the flow probabilities for factor one show that this factor models a probability that this region becomes yellow, or that the other region becomes green.
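The paper gives a precise definition of flow probabilities; purely as a rough Monte Carlo sketch of the idea (my own illustrative approximation, reusing the tensors from above, not the authors' formula), one can estimate per pixel how often activating a single factor flips the predicted class away from the mean prediction:

```python
def flow_probability(mean, Gamma, i, n_samples=256):
    """Per-pixel frequency with which factor i alone flips the class
    relative to the mean prediction (a crude stand-in for the paper's
    flow probabilities)."""
    base = mean.view(C, H, W).argmax(dim=0)      # mean prediction
    flips = torch.zeros(H, W)
    for _ in range(n_samples):
        z_i = torch.randn(())                    # only factor i is active
        logits = mean + Gamma[:, i] * z_i        # all other z_j fixed at 0
        pred = logits.view(C, H, W).argmax(dim=0)
        flips += (pred != base).float()
    return flips / n_samples                     # values in [0, 1] per pixel
```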
The same holds below for the red region. This looks nice, but it does not yet tell us which components we have or how they are connected to each other.
So we have to think about what we can do with this. The key observation is that the low-rank normal distribution is only unique up to orthogonal rotations of the loadings. What does this mean? We can replace Γ by Γ̃ = ΓQ, where Q is any orthogonal matrix, meaning Qᵀ Q is the identity matrix. The distribution itself does not change at all: the mean prediction stays the same, and the covariance stays the same, since Γ̃ Γ̃ᵀ = Γ Q Qᵀ Γᵀ = Γ Γᵀ. Only the factor representations change, and the visualizations are built exactly on these factors. So we can use such rotations to optimize the representation. And how do we do this?
For this we defined three aims. First, we want the number of relevant factors to be as small as possible; relevant basically means there is color in the factor image, because if it is black there is no flow probability. Second, we want the relevant factors to be separable: they should focus on different parts of the image or different classes. Third, factors should obviously be non-redundant. Then some theory happens: we borrow rotation optimization from the field of factor analysis, namely the varimax, equamax, and quartimax criteria, as sketched below.
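As a rough sketch of what such a rotation optimization looks like, here is the classical varimax iteration applied to raw loadings (the paper transfers the criterion to the flow-probability representation, which this simplification does not do):

```python
def varimax(Gamma, max_iter=100, tol=1e-6):
    """Classical varimax: find an orthogonal Q that maximizes the variance
    of the squared rotated loadings, so each factor concentrates on few
    entries."""
    p, r = Gamma.shape
    Q = torch.eye(r)
    d = 0.0
    for _ in range(max_iter):
        L = Gamma @ Q
        # SVD of the criterion's gradient gives the best orthogonal update.
        grad = Gamma.T @ (L**3 - L @ torch.diag((L**2).sum(dim=0)) / p)
        u, s, vt = torch.linalg.svd(grad)
        Q = u @ vt
        d_new = s.sum().item()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return Gamma @ Q, Q  # rotated loadings and the rotation matrix
```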
We transfer these optimization objectives to the flow-probability representation we just saw; the criterion is given by the term shown below, which we do not need to discuss in detail. Let us look at results instead. The first row shows the factors exactly as the network gave them to us.
We have eight factors and the flow probabilities you see here. The second row shows the factors after applying our optimization scheme, where we optimize the rotation. Now each factor encodes essentially one edge: factor one focuses only on the border between the yellow and the green class, factor two only on the border between the yellow and the red class, and factor three basically only on the lower edge here. There may still be room for improvement, but the uncertainties are already far more disentangled.
If you now recall the factor-model representation with the control variables z_1, z_2, z_3, z_4, it also allows us to investigate which samples are plausible, that is, which samples are covered by this uncertainty distribution.
On the x-axis you see the control variable z, going from -1.5 to 1.5. At zero, all the z_i are zero, so we basically just have the mean plus some unstructured noise. Going to the left affects the green class: the flow probability shows that the negative direction expands the green class, so to say, and that is exactly what happens. The green class becomes more dominant while the red class does not change at all. Going to the other side, the green class recedes and the yellow class becomes more dominant.
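A minimal sketch of such a factor-wise sweep, reusing the illustrative tensors from above (an assumption about how one might reproduce this figure, not the authors' code; in the paper this would use the rotated loadings):

```python
# Sweep one control variable while keeping all others at zero to see
# which scenario this factor encodes.
i = 0                                          # index of the factor to move
for z_val in torch.linspace(-1.5, 1.5, 7):
    logits = mean + Gamma[:, i] * z_val        # z_j = 0 for every j != i
    seg = logits.view(C, H, W).argmax(dim=0)
    # visualize or store `seg` for each z_val: class regions grow and
    # shrink along this single factor, as in the figure
```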
The same happens in the second row for the other factor we saw, where only the red region is affected, not the green one. That is it from my side so far. Thank you very much.
We are of course looking forward to hearing from you, and we are always open to collaborations and exchanges. Working students and thesis students are very welcome, and further information is always available; we are happy to share it.
Thanks a lot. Thank you, Jakob. Perfect timing, so we have time for questions, if there are any. Hello.
My question is: how do we cover the ranges of a factor, so that we capture all the uncertainties? Is there a way to find out what range needs to be considered for the factors?
If you look at this representation, all the control variables are standard normally distributed, so you can always use the confidence intervals of the individual factors; for instance, since each z_i ~ N(0, 1), the range from -1.96 to 1.96 covers 95 percent of a factor. Okay.
And one more question: how impactful are these uncertainties in terms of the reproducibility of a neural network, or of its results? Is it more important to cover all sorts of uncertainties, or... What do you mean, epistemic and aleatoric? Actually, how impactful are these uncertainties to overcome, in terms of the reproducibility of neural networks? To reproduce what? The reproducibility of neural network results.
What I can say is that considering these uncertainties in general gives the model the possibility to learn the prediction in a nicer way, since it can account for noise explicitly, and for the practitioner it is important to have them included. Okay, thank you. So it is more about... Yes, exactly. And especially if you consider regression tasks, there are often cases where you simply have noise in the measurements or in the sensors, and it can also be of interest to learn this sensor uncertainty itself. Is there another question?
Yes, there is one. Thank you for the talk. What is the advantage of this workflow compared to more classical explainable-AI methods, like generating saliency maps by changing the input a bit and finding out which pixels, or which range of values, are important for the classification output? The main difference is that those common explainability methods focus on what caused a particular prediction: if you have a picture of a dog and ask why it was predicted as a dog, then the dog should light up. Here it is really more about what uncertainty we have in these images. You might get a very certain-looking prediction for the single point estimate, just the mean, which is normally all you would get as output.
But there may be correlations in there. If this is not intuitive, imagine an example from healthcare: you have a lot of scans, and many experts mark out, say, a cancer cell. If you learn from this, the network will somehow combine all these expert opinions; maybe one expert just makes a dot as a marker, while another takes the time to draw the region out completely. With these approaches you can really evaluate the whole scale: you use the control variables and can play around with, for example, all sizes of the cell. There is another question. In your talk you also mentioned out-of-distribution detection,
or basically distribution shifts. Do you have any results on how well this method works for inputs that are completely out of distribution? We have not applied it to that, and you are right: this method targets aleatoric uncertainty, and I would expect that it does not work there. Normally, letting the model predict aleatoric uncertainty also improves the entropy for epistemic uncertainty in the end, but the model is not explicitly trained for epistemic uncertainty. So I think ensemble approaches or Bayesian neural networks might be more useful there, and they could of course be combined with this. Thanks. Thank you, Jakob. I think Jakob will also be around and willing to answer further questions.
We even brought our poster for the paper.