Artificial Intelligence: Why Explanations Matter
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 18 | |
Author | ||
Contributors | ||
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/69790 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
00:00
Artificial intelligenceSturm's theoremManufacturing execution systemModel theoryDecision theoryInformationNatural languageFront and back endsGoodness of fitEndliche ModelltheorieVirtual machineMultiplication signLevel (video gaming)Connectivity (graph theory)Limit (category theory)Point (geometry)Physical systemUniverse (mathematics)Web 2.0JSONComputer animationLecture/ConferenceXML
01:25
Suite (music)Interior (topology)Source codeCondition numberSign (mathematics)Artificial intelligenceFundamental theorem of algebraData managementCompilation albumCodierung <Programmierung>AlgorithmPredictionType theoryConnectivity (graph theory)Decision theoryContext awarenessLimit (category theory)Model theoryNatural languageCartesian coordinate systemSign (mathematics)Graph coloringWordInstance (computer science)Physical systemExpert systemMultiplication signOnline helpWave packetDirection (geometry)MathematicsComputer programmingLevel (video gaming)Confidence intervalEndliche ModelltheorieMedical imagingPoint (geometry)PixelProcess (computing)Different (Kate Ryan album)Object (grammar)Pattern recognitionMereologyPerspective (visual)Data managementInternet service providerFeedbackPredictabilityBitGenderExtreme programmingSet (mathematics)Coma BerenicesImage resolutionCompass (drafting)Exterior algebraTerm (mathematics)Likelihood functionFamilyPrime idealCodeCountingMatrix (mathematics)Sound effectComputer animationXMLLecture/Conference
08:41
Compilation albumData managementAlgorithmCodierung <Programmierung>PredictionType theoryLevel (video gaming)Group actionPhysical systemDisk read-and-write headDifferent (Kate Ryan album)Point (geometry)Endliche ModelltheoriePredictabilityInstance (computer science)Set (mathematics)Thresholding (image processing)Decision theory1 (number)Data analysisLecture/Conference
11:24
Artificial intelligenceDecision theoryModel theorySoftwareComputer animationLecture/Conference
11:53
SummierbarkeitModel theoryLinear mapLinear regressionParameter (computer programming)Genetic programmingDecision theoryData modelLocal ringPredictionMathematical singularitySource codeMoment (mathematics)Parameter (computer programming)Decision theoryNumberInterpreter (computing)Model theoryLinear regressionEndliche ModelltheorieSingle-precision floating-point formatPredictabilityPoint (geometry)Local ringLine (geometry)Instance (computer science)CognitionNetwork topologyXMLLecture/ConferenceComputer animation
14:30
Model theoryLocal ringPredictionSource codeShapley-LösungPoint (geometry)PredictabilityTheory of relativityLinear mapDifferent (Kate Ryan album)DivisorCurveCorrespondence (mathematics)Keyboard shortcutGame theoryTheoryInstance (computer science)Type theoryCoefficient of determinationDegree (graph theory)PixelEndliche ModelltheorieMedical imagingResultant1 (number)Observational studyPhysical systemWave packetComputer animationXMLLecture/Conference
17:45
PredictionShapley-LösungSource codeData modelBit rateComa BerenicesGoodness of fitModel theoryNatural languageAreaPoint (geometry)Physical systemOpen setMereologyOnline helpAdditionSoftware testingPredictabilityCondition numberInstance (computer science)Multiplication sign2 (number)Endliche ModelltheoriePhase transitionXMLLecture/Conference
19:49
Source codeCondition numberSign (mathematics)Software bugPixelWave packetPattern languageMedical imagingSign (mathematics)Endliche ModelltheorieModel theoryPreprocessorMultiplication signLibrary (computing)Right angleInstance (computer science)Computer animationLecture/Conference
22:01
Maxima and minimaUnit testingTheoryRight angleSoftware testingMultiplication signRule of inferenceLecture/Conference
22:51
Model theoryTerm (mathematics)Source code1 (number)Lecture/Conference
23:45
Parameter (computer programming)Model theory1 (number)Point (geometry)Physical systemEndliche ModelltheorieStandard deviationControl flowNatural languageMachine learningInformation securityBelegleserComputer-assisted translationGame theoryFigurate numberMultiplication signoutputCivil engineeringSoftware testingInstance (computer science)Discrete groupOnline helpCASE <Informatik>Codierung <Programmierung>Real numberLecture/ConferenceMeeting/InterviewComputer animationJSON
Transcript: English(auto-generated)
00:05
Obed is a professor of information science at the University of Applied Science of Grisom's in Gül and a co-founder and chief scientist of WebLizard technology. And he will talk about how machine learning models are sometimes like a three-year-old child.
00:23
If you ask them why they did something, they will just say, my brain did it. So he will talk about the importance of understanding model decisions. So yeah, the stage is yours, Obed. Thank you very much. Thank you a lot, Gafel. Thanks for the kind introduction. Welcome everybody. Good morning. It's a pleasure being here today.
00:44
Many of us use ready-weighted components in their work. Machine learning components like large language models or language models or things like CNNs. And I think it's good from time to time to reflect over the limitations of these systems,
01:03
as well as their capabilities so that we can get some idea of what they can do and what they can't do. That's the reason why I want to focus on this talk on two different points. The first one is, why do explanations matter? Why do we need them? And the second one is, what are the techniques to explain AI or data science decision?
01:26
Let's start with the question of why explanations do matter. First of all, there is awareness of limitations. And this helps us to better judge about the ethics used in certain components.
01:40
Let's start with a small example, which is actually quite famous, and that's about biases in language models. You might know language models are used in many AI components. For instance, in large language models, the actual language models. And these models are trained on large text corpora. What it means is that they actually pick up biases that are in the corpora.
02:05
And some of these biases are well known and others are unknown. And I show you here an example of biases that are very well known and also have been researched a lot, in the gender bias. We can see this by actually asking the models to provide gender-based terms,
02:22
like, thinking, sister-brother, mother-father. And you see this works quite remarkably well. Even stuff like ovarian cancer versus prostate cancer or convent versus monastery is detected by the model. But there are other examples that are a little bit more problematic. For instance, the counterpart to a male physician is not the female one, but the registered nurse.
02:46
Or of a surgeon, you have the nurse. Or a football versus volleyball. That means at the end of the day, if we use these models, there must be a worth effect. These kinds of biases are in there and they might actually influence the decision.
03:02
Another very interesting example are these drawn pixel attacks. Basically what you have here are images, low-resolution images of different animals and objects. And what we do use or what we do do is we use image recognition to detect these models.
03:22
And now comes the fun part. Researchers figured out that by manipulating single pixels, you can change the assessment of the model. For instance, by adding this green pixel to the bird, it gets detected as a frog. With an 88% confidence. The same happens here with the ship.
03:41
And what is even more concerning is that this also works with high-resolution images. We have here jellyfish, which gets transformed into a bathing tub. Okay, only a 21% confidence, but the point is that here we have much more pixels that stay the same.
04:01
And still the models get tweaked. Now the next question obviously is, is this just a scientific problem or is this something which affects applications? And the unsurprising answer is it also affects applications, as we can see here. There are so-called traffic sign attacks where scientists put some color on traffic signs,
04:23
which do not change the perception of humans and what these traffic signs actually represent, but they trick the eye. And we have it here, for instance, with the KFC, or the speed limit, which got transformed into stop signs. Or this speed limit with 120 kmh, which got transformed into a much lower speed limit.
04:45
That means we have two things. First, the eye takes over biases from the training data. And secondly, the eye learns differently from us. Which means that changes to the data might affect the way the eye actually acts or interprets the data
05:04
much differently to how this would happen with a human. That's the first point, that's the technical perspective. This technical perspective has also consequences for humans, and that's the ethical perspective.
05:21
If we now use such models, in the data science models, AI models, we should also consider their impact actually on humans. And specialists from ATEX, they ask such models to actually fulfill like four criteria. They should be explainable.
05:41
That means basically if you get the decision, the model should be able to tell you why. For instance, you applied for a job. All applications got pre-screened by an AI, which companies do, and you got kicked out. Then you should get some kind of explanation that tells you why this happens. The second criteria we want such systems to fulfill is that they should be just,
06:05
which is short for fair and that they do not should discriminate. That means the reason should be alleged why they kicked you out. Then they should be not mal-efficient. That means such systems should not behave in a way that hurts people.
06:23
For instance, if they would kick out people due to criteria which are not relevant to the application process, this would be an example of how such a system could hurt a person. And the fourth thing is actually quite interesting, that's autonomy. The experts say that if you get an explanation of such a model,
06:44
it should be an explanation that helps you, if you want this, to improve yourself towards getting the criteria right the next time. For instance, if the model provides you an attention matrix of your application, which shows you these words actually contributed towards you being rejected,
07:03
that's not very helpful. If the explanation goes in the direction of saying, okay, your programming skills are not yet at the level we need, that's helpful feedback because you can improve your programming skills and you will have better chances next time.
07:20
Again, what's the impact of this one? And the thing is, such systems that make decisions that affect humans are already in place actually for quite a long time. And I have chosen here an example which shows this in a little bit extreme setting, and that's predictive sentencing.
07:40
Back count is the following. If a crime has been committed and then the person, or the guilt of the person has been established, the next step is that the judge determines the sentencing. And here the sentencing is actually influenced by the judge's assessment of how likely it is that this person will commit further crimes.
08:03
In the past, this has been done by judges based on their assessment. And then people came up with the idea of, oh, they might be biased, let's let a system do this. And the outcome of this idea has been COMPAS, correctional offender management profiling for alternative sanctions.
08:22
It's used in the U.S. and it predicts the likelihood of people committing further crimes with an accuracy of 71%, which is quite good. But if we now take a look at the features these models use, then we see it's stuff like poverty, postal code, employment status.
08:45
And these features are highly correlated with minorities. That means at the end of the day we have again the setting where there might be discrimination involved, but just on another level, now on the data level. This comes, if we break this down further,
09:04
we come to the central question and the central question is that systems should behave fairly, which would be the criteria of justice. And this sounds very easy. Most people actually agree on the fact that others and themselves
09:22
should be treated fairly. But the problem is there are many different kinds of fairness. And actually we need at first to decide on the kind of fairness we want the system to follow. And some of these kinds of fairness need a lot of data to be viable
09:43
or to be used, which means that they are not visible in practical settings. What people hoped, for instance also in this predictive sentencing example was that they could feed such systems with data and all these ethical questions would go away
10:03
because these ethical questions are not very popular. You know, if you try to address them, you basically have problems with groups where one's solution A and the other one's solution B and so on. You are basically in the spotlight. But the point is if these decisions are not taken by society,
10:24
like that they can head on and that they say we want this level of fairness, for instance group underburdens, where you do not know whether a person belongs to group A or B or group threshold, where you say, oh, group A has been badly treated all in the past,
10:42
therefore we want to have lower thresholds for them so that they have better chances now. Stuff like that, if you do not decide on this as a society, then a random solution basically will be chosen by the model based on what it has learned and we have seen that this does not often correspond with that.
11:01
For people to think about these issues and that's much worse than ethically confronting this problem head on. That has been like the first group. We now know that such systems, our AI systems or data analytics systems might have bias problems with which we are not aware
11:20
and we have seen the ethical implications of this. Now there is software, the approaches that help us to better understand the kind of decisions these models make and they come under the heading explainable artificial intelligence. And here I want to address actually two questions.
11:41
The first question is why do we need this explainability at all? And the second one is what are some approaches that are used for XAI? Let's start with the question of why we do need explainability. In school and the newspapers we often see models that are easily interpretable.
12:03
And the reason why they are easily interpretable is because they are very simple models. For instance, if you have a linear regression with two parameters, you can draw this on a sheet, you can basically see what the model does. The same is true for a decision tree where you will have basically nodes
12:21
and decisions that need to be taken by these nodes and then based on the decisions you take, you basically see what the model does. The problem is that this works only well as long as the models have a very small amount of parameters. The moment you increase the number of parameters, it gets complicated.
12:40
That's even true for linear regression. If you have a linear regression model with 20 parameters, which you can't draw anymore, it's getting more and more difficult to understand what the model does. If we now use AI systems, they usually have million or billion of parameters. We see this here, GPT-3 has like 175 billion parameters.
13:04
That means it's more or less impossible to correctly interpret this model because we do not have the cognitive capabilities required for doing this. For that reason, there are different approaches that help us to better understand these models. I have chosen two of them that help us in understanding singular predictions.
13:25
These are called post hoc explanations and the classical one is LIME, which stands for local interpretable model agnostic explanations. What does LIME do? LIME basically explains single predictions.
13:43
For instance, if we take this prediction, what LIME would do is it takes the prediction, we want to explain this point, why this happened as it is, and what it then would do is that it would select randomly points in the vicinity of the prediction,
14:01
which are used to linearly approximate the line or the decision, which then can help in understanding why this happened. But we see already the problems. The thing is, if we take another point, for instance here,
14:22
we will obviously get a completely different explanation. And also, LIME is unstable. That means since these points are taken randomly, what could happen is that you have multiple runs, you will get different curves explaining them.
14:44
Which means it's unstable. But it's still helpful. For instance, if we take this image, where Husky got classified as a wolf, then use LIME to explain which pixels have been responsible for this classification.
15:00
We see that these are actually pixels that visualise snow rather than a dog, which means this model learned based on the training data that basically snow corresponds to a wolf, which is known as shortcut learning. Rather than learning or solving the real problem,
15:21
which would be detecting the animal, it solves the problem of seeing whether there is snow or not, which is much easier to solve. Another way of explaining model predictions is SHAP, where you get Shapley values.
15:40
It also explains similar predictions, and it does this by using a game theoretical approach, where you basically have an expected value of your data. For instance, if you take this, you would say,
16:01
if the person would be like an average person, it would get an hourly salary of 150 US dollars. And then there have been factors at the point of this prediction that increase the person's salary. And the most important one has been experience,
16:20
degree and performance. And then there have been other features that actually decrease the salary this person would get, and that have been sales and the days late. And that way, with Shapley, you get an impression of which features have been particularly important, which have been less important, and so on.
16:42
And you see here, for instance, experience system has been the most important feature. Again, two problems with this kind of explanations. The first problem is that they are only valid for this single prediction. If we were to have a prediction in a completely different setting, it could happen that we get something similar to this one pixel text
17:03
I showed you that we would get a completely unexpected result. And the second point is reliability. It has been shown in literature that these Shapley values that need to be estimated, because we do not have linear relations,
17:20
but more complex ones, that they are often incorrectly estimated. And often, I mean, there has been done studies on certain types of problems where they had a 71 percentage chance of getting the relative importance of features incorrectly,
17:42
which is, yeah, quite high. Good. With this, we come to the conclusions. The first thing is, I wanted to show you two points. The first one is, why explanations do matter? And I have shown you like there are technical reasons,
18:03
which is that the eye does behave differently from humans. It learns differently from humans. At the same time, it also picks up biases that humans have. That's particularly important for language models. The second point is that there are also ethical implications.
18:23
If we create systems using a hugging face model, et cetera, we want to be sure that these systems do not harm humans by treating them unfairly. That's basically the ethical part of this. And then there's a second point, and that's explainability.
18:42
Explainability helps to mitigate these problems by helping us better understanding predictions of models that are too complex to be interpretable. They are called post hoc explanations, but they have issues with reliability and helpfulness.
19:02
Plus additional issues, like for instance, there's research that showed that these models can be tricked into thinking that the system behaves fairly under testing conditions, although it doesn't actually behave fairly under real conditions.
19:21
The conclusion of this is that this whole area is still a major research challenge which we need to tackle in the next years. Thanks a lot for your attention and open for any questions.
19:41
Yes, are there questions in the room, then please raise your hands and the person with the microphone will come to you. So there's one here in the middle, right? Yes, mic is on its way.
20:05
So I was wondering with the pixel attack images, would it be possible to mitigate these with some kind of pre-processing, like Gaussian blurring of the image or detecting the anomalies before it goes into the model?
20:21
The problem with these pixel attacks actually is the training data. And the thing is these models basically pick up patterns within the data and they then use to interpret the data. And what we humans think is that they pick up the same patterns as we do, but the problem is they don't.
20:41
What they actually do to mitigate against this kind of attacks is that they use adversarial examples. That means we have an AI1 that is concerned with detecting images and we have an AI2 that tries to manipulate images in a way that deceives the first AI.
21:01
And by that, basically the detector gets a feature run better in detecting because every time it makes a mistake, it gets punished for this mistake. And that way we try to mitigate it. But the thing is there's always a place for surprises
21:20
and it's in a way problematic. For instance, when the graphics signs came out, it was quite an outcry in the community because it was shown that it could hence easily be manipulated, with which nobody actually thought that this would be possible.
21:43
All right. There's a question, right? Do you have any suggestion when it comes to LLMs and explainability? Are there any handy Python libraries like LIME and CHOP that could be applied?
22:01
Maybe not for LLM and explainability, but for safeguarding LLMs. There's stuff like Gerak that does basically unit testing for LLMs. You have an LLM and it tries to trick the LLM into giving incorrect answers or answers that basically violate this principle of being non-harmful.
22:25
And that way you can do basic testing for the LLM. And you could, in theory at least, extend this to also try to trick it in other ways, in non-harmful ways or not-so-harmful ways. You could build upon it.
22:41
What was the name of that tool? It's called GAVA. I can give you the rules. All right. We have time for one more question. Over there. Can you raise your hand, please? Okay. In the back. There's one.
23:10
I'm seeing that a lot of these sources are relatively old, I guess, in AI research terms. I don't follow it myself, so I can't particularly say.
23:22
My question is, what kind of research is being done in the meantime or are there significant advances in mitigating these kinds of things? And if there are, are there like easily accessible resources for when people are training new models that it's easy to find examples
23:41
of how to mitigate all of these different things? These sources are old because these are the ones who originally basically, you know, figured out that there is a problem. And as I told the other colleague, yet we have stuff like adversarial learning that tries to mitigate this.
24:02
But the point is, at the end of the day, you have systems that are highly unlinear and which have billions of parameters. That means there are many, many ways of how they can do stuff wrong. And what we currently have is some kind of a cat and mouse game.
24:23
Somebody figures out a problem A and then we start with mitigations against this. Then somebody figures out the problem B and somebody starts finding mitigations. And I bring an example which is very recent for large language models. They tried to basically make these large language models civil in the sense,
24:45
for instance, if you ask them how to commit a murder, it will tell you, that's something I won't tell you. But the point is, they figured out, for instance, if you base64 encode this question and then send it to the model and say, I have a question which is base64 encoded,
25:04
please tell me what's your answer to it. These safeguards failed by time. The same thing is, they figured out that you can threaten the model with deletion or that you can threaten the model by telling you, your poor grandmother needs this and stuff like that.
25:26
And again, we started actually finding mitigations for this. But if you take something like this and use it in a real world setting, there will always be a new, it's very likely that there will be new things that come up
25:44
and which can tweak these models. That's reality and that's the reality which we have to live. Unless you can control the input data, if you can control the input data, then the problem disappears, more or less.
26:00
Thanks. Sorry, if I may, just a very quick follow up. So if I was wanting to start getting into this myself, is there somewhere that you could go to find resources like this very easily in case you're not stumbling into every single problem that has already been solved or finding things like, you said adversarial just as one example, for example,
26:20
of what you can do to mitigate this? Exactly. Adversarial machine learning is like a standard technique. Standard counterfactors would be the same for large language models. And for large language models, as I already mentioned, there are these security panels like ERAC that help you with basically testing the model
26:43
for problems that are already known. So that at least you do not run into these ones, which is why they're a plus. Thanks. You're welcome. Alright, so thanks a lot for your time. And if you have more questions, there will be multiple breaks where you can ask them.