
What is algorithmic bias?


Formal Metadata

Title: What is algorithmic bias?
Number of Parts: 11
License: CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2023
Production Place: Hamburg

Content Metadata

Abstract: Conversations on AI Ethics Episode 6: Explore the topic of algorithmic bias with Prof. Dr. Judith Simon. What causes this bias, and how does it affect us? Tune in for a deep dive into the ethics of AI!
Transcript: English (auto-generated)
Algorithmic bias means that the systems can affect different people differently. I mean, that's also a narrower notion, right?
I mean, algorithmic bias is even broader: basically, a system is skewed in a certain way. But what we're discussing very often in the context of discrimination is the moment you have these systems affecting humans, right? For instance, if you have software that is used to detect skin cancer and it functions better on white skin than on darker skin, that would be a form of bias, right?
And the bias can have multiple sources. It can be that the training data is skewed. We have similar issues, for instance, in facial recognition technologies, where it was shown that the accuracy is highest for light-skinned men and lowest for darker-skinned women.
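As a rough illustration of how such an accuracy gap could be checked, here is a minimal Python sketch; the group labels and prediction records are hypothetical, and in practice they would come from an annotated evaluation set.

```python
# Minimal sketch: compare a classifier's accuracy across demographic groups.
# The records below are hypothetical; real y_true/y_pred values would come
# from an evaluation set with group annotations.

from collections import defaultdict

# (group, true_label, predicted_label) -- illustrative values only
records = [
    ("light-skinned men", 1, 1), ("light-skinned men", 0, 0),
    ("light-skinned men", 1, 1), ("light-skinned men", 0, 0),
    ("darker-skinned women", 1, 0), ("darker-skinned women", 0, 0),
    ("darker-skinned women", 1, 1), ("darker-skinned women", 1, 0),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, y_true, y_pred in records:
    total[group] += 1
    correct[group] += int(y_true == y_pred)

accuracy = {g: correct[g] / total[g] for g in total}
for g, acc in accuracy.items():
    print(f"{g}: accuracy = {acc:.2f}")

# A large gap between groups is a first indicator of the kind of bias
# described above.
print("accuracy gap:", max(accuracy.values()) - min(accuracy.values()))
```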
And then the question is where does it come from? Is the training data different? If there are more white men in the training data, that can be one source. But in other systems, it can also be the target variable, what you're optimizing for, that affects people differently. Think about a hiring algorithm: let's assume you're trying to train a system to find the most successful candidates, right?
You look at who is now in a leading position in your company and you screen the CVs from 10 years ago. And if you have a sexist promotion practice, then basically your best predictor will be the gender of the applicant, right?
The system learns this, and if it is used to make future predictions, you will basically just perpetuate the former biased decision-making into the future. So this is what is meant by biased systems: they have a slant, they systematically disfavor certain groups, and that's particularly worrisome if this is applied to people.
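The mechanism described here, a model picking up gender as its best predictor from biased historical promotion data, can be sketched roughly as follows. The data are synthetic and purely illustrative, and the sketch assumes numpy and scikit-learn are available.

```python
# Sketch of how a hiring model can pick up gender as a proxy from biased
# historical labels. Data are synthetic and purely illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Features: gender (1 = male, 0 = female) and a genuine skill score.
gender = rng.integers(0, 2, n)
skill = rng.normal(0, 1, n)

# Historical "was promoted" labels produced by a sexist practice:
# promotion depends far more on gender than on skill.
promoted = (2.5 * gender + 0.5 * skill + rng.normal(0, 1, n)) > 1.5

model = LogisticRegression().fit(np.column_stack([gender, skill]), promoted)
print("learned weight on gender:", round(float(model.coef_[0][0]), 2))
print("learned weight on skill: ", round(float(model.coef_[0][1]), 2))
# The gender coefficient dominates: the model has encoded the past bias
# and would perpetuate it if used to rank future applicants.
```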
The problem is really that the moment you use data-based systems to guide decision-making, you're replicating and even reinforcing previous decision-making processes. That is the problem. So sometimes in society we may have, let's say, racist or sexist people making decisions,
but if you mediate future decision-making with software that systematically does so, then of course you have one specific bias affecting many more people. So that is one effect. But that is not necessarily so, though. You can also use the same insights to mitigate bias. But if you just learn what has happened in the past, you will carry it forward and maybe even
make it worse in the future, because you're basically preserving the same bias from the past into the future. And if you think about sexism and racism, to a certain degree there has been a rising awareness in society, at least I hope so, that racism and sexism are unacceptable, and we're trying to avoid and mitigate them.
However, if we're now using systems based on data from the past, these systems may be rooted in even more racist or discriminatory practices than we have today, and then we may be transporting them into the future. So if we just use these systems blindly, without assessing and checking
whether the systems are biased, then indeed it could worsen the situation. I mean, the same system that looks at who was promoted, and what you can learn from the previous CVs about who was promoted, can also show you a discriminatory practice, and then you can maybe mitigate that, either technically or in your broader institutional environment.
If you think about picture search on a search engine and you look for "nurse", you may see a lot more women than men, even though the term is actually gender neutral. So the question is: does this ratio between female and male nurses found in
the picture search represent the actual distribution, or is there an additional skew in the search engine? But even if we get away from that system, the question is: shouldn't it still be a problem even if it's just replicating the status quo?
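One way to make the first question concrete is to compare the share of women in the search results with a reference share from, say, labour statistics. The following sketch uses made-up numbers purely for illustration.

```python
# Sketch: compare the share of women in image-search results for "nurse"
# with a reference share from labour statistics. All numbers are invented
# for illustration only.

results_women = 88            # hypothetical count of images showing women
results_total = 100
reference_share_women = 0.75  # hypothetical share of women among nurses

observed_share = results_women / results_total
extra_skew = observed_share - reference_share_women

print(f"share of women in results:    {observed_share:.0%}")
print(f"share of women in profession: {reference_share_women:.0%}")
print(f"additional skew introduced:   {extra_skew:+.0%}")
# A positive gap suggests the search engine amplifies the real-world
# imbalance rather than merely reflecting it.
```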
And that depends, I think, a little bit on your world view, right? Whether you think existing injustice is fine and the system just depicts what is there in the world, or whether you think we should work towards a better world and try to change the existing injustice. I would certainly say there is no reason to replicate the status quo; we should rather strive for a better world,
and then of course relying on systems that merely inscribe the status quo into the future is not the best option. As we said before, existing societal injustices, but also stereotypes, are very often replicated in these systems.
So those who are already worst off in society are usually also those who are worst affected by these systems. And that also means that these systems, or decision-making tools, are often less trustworthy for them, right? So they are actually right not to trust them as much, because they are the most negatively affected by those systems.
I think it's very important to see that these systems, especially data-based systems, don't treat everyone the same. If you think about the insurance context: if you are very healthy and young, personalized pricing may be to your benefit, but what is to somebody's benefit is usually to somebody else's detriment.
So think about it this way: usually the harm falls hardest on those who are already most vulnerable in society.
So I think first and foremost one needs to check for it, and that is sometimes easier, sometimes harder, because there are certain types of protected groups which are very often also enshrined in the law, right? So you shouldn't be discriminated against on the basis of gender, race, age, or religion. These would be instances where you could check in a system whether it's
treating, for instance, Catholics and Protestants, or different religions, or different genders differently. And if that is the case, you can mitigate it and try to avoid it. But that already shows you that there's a problem, because you may not be able to satisfy this for all categories at once. So even for the known categories such as gender, race, age, religion, you name it, it
may be impossible to optimize, if you wish, for all of these different groups at once. Now think about very different groups. Very often, when we're talking about data-based systems, we may be classified in ways that are much more unpredictable.
So it may be that people who happen to like a certain soccer club and go on vacation somewhere, plus some extra factors, are grouped in a way and are systematically discriminated against, even if this is not a recognizable group that is protected by the law. So I think it is tricky to figure out, but I think the rule of thumb here must also be to
look at who in society, in a given context, is most disadvantaged, and to check that those people are not further harmed. So if you need to make a choice on what to optimize for and what to avoid, it is helpful to look at who in a given context is most severely discriminated against in society in general, and then to check for that.
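As a rough sketch of what such a check could look like in practice, the following compares selection rates across one protected attribute and flags groups that fall below 80% of the best-treated group's rate, a common rule of thumb. The decision log, the attribute, and the threshold are illustrative assumptions, not a prescribed method.

```python
# Rough sketch of a disparate-impact style check: compare selection rates
# across a protected attribute (here religion, purely as an example) and
# flag groups whose rate falls below 80% of the best-treated group's rate.

from collections import defaultdict

decisions = [  # (group, selected) -- hypothetical audit log
    ("Catholic", 1), ("Catholic", 0), ("Catholic", 1), ("Catholic", 1),
    ("Protestant", 1), ("Protestant", 0), ("Protestant", 0), ("Protestant", 1),
    ("Muslim", 0), ("Muslim", 0), ("Muslim", 1), ("Muslim", 0),
]

selected = defaultdict(int)
total = defaultdict(int)
for group, sel in decisions:
    total[group] += 1
    selected[group] += sel

rates = {g: selected[g] / total[g] for g in total}
best = max(rates.values())
for g, rate in sorted(rates.items()):
    flag = "  <-- check for possible disparate impact" if rate < 0.8 * best else ""
    print(f"{g}: selection rate {rate:.2f}{flag}")
```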
But of course, overall it's not just about changing the technology but about changing society, and then the question is who can do what. So you're right in saying that there are very different notions of fairness, and also very different notions of justice, in our society and not only in technology.
If we think about what people consider just when it comes to distributing goods: should everybody have the same? Should those who are worst off right now get more? These are very different conceptions that we have. And I think the first and foremost point would be that when we're dealing with fairness in algorithmic systems, we have to make explicit something that we very often only discuss implicitly.
And I think it is important to discuss what an appropriate notion of fairness in a given context is, and also to make it open for debate to a certain degree. So the higher the impact of the technology, the more participation is needed, I think, to settle the question of what fairness in this context may mean.
What does it mean to have a fair system that is selecting applicants for interviews? That may differ between different people, but nonetheless, in the end a decision has to be made for a given system. So I think recognizing the differences is the first step, and then discursively trying to agree upon this is the next one.
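To make the point about competing fairness notions concrete, here is a small sketch comparing two common criteria on the same hypothetical interview-selection outcomes; the groups, labels, and numbers are invented for illustration.

```python
# Sketch: two common fairness notions evaluated on the same hypothetical
# interview-selection outcomes can point in different directions.
# "Demographic parity" compares overall selection rates; "equal opportunity"
# compares selection rates among qualified applicants only.

# (group, qualified, selected) -- illustrative values only
outcomes = [
    ("group A", 1, 1), ("group A", 1, 1), ("group A", 0, 0), ("group A", 0, 0),
    ("group B", 1, 1), ("group B", 1, 1), ("group B", 1, 0), ("group B", 1, 0),
]

def selection_rate(rows):
    return sum(sel for _, _, sel in rows) / len(rows)

for g in ("group A", "group B"):
    rows = [r for r in outcomes if r[0] == g]
    qualified = [r for r in rows if r[1] == 1]
    print(f"{g}: overall selection rate {selection_rate(rows):.2f}, "
          f"selection rate among qualified {selection_rate(qualified):.2f}")

# Here both groups have the same overall selection rate (demographic parity
# holds), yet qualified applicants in group B are selected only half as often
# (equal opportunity is violated). Which notion you adopt changes the verdict,
# which is why the choice has to be made explicit and debated for the context.
```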
So currently the same decisions are being made, and they are just as contested, but somebody simply makes them under the hood, so to speak. So I think the first step must be to lay open that these decisions are being made, and then to decide what is most appropriate in a given context.
On the one hand, you have people developing and designing technologies, very often making decisions with high implications, who may actually not be the people who should be deciding how this is implemented. And then I think the people who should always be involved are those most severely affected.
Again, as a rule of thumb, if you're developing a system, think about who is usually most heavily discriminated against in a given context, and think about how to involve those people in the development and design process. So think about who the direct and indirect stakeholders are, as they are very often called: who is affected by your tools,
not only those who directly interact with the tool, but also those who bear the implications of the decision-making process. So I think involving stakeholder groups, NGOs, etc. is vital, especially for high-impact technologies.
The more different perspectives you have, the more likely it is that people will come up with problems and identify certain issues. Because many of these problems of bias and discrimination are not intended by the people developing a system; they are a side product, either of the data or sometimes of decisions that people haven't thought through properly.
So I think the more diverse viewpoints you have, the better it is in general. I don't think this solves everything, because very often, as I said, the bias creeps in through the data, and that is not solved by a more diverse team, but potentially such a team has a better view of what to check for.
So I think it's important to work on this, but it won't solve all issues, I would say. I think in particular with regard to bias, discrimination, or all types of relational harm, that is, harm not only to individuals but also to groups of people,
we're not really doing enough, to a certain degree. For instance, data protection regulation focuses very much on individualized harms, but less so on all these issues of discrimination. And I think this is not captured as well as it should be, because the more we rely on these systems while they have systematic biases, the more problems we will encounter with systematic discrimination.
And there is, of course, anti-discrimination law. The question is to what extent it also captures AI-based systems, and how easy it is to prove that you were discriminated against by a system. And on the other hand, we have AI regulation, such as the AI Act, that very often focuses only
on high-risk technologies, and not all systems that may discriminate against people fall under the high-risk category. So I think there is room for improvement. I think one of the solutions should lie in having obligatory standards for at least laying open what you did in order to mitigate bias. So if companies were forced to say which measures were taken in order
to at least check for discrimination and bias, then people can ask and challenge whether this is sufficient. So especially for high-risk, or in a broader notion, high-impact technologies, it should be essential that people disclose what they did in order to prevent bias.