Data and discrimination: Representing marginalised communities in data
Formal Metadata
Number of Parts: 85
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/38131 (DOI)
Transcript: English(auto-generated)
00:13
So, our speaker for today will be Sarah Raman, and she will be talking to us about communities,
00:21
marginalized communities, and their representation in data. More specifically, she will be telling us in which cases it is preferable for marginalized communities to be reflected and represented in institutional data, and in which cases this could be harmful, so that privacy and anonymity are preferred.
00:44
So please, let us give Sarah a warm round of applause. Thank you all for coming on this lovely warm day.
01:03
So I'm going to be talking about the representation of marginalized communities in data. This is based on some work that I've been doing with The Engine Room on the theme of responsible data, and some work with Tactical Tech on data and discrimination. So to start that work, I started thinking: what is discrimination?
01:23
It's when groups of people are treated in an unfair or biased way based on some arbitrary category, like their skin color or their gender or any number of other things. So how can data facilitate or mitigate discrimination in all its different forms?
01:42
So I started looking at the people who are living on the margins of society. So that's what I mean by marginalized communities, people who don't often get their needs seen to, people who are excluded from what's considered to be regular society. And I started by doing this by looking at the anti-discrimination movement and what
02:03
they're pushing for with regards to data and marginalized communities. So one thing that seems to be a theme among many anti-discrimination movements is the right to be counted. So the right for these communities to be reflected in data sets. So why count?
02:21
So I've got a couple of examples of, yeah, marginalized communities or communities that should be or need to be reflected in data for their needs to be met. So this is a screenshot from UNICEF's annual report about the state of the world's children. And in it, they really push for better data collection around children with disabilities
02:43
living in developing countries. Because as they say, without this data, it's impossible to know how these children are being treated, whether they get access to the services they need, whether there are any particularly harmful policies or any particularly beneficial policies that really help them get what they need.
03:01
And they found, for example, that children with disabilities are much more likely to drop out of school a lot earlier, and the data can help understand why this happens and try and mitigate this in the future. There is a problem, though, with comparability of data with regard to different abilities and disabilities.
03:21
Because the environment that people live in has a huge effect on the effect that their disabilities have on their lives. For example, someone with limited sight who has access to a guide dog has much more independence and can lead a much more fulfilling life than someone who lives on their own and doesn't have access to a guide dog and
03:43
doesn't have anything that they need to live their life. So this means that even if you do collect data across different countries, it's very difficult to compare quality of life between countries without lots of caveats. But it still might be interesting to see different patterns.
04:04
This is an example that you might have come across. There was an article about it on the front page of Hacker News a couple of weeks ago. Until just over two weeks ago, this messy thing was the border between India and Bangladesh. So 51,000 people live in these bits marked in red, known as enclaves, which are
04:23
basically portions of a state that are entirely surrounded by the territory of another state. This was also home to the world's only third-order enclave. So this was a piece of India within Bangladesh, within
04:41
India, within Bangladesh, which is ridiculous. The 51,000 people living within all of those, I think, 162 different enclaves didn't have access to any basic human rights. They weren't represented in any administrative data sets. They didn't have citizenship.
05:01
It meant that if they wanted to go to a market outside their enclave, they could have lots of problems. They had no access to water, no electricity. They couldn't travel, all sorts of things. And reportedly, more than 75 percent of the people living in Bangladeshi enclaves had spent time in prison for invalid travel.
05:22
Because in theory, someone who lived in an enclave who wanted to travel outside the enclave would need a visa to enter the foreign country. But to get that visa, they would need to travel to a major city in their country, which was impossible for them to do without first going through the foreign country.
05:41
So this is an example of a population who really, really, really needed to be represented in some kind of data set and have their needs seen to. And actually, just on the 31st of July, India and Bangladesh exchanged various parcels of land and hopefully cleaned up the border a little bit, which was very momentous for these people.
06:04
And in preparation for that, one of the first things that was done was a field survey: 75 teams, each with one Indian official and one Bangladeshi official, went around and collected data on the enclave residents in the first two weeks of July.
06:25
And each of the enclave residents was allowed to choose citizenship of either nation. So by the time the official swap took place, they'd all decided upon what citizenship they could claim. And this went into effect on the 1st of August. Now, the problem isn't quite solved as simply as that,
06:43
because obviously, if, for example, families had chosen different citizenships, then they'll now get split up. But it is a first step towards getting these people access to the basic rights that they need, like schools, hospitals, electricity, water, that kind of thing.
07:02
This is another example of really missing data that was very, very crucial. This is a campaign or a project from The Guardian called The Counted. So in the US, there's no comprehensive record of people killed by law enforcement. It's up to the states whether they collect this or not.
07:20
And this lack of really basic data has been very glaring during the Black Lives Matter movement, for example. So at the moment, the FBI run a voluntary program, where law enforcement agencies have the choice whether or not to submit their annual count of what they call justifiable homicides to a centralized database.
07:45
So to counter this, The Guardian, together with its readers, are basically collecting the most comprehensive data set of people killed by law enforcement agencies in the US. So they're asking people to submit reports when someone is killed, or if they know of someone who was killed in the past,
08:02
and then they verify it, and together they're building this data set. So this is a really crucial piece of data that's been missing until now, of people who've been killed, and there's been no way to put all this data together, and thus no way of knowing if there are any patterns, like whether the people killed happen to be of a certain demographic, for example.
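The talk doesn't go into how The Counted verifies submissions, but the crowdsource-then-verify workflow just described can be sketched roughly like this; the threshold, field names, and example records here are illustrative assumptions, not The Guardian's actual system:

```python
from collections import Counter

# Hypothetical sketch of a crowdsource-then-verify workflow:
# reader-submitted reports only enter the public dataset once
# enough independent submissions describe the same incident.

MIN_INDEPENDENT_REPORTS = 2  # invented threshold, for illustration

def verify_and_aggregate(reports):
    """Group raw submissions by (name, date) and keep incidents that at
    least MIN_INDEPENDENT_REPORTS submitters independently reported."""
    counts = Counter((r["name"], r["date"]) for r in reports)
    return [
        {"name": name, "date": date, "submissions": n}
        for (name, date), n in counts.items()
        if n >= MIN_INDEPENDENT_REPORTS
    ]

submissions = [
    {"name": "John Doe", "date": "2015-06-01"},
    {"name": "John Doe", "date": "2015-06-01"},  # independent confirmation
    {"name": "Jane Roe", "date": "2015-06-03"},  # unconfirmed, held back
]
verified = verify_and_aggregate(submissions)
# 'verified' now contains only the confirmed incident
```

The real project adds journalistic verification on top of this kind of aggregation; the sketch only shows the basic idea that single, unconfirmed reports are held back from the published data set.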
08:23
This is another nice one that is also quite new, called Gender Balance. So another missing data set is that there's no data set on the gender balance of different parliaments across the world. So you can't answer, for example, which country has the highest proportion of women in parliament, or do women vote differently on different issues,
08:43
and when did women come into power in different countries, and did this make a difference in the way that the country is run? So this is a tool called Gender Balance, created by developers at mySociety, and it's aiming to collect a database of the gender balance of every parliament in the entire world,
09:01
and it's doing this via a kind of Tinder-like game: you sign up, say what language and what country you're most familiar with, and then you swipe right or left depending on whether a politician is a man or a woman. They're crowdsourcing this data, verifying it, and then they'll build up this data set and release it as open data. So basically, the reasons for counting,
09:21
for making sure that communities are represented, is so that you can notice different patterns. You can see if there are needs or gaps in the services being offered, so that there's a more accurate distribution of public funds. In lots of countries, where money gets given depends very much on population. And of course, in our field,
09:41
and in the stuff I've been working on, it strengthens advocacy hugely to be able to point to concrete numbers and say this many people can't get access to a certain basic human right. But then there's the other side of it: why not count? This is an example from World War II. It was quite recently revealed, after decades of denial,
10:03
that during World War II, the US Census Bureau provided the Secret Service with data from the 1940 census, so that they could identify people of Japanese ancestry, in other words, to assist in the roundup of Japanese Americans for imprisonment in internment camps in California
10:20
and six other states during the Second World War. So this enabled them to collect and to round up and to imprison these people very, very quickly and efficiently, so to speak. And researchers who studied the way that this happened, the relationship between the Census Bureau and the Secret Service,
10:40
say that the speed with which the data was released, so it was released just seven days after it was requested, leads them to think that that wasn't the first time that that has happened, but it is the first time that it has been publicly admitted. Another example from Bangladesh. So the Rohingya people are one of the world's most persecuted minorities
11:01
in Myanmar or Burma. They're stateless. They live in Myanmar, where they're regarded as kind of illegal Bengalis, but in Bangladesh, they're not regarded as Bangladeshi, really. So most of the 1.3 million people have no citizenship,
11:20
and they also lack very basic human rights. They live in camps, in all sorts of really terrible situations. So recently, a couple of weeks ago, the Bangladeshi government announced that they'd be holding a census of the hundreds of thousands of undocumented Rohingya in Bangladesh,
11:41
who entered Bangladesh seeking refuge from persecution in Myanmar. But the Rohingya people within Bangladesh are also really, really unpopular, and lots of groups watching over the human rights of the Rohingya are very skeptical as to how this data will be used.
12:01
So some officials have said that the people are really reluctant to give their data for fear that they'll be deported, that this will be used by the Bangladeshi government to say, we can't take any more people, we need to send you back. And then they'll go back to Myanmar, and they'll face the same thing. Another thing that is happening with the census:
12:23
when they're collecting the data, people have often been given the choice of identifying themselves either as Bengali or not at all. So they're completely trying to erase the Rohingya identity in the census data. I mean, they say that the term implies a lot more than just the ethnicity,
12:40
that it implies a claim to land, and that's their reasoning for erasing this data, but for the people involved, it's a very important part of their identity, obviously. This actually isn't the first time that the Rohingya have faced discrimination within data sets. So from the other side, in Myanmar, last year, they conducted their first census in decades, and they included questions on ethnicity in the census,
13:03
and some human rights groups said that this was overly sensitive at that time, and they shouldn't be including those questions, but they went ahead anyway. And it turns out that there were a few census workers during the census who would go to households, ask them their ethnicity, and if they said Rohingya, they would turn around and go away.
13:22
So this is completely erasing the population of Rohingya from this data set. So clearly, the issue of race and ethnicity can be very problematic when it comes to marginalized communities, but it's not just ethnicity data that can cause problems. So this is an example from, I think, 2012,
13:41
where in the Netherlands, there was a proposal to make registration of sex workers mandatory, and lots of people within the sex worker community campaigned against registering, because they found that would be really stigmatizing for them, and it would violate their right to privacy. There's also very little evidence that points that,
14:04
so yeah, there's very little evidence that shows that registration actually helps fight human trafficking, which is the reason that it was supposedly being done in the first place. Yeah, so there's many other examples of why you really shouldn't collect sensitive data, but the kind of middle ground that I seem to have come across
14:22
between the anti-discrimination movement and the privacy movement, in a way, is a census that's done in a sensitive way, but this also, as we've mentioned, can be a very political process. So this is a map I found showing countries where the ethnicity or race of people was counted or enumerated
14:44
in at least one census since 1991, and it's quite interesting to see the countries that really don't collect that data. For example, in France, it's forbidden by law to collect any statistics based on racial or ethnic origin,
15:03
but in contrast, there are other things that happen in France that kind of go against this. Until recently, it was very usual to put your photo on your CV, so even though you didn't have to state your race or ethnicity, it could be seen, and there's lots of research that was done
15:20
that proved that there were many discriminatory hiring practices based on that. And also, social scientists have devised ways to get around this prohibition: now, for example, they just look at people's names to guess their ethnicity. So it's not that the data isn't being collected; it's more that it's being collected in an ad hoc and inaccurate way.
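As a deliberately naive sketch of what such name-based inference looks like, and why it stays inaccurate, here is a toy lookup-table classifier; the table, the labels, and the names are all invented for illustration and don't reflect any real research tool:

```python
# A naive sketch of name-based origin inference: a lookup table of
# surnames. Real research uses much larger reference lists, but the
# approach stays inherently error-prone in the same way.

SURNAME_TABLE = {
    "nguyen": "Vietnamese",
    "kowalski": "Polish",
    "dupont": "French",
}

def guess_origin(full_name):
    """Guess an origin from the last word of a name, or 'unknown'."""
    surname = full_name.split()[-1].lower()
    return SURNAME_TABLE.get(surname, "unknown")

guess_origin("Marie Dupont")  # "French" -- plausible, but unverified
guess_origin("Kim Nguyen")    # "Vietnamese" -- could easily be wrong
guess_origin("Alex Smith")    # "unknown" -- no entry in the table
```

Anything not in the table falls through to "unknown", and even a match is only a guess about the name, not the person, which is exactly the ad hoc inaccuracy described above.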
15:44
Obviously, Germany knows the horrors that can come from collecting ethnicity data better than many countries. The Nazis used census data to help track Jews and other minorities, and then, in East Germany, the Stasi secret police maintained really comprehensive files on citizens.
16:04
And interestingly, in the 1980s, attempts at introducing a census in West Germany (I'm sure many of you know this, but for those who don't) sparked massive protests, and people refused to answer the census, which led to a boycott. The Constitutional Court stopped the census in 1983
16:23
and then required a revision of the process, so a more sensitive data collection process. This is where the term informational self-determination was first used, which is kind of similar to the right to privacy, but it gives the individual the right to determine, in principle,
16:43
the disclosure and use of their own data. The only limitation to this is an overriding case of public interest. So carrying out a census, as we've established, is hugely, hugely sensitive, but it is kind of crucial when you're trying to give people the things that they need.
17:03
So who do we trust to carry out the census? In the UK, unfortunately, it's Lockheed Martin, America's largest arms manufacturer, who carries out the census, and the same is true in the US and in Canada.
17:21
It's an arms manufacturer who carries out this really, really basic role. They make Trident nuclear missiles, cluster bombs, fighter jets; they're really into aerospace and defense, and 80 percent of their work comes from the US Defense Department. This has been going on for a while,
17:41
and when it happened, there was kind of an outcry, justifiably, but actually an unsurprisingly small one. In the UK, which is where I come from and what I've been looking at in my research, the government released a privacy impact assessment where they said
18:00
that all the data is owned by the Office for National Statistics in the UK, but Lockheed Martin were the ones who designed the system and who implemented it. And they said it's a criminal offense to release any of the census data, but if we look to history, we see some examples of how that hasn't always stood up.
18:25
So Lockheed Martin have carried out the census in the US in 2000 and 2010, in the UK, and in Canada as well. Concerned citizens in the UK, for example, submitted a number of freedom of information requests
18:42
through the freedom of information portal WhatDoTheyKnow. And that's been really interesting to read through, but there's not actually that much more information given, and it's quite hard to follow. So, thinking about choice: what can citizens do if they don't want to be giving their data to this company
19:02
whose morals they might not agree with? I mean, I'm obviously in no position to make any judgment on what's actually done with the data, but just thinking about what role this plays. There were some really interesting responses from anti-war campaigners in the UK, because not filling in the census
19:22
meant that you faced a fine of a thousand pounds. So these campaigners didn't want to give money to the UK government. The Green Party also decided not to support a boycott of the census, because the results of the census are used to allocate public funds.
19:40
So if you don't submit your data, you might not get your needs met or get represented in a way that means something to you. So they didn't recommend boycotting either. What they planned instead was making the census process as expensive as possible
20:01
for Lockheed Martin. Lockheed Martin received 150 million pounds for the census, and there was also an option to fill your census data in online. So these anti-war campaigners said: don't fill it in online, because that saves Lockheed Martin a lot of money. Make a few mistakes here and there,
20:21
you know, in your telephone number, in your address, things that don't really matter. Yeah, maybe like push it right to the deadline, send it by post. And they came up with this list of things that you could do to try and make the census as expensive as possible on an individual level, which I thought was really clever.
20:40
But the fact remains that you essentially had no choice of what to do. You had to fill in the census, you had to give your data to this company. And this is kind of a trend that we're seeing, in the UK at least, of increasing numbers of public-private partnerships, where the government partners with a private corporation to carry out really core government functions,
21:01
and this is what I've been looking at with my work at the moment. So just another really quick example is G4S. So they're, I think, the world's largest, or one of the world's largest security companies, making no judgment on what they do with this data. But this is just a selection, not even all of the activities that G4S carries out in the UK.
21:22
And if we look at the people who are affected by most of these things, they're people on the edges of society. They're asylum seekers, they're former offenders, they're survivors of domestic violence. They're children: G4S run eight children's homes in the UK. G4S also do a number of other things,
21:42
like run prisons in the West Bank and all sorts of horrible, horrible things. But the point here is that they're doing this as subcontractors of the UK government. So if you come into contact with any of these things,
22:01
you likely don't realize that you're not dealing with the government. It looks just like the government, but it's not. And there's no opt-out function. So they're providing really core functions, children's homes, CCTV at train stations, for example. So, you know, what does this mean for us?
22:20
This could have been a really big slide. So at least in the UK, we seem to be giving up really core functions that should be done by a trusted state entity with accountability, and we're giving up that accountability to these companies. So as a citizen, if one of these companies does something that I don't agree with, it's very kind of murky and difficult to know
22:41
where the lines of accountability lie. Who can I go to to say, hey, they treated me in a bad way, or I don't agree with what they're doing? And there's not a complete lack of transparency, but a very strategic way that they seem to be doing transparency. For example, if you know where to look in all this bureaucracy on the government websites,
23:02
you can find out that it's companies doing all this, but it takes a long time. And I don't think many people, when they see someone carrying out a government function, think to ask, hey, are you from an arms company or a security company? At least I hope not, because that would be terrible. So some of the work I've been doing
23:23
with Tactical Tech has been looking at the questions we should be asking when we come across relationships like the one we've just described. Like how can we reclaim that data and those functions? How can we gain more accountability? How do we kind of try and stop this trend or at least try and hold them accountable for their actions?
23:43
And there aren't any answers yet, I'm sorry. So in conclusion, there are three basic things that I've realized are emblematic of, or very important in, all of the relationships I just described: transparency, choice, and trust.
24:00
So, transparency about who is collecting the data and who actually has access to it. Really: if a company designs a system to collect data, and then the government says that company actually has no access to it, do they really have none? Who carried out that privacy impact assessment? Do we trust what they're saying or not?
24:22
Who owns that data? Do you have access to be able to say, actually, I don't want that data to be represented here? And then, as I mentioned, accessible levels of transparency: not just saying it's on a government website somewhere, but making it really, really visible who is doing what
24:43
so that people can have the choice. Is there an option to not be in the data? This speaks to the German idea of informational self-determination. Is there an actual option to opt out, or are you then facing more exclusion or a fine?
25:02
Is there any pressure put on people to opt in, and is it being applied by people who hold lots of power? And then the final one, trust. This depends a lot on where you live: whether it's a democracy where there's some kind of social contract and you expect to have that relationship
25:21
with your government. Sadly, in many of the world's countries, you don't have that trust, and that's assumed. But I'd hope that, at least in Germany and the UK, we try to establish, or at least still claim, that level of trust to an extent. Do we believe the answers that they're giving?
25:41
So even if they're being transparent, are they telling the truth? Can we believe them? Will this change in the future in any way? Like if you give your data to one entity and they say, we're gonna do this, and you trust them and you believe them, is there any chance that that could change in the future and then that same entity might still have the data?
26:01
Yeah, I think that's just about it from me. Thank you.