Metadata Investigation
Formal metadata
Title |
Subtitle |
Series title |
Part | 47
Number of parts | 188
Author |
License | CC Attribution - ShareAlike 3.0 Germany: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers | 10.5446/20717 (DOI)
Publisher |
Year of publication |
Language |
Content metadata
Subject area |
Genre |
Abstract |
Transcript: English (automatically generated)
00:23
We are a really small, strange, somewhat unofficial lab, a group of people working together. We have one cyber forensics expert, some people who are into tech research, legal researchers, and me, with a somewhat strange background in media theory and arts.
00:45
What we like to do is investigate different kinds of invisible things. We started with really simple investigations, like what our networks look like and what our surroundings look like.
01:00
We were making data visualizations of different kinds of network topologies. Then we were mapping where all of those websites are located and what traces we leave to others when we visit them.
01:20
So we were able to trace, for example, when we type www.facebook.com, where those packets travel from and to, and to detect different kinds of internet harbors around. And then we said, okay, let's try to do a slightly deeper investigation.
01:44
So we were exploring, inside the websites, which third parties are embedded in them and where all of that data is going. We came up with this kind of visualization, so we basically got some results
02:05
and we saw that Google is collecting all of this data, Facebook, and so on. But this is mostly what all of you know already, no? Then we were mostly interested in analyzing individual companies
02:20
because we said, okay, this surveillance economy is so huge that we really want to understand how it functions. So this is, for example, Google, and how it extracts information about our visits from different websites. And we did the same with Facebook.
02:44
And then we said, okay, of course, this is just a small segment of this whole surveillance economy. The really, really scary thing is probably happening on mobile phones, so we mapped all the permissions in this kind of panoptical structure.
03:00
It's a visualization of all the permissions that we give to different mobile applications. So, for example, this is Facebook, and they are collecting different kinds of information: photos, media files, and so on. If you go deeper, you will understand that they are able to access
03:27
almost anything that exists on your mobile phone. And then we investigated further, in Serbia: okay, but how are different government agencies collecting information about retained data?
03:47
So we were mapping all of those structures based on some 2,000 pages of documentation that we were able to gather in different ways. So, step by step, we were understanding how deep and how scary this surveillance economy is,
04:10
but in those prior investigations we were only able to understand how they are collecting our data. We were never able to understand what they are actually doing with this data.
04:23
And then there was one really nice moment. In July 2015, there was a really big leak of emails from a company called Hacking Team. Basically some 400 gigabytes of data, their internal emails,
04:45
got out thanks to some people and was then published on WikiLeaks. And on the other side we had another leak, coming of course from Snowden. So we had some kind of picture of, for example,
05:02
what kind of methodology the NSA uses in its analysis of metadata. So we said, okay, on one side we have a methodology, and on the other side we finally have our own big data. So let's cross those things together. What we did was try to do the same thing that the NSA is doing to us on a daily basis,
05:26
but to do it to someone else, namely Hacking Team. Who is Hacking Team, if you don't know? It is one of the biggest companies in the world in cyber weapon manufacturing and distribution.
05:43
They sell different kinds of tools for government agents, and probably others, to get into your email accounts, mobile phones, Facebook chats, or whatever you can think of. So we said, okay, let's investigate them.
06:03
Every email has a header and content. And because we wanted to use just metadata analysis, we said, okay, just concentrate on the header. We don't care what is in the content. And our database was really simple.
06:22
Subject, date sent, from, to, and IP address of the sender. What we did first was some social network analysis. By doing this, you get a structure like this. You can investigate the big dots: a big dot means that this person is sending a lot of emails. Then their external contacts, and so on.
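Editorial sketch: the header-only database described above could be built roughly like this in Python with the standard `email` package. The function name `header_record` and the dictionary keys are assumptions made for illustration, not the lab's actual schema. The sender IP is not resolved here; the raw `Received` lines are only kept for later processing.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def header_record(raw_message: str) -> dict:
    """Reduce one raw email to header metadata only; the body is ignored."""
    msg = message_from_string(raw_message)
    senders = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    date_header = msg.get("Date")
    return {
        "subject": msg.get("Subject", ""),
        # Parsed into a datetime so it can be grouped by hour or day later.
        "date": parsedate_to_datetime(date_header) if date_header else None,
        "from": senders[0][1].lower() if senders else None,
        "to": [addr.lower() for _, addr in recipients if addr],
        # Raw relay lines; they may contain the sender's IP address.
        "received": msg.get_all("Received", []),
    }
```

Nothing from the message body enters the record, which matches the "we don't care what is in the content" approach of the talk.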
06:46
But this looks a bit messy. So we started doing different things to play with the data. We filtered for only the people who sent more than 100 emails.
07:00
That basically gives us the structure of the organization, because those dots are different people inside Hacking Team. Then we played with the same data set in another way. On this side you have people from Hacking Team, and on the other side as well.
07:20
The darker dots mean that those two people are communicating a lot with each other. From here we were able to understand some patterns: who is the main guy, who is communicating with whom, how often, and so on. And then if you transpose the same picture into another graph,
07:42
you basically get the organizational structure of Hacking Team. For us, that was fun. You see that the main guy, David Vincenzetti, is some kind of control freak. He is communicating with everyone, sending a lot of emails.
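Editorial sketch: the "more than 100 emails" filter and the person-to-person intensity (the darker dots) can be approximated by counting messages per unordered pair. The record layout (`"from"` and `"to"` keys) and the function name `pair_weights` are assumptions for illustration. Filtering on the sender only is one possible reading of the talk.

```python
from collections import Counter


def pair_weights(records, min_sent=100):
    """Count emails per unordered pair, keeping only active senders."""
    sent = Counter(r["from"] for r in records)
    # Keep only people who sent more than min_sent emails.
    active = {person for person, n in sent.items() if n > min_sent}
    weights = Counter()
    for r in records:
        if r["from"] not in active:
            continue
        for recipient in r["to"]:
            if recipient != r["from"]:
                # frozenset makes the pair undirected: (a, b) equals (b, a).
                weights[frozenset((r["from"], recipient))] += 1
    return weights
```

Calling `pair_weights(records).most_common(10)` would list the ten most intense relationships. A person who appears in many of the heaviest pairs is the kind of hub the talk calls the "main guy".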
08:02
Then you see the other people and who is communicating with whom. So we are on a good path. What we can do next is add a time component and see, for each of them, when he is active and how often he is sending emails.
08:23
Then of course, because we were really curious, we were interested in external contacts, and with just a few clicks we were able to visualize the most frequent external contacts of the organization, which led us to investigate them a bit more, and so on.
08:44
Then we were able to cross people from Hacking Team with external contacts and understand the patterns of communication between them. Then we said, okay, let's now analyze just the domain names.
09:01
So this is, for example, all external contacts, grouped into different categories. Those are other cyber weapon manufacturers or different kinds of hacking organizations. Those are state security organizations.
09:22
Those are investment groups, lawyers. So we get an overview of what this industry looks like and who the main players are. Then we were able to follow different companies. For example, this is nice.com, and every color represents another person.
09:44
We can see here that 10 people from this organization are communicating with 10 people from Hacking Team. So they are really tied together. And then we said, okay, we never heard of this company, nice.com. Basically this company is doing the same thing, selling almost the same product,
10:05
but not to governments: to different companies around the world, which then surveil their workers, for example. And if we look more closely at these kinds of graphs, we can see, for example, how this guy, probably the CEO of this company, is first doing some deals
10:24
and then he drops out of the communication and another person enters. So we were able to understand the dynamics of how they communicate with different companies and what is going on. Then we said, okay, now we go a bit deeper; okay, let's follow the person.
10:45
Okay, let's follow the guy. What we have here is something called pattern of life analysis. It's military vocabulary, but that is basically what they are doing. So if you analyze just sent emails, what are sent emails?
11:05
Sent emails are basically really personal information, because they show how you are reacting to things. We can see that this guy David wakes up really early almost every day, around 4 o'clock,
11:21
then has a pause around 6, probably going jogging, then goes to his job, works a lot around 11, has a pause for lunch around 13, and mostly does no work after around 20. On the other side, if we analyze the emails that he receives, that is a pattern of his surroundings.
11:46
That is a pattern of how his peers function. We can see that they are a bit different. They just go to work and have a peak around 10, then lunch, and then slowly work less after that.
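Editorial sketch: the pattern-of-life view described above is essentially a 24-bin histogram of message hours, computed once for sent mail and once for received mail. The record keys (`"from"`, `"to"`, `"date"`) and function names are assumptions for illustration, and the sketch assumes all timestamps have already been converted to one common time zone.

```python
from collections import Counter


def hourly_profile(timestamps):
    """Return a list of 24 counts, one per hour of the day."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def life_patterns(records, person):
    """Hourly profiles of what a person sends and what they receive."""
    sent = [r["date"] for r in records
            if r["from"] == person and r["date"] is not None]
    received = [r["date"] for r in records
                if person in r["to"] and r["date"] is not None]
    # Sent mail reflects the person's own rhythm; received mail reflects peers.
    return hourly_profile(sent), hourly_profile(received)
```

The first profile shows when the person is awake and reacting; the second shows the rhythm of the people around them.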
12:04
It's what people do, no? Then we were able to follow David and see what his behavior is during the week or during the year. Every change in that pattern can lead to something, because the number of emails that you send
12:26
can tell whether you are depressed, or sick, or fell in love, or something else. So those changes mean a lot. When you see a drop, it means that something is happening to David.
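Editorial sketch: one simple way to surface such changes is to count sent emails per day and flag days that deviate strongly from the average. The z-score rule and the threshold of 2.0 are assumptions chosen for illustration; the talk does not say which method was used.

```python
from collections import Counter
from datetime import timedelta
from statistics import mean, pstdev


def unusual_days(timestamps, threshold=2.0):
    """Return days whose email count is far from the daily average."""
    per_day = Counter(ts.date() for ts in timestamps)
    if not per_day:
        return []
    first, last = min(per_day), max(per_day)
    # Include silent days as zeros so that drops are visible, not only peaks.
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    values = [per_day.get(day, 0) for day in days]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [day for day, n in zip(days, values)
            if abs(n - mu) / sigma > threshold]
```

A flagged day only says that something changed; finding out what happened still needs other context.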
12:43
And then we played; this is going into some kind of data masturbation. We tried to understand the patterns, and if you visualize them like this, you can see some peaks. Those are the dates when something happened to David, you know,
13:03
when something is strange because he is sending, I don't know, 20 emails per hour or something. And this is my favorite one. So, okay, the darker the square, the more emails; those are months, days.
13:24
So here you really have some kind of visual background. You can see different patterns, no? For example, this one. You have emptiness, then you have something like a switch to one side, and then you have emptiness again.
13:42
What's that? We investigated this strange pattern, and we learned that this is the moment when he went to Singapore. So that means traveling time, no connection, then a change of time zone, and then traveling back.
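Editorial sketch: the squares described above can be modeled as a day-by-hour grid of email counts, and a time zone change shows up as a jump in the typical sending hour from one day to the next. The function names are assumptions for illustration, and the timestamps are assumed to be normalized to a single reference time zone such as UTC.

```python
from collections import defaultdict
from statistics import median


def day_hour_grid(timestamps):
    """Map each day to a list of 24 hourly email counts."""
    grid = defaultdict(lambda: [0] * 24)
    for ts in timestamps:
        grid[ts.date()][ts.hour] += 1
    return dict(grid)


def median_active_hour(grid):
    """Median sending hour per day; empty days never appear in the grid."""
    result = {}
    for day, row in grid.items():
        # Expand counts back into one hour value per email.
        hours = [hour for hour, n in enumerate(row) for _ in range(n)]
        result[day] = median(hours)
    return result
```

A gap of missing days followed by a median that has moved by several hours matches the "emptiness, switch, emptiness" shape: flight, new time zone, flight back.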
14:00
Then this pattern is similar, but it goes in the other direction. This is when he went to New York: different time zone, shorter flight. And this is another favorite one. You see some mess here. This is the moment when Citizen Lab published its research about Hacking Team.
14:25
So you can see drama in those pixels. Then in the subject line you can find a lot of interesting things. Every time you order something from Amazon, the name of the product is written in the subject line.
14:47
So we were able to know all the equipment that they bought through Amazon, and who bought which item. You have different movies and different kinds of equipment, mostly mobile phones and things like that.
15:06
But then another really interesting thing: they have a company that provides them with flight tickets. And the subject also contains the name of the airport and the name of the person.
15:23
From that we were able to extract, for each of them, how many times they traveled and where. So we got a first map of their movement. Then we were able to separate this by person, so we understood which member is covering which part of the world.
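Editorial sketch: pulling purchases and trips out of subject lines comes down to pattern matching. The two subject formats below are hypothetical stand-ins, since the talk does not show the real wording used by the shop or the ticket agency. The passenger name in the usage example is fictitious.

```python
import re
from collections import Counter

# Hypothetical format: ... order of "Product name"
ORDER_RE = re.compile(r'order of "(?P<product>[^"]+)"')
# Hypothetical format: SURNAME/FIRSTNAME followed by ORIGIN-DESTINATION codes.
TICKET_RE = re.compile(
    r"(?P<name>[A-Z]+/[A-Z]+)\s+(?P<origin>[A-Z]{3})-(?P<dest>[A-Z]{3})"
)


def extract_purchases(subjects):
    """Return product names found in order-confirmation subjects."""
    return [m.group("product")
            for s in subjects if (m := ORDER_RE.search(s))]


def trips_by_person(subjects):
    """Count (passenger, destination airport) pairs in ticket subjects."""
    trips = Counter()
    for s in subjects:
        m = TICKET_RE.search(s)
        if m:
            trips[(m.group("name"), m.group("dest"))] += 1
    return trips
```

For example, `trips_by_person(["E-ticket ROSSI/MARIO MXP-SIN"])` counts one trip to `SIN` for that passenger. Grouping the result by destination and date would give the kind of movement map described in the talk.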
15:51
And then, if we compare all of this together, we are able to understand where they are meeting each other. For example, meetings in Mexico, Morocco, different countries.
16:03
And who is meeting whom. And you understand that if four or five of them are meeting in one place, it means that they are doing something there, most of the time. Or they are having fun. An even more aggressive thing is hidden in the IP addresses of the senders.
16:24
Because every time someone sends an email to David, he reveals his IP address. Most of the time, not every time. So for each person who sent an email to David, we were able to track the exact location.
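Editorial sketch: a sender's IP address often appears in the `Received` headers that mail servers prepend to a message. The heuristic below is an assumption for illustration: take the earliest hop that carries a publicly routable IPv4 address in square brackets. It fails whenever the sending service hides the client address, which fits the "most of the time, not every time" caveat.

```python
import re
from ipaddress import ip_address

# IPv4 literal in square brackets, as commonly written in Received headers.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sender_ip(received_headers):
    """Return the first public IPv4 address, starting from the earliest hop."""
    # Servers prepend their header, so the earliest hop is last in the list.
    for header in reversed(received_headers):
        for candidate in IPV4_IN_BRACKETS.findall(header):
            try:
                ip = ip_address(candidate)
            except ValueError:
                continue  # Not a valid address, e.g. an octet above 255.
            if ip.is_global:
                return str(ip)
    return None
```

Turning the address into a city requires a separate IP geolocation database, which is not shown here.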
16:42
And for example, this is just an analysis of their movement. You can see that this guy is in Singapore all the time. The other one is moving like this. So, lots of interesting information. And then when we analyzed external contacts, we were able to map them at the precision of the city.
17:09
All of them, so those are some kind of strange people. But we were able to go even deeper. For example, this is London. We know that a guy from the UK government, with his name and surname, was here at that moment, communicating with Hacking Team.
17:29
So it's a really scary story. And emotionally, for us, it was also, you know, not such a nice feeling to do this.
17:46
Because at one moment we understood that we were going too deep into someone's patterns. But on the other side, this is what is happening to us on a daily basis. In real time. By different actors.
18:00
So who can do this? Google can do this. The NSA can do this. Lots of different actors. Everyone who has access to your emails can do the same thing. And what is really scary is that this is happening in real time. In most cases it is run by algorithms, which are then able to understand what your patterns are.
18:26
In the same way that I understood David's pattern. So they can alert the system if you move somewhere, if you start to behave differently than usual, if you change your pattern of life.
18:41
And this leads us to some kind of prediction of future behavior, when our patterns are analyzed, and so on. That leads us to pre-crime, pre-cog, and a lot of other really, really scary scenarios. So that was our investigation of Hacking Team.
19:02
The next investigation that we are doing is about Facebook. But it's not so visible. We are mapping all the inputs into Facebook, for example when you like, when you share.
19:21
And how those different kinds of information are stored inside the Facebook algorithmic factory. We are doing this by reading some 8,000 different Facebook patents. Not patterns, but patents.
19:43
And trying to understand exactly what they do with all of our data, and how different algorithms transform our behavior into the product, on the right side.
20:01
It's a real pity that you cannot see those really psycho graphs so well. So that's my story about Hacking Team and data visualization. Are there any questions?
20:29
Microphone is coming. One second. I have two questions. First question: were there any moral problems for you in using the real name of this head of Hacking Team?
20:45
And the second question: do you think there is any need for humans to be involved in analyzing this data? Or could this all be done by computers?
21:04
First we were thinking of replacing the names of the people with different animals. That was really funny, like David is a wolf or something like that. We have a legal team in our organization.
21:22
So we had a really long discussion, over a month, about different ethical and legal things. And in the end I said, on one side, all of this data is public.
21:41
On the other side, we didn't specifically expose anything. If someone googles their names, some of those names will appear. They are written in the WikiLeaks database and so on. So in the end I just chose to publish it like this.
22:04
I don't know, in some way I'm really proud. Our investigation of Hacking Team became some kind of case study in this data analytics society. When you google metadata investigation, you get Hacking Team.
22:24
So it's kind of a big thing. Okay, I'm proud of that, on one side. But what was the other question? About humans, and whether it is completely done by machines.
22:44
Okay, in this case it's done by humans. But I believe that in all other cases it's done by machines. And this is why metadata is so important. As Snowden said, in the NSA we don't care so much about content, we care about metadata.
23:02
Because metadata can be analyzed by software. Content can lie in different ways, but metadata is metadata. A lot of people ask me, because it's a kind of investigative journalism, and if you are doing investigative journalism you should have different proofs from different sides.
23:24
And I said, okay, I don't know how to collect proofs, because it's metadata, it cannot lie. You don't need another opinion. So yes, basically 99% of the time this is done by machines.
23:40
And this is why these government agencies like metadata so much. Because it's really easy to process. Okay, we have time for one last question. I was just curious: you did this as a proof of concept, and basically you substituted yourself for the machine in this process, as you just said.
24:05
Is there something that you think you gained, an understanding of what Hacking Team was doing on another level, a level of knowledge about what was happening, by looking at that data for such a long time?
24:22
And was there some sort of level of knowledge that you were able to achieve through that with your own human brain? I think this is a really powerful tool, and in some way really powerful knowledge. Most of the time after events I'm approached by people who ask
24:48
me, like, do you want to target our audience in this way, you know. So I learned a lot by doing that. But did we discover something? I think the biggest discovery is not
25:07
about Hacking Team. The biggest discovery is how intrusive metadata analysis is. And understanding that was the biggest shock that we had, because from just
25:23
four different columns of data we were able to reconstruct a lot of different things. And that's really scary. So thank you.