Coding to fight online abuse
Formal Metadata
Number of Parts | 96
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/51707 (DOI)
NDC Oslo 2016, 50 / 96
Transcript: English (auto-generated)
00:05
Welcome to Coding to Fight Online Abuse. My name is Einar Otto Stangvik. This is, to a large extent, the history of my career and personal life from 2012 up until today.
00:22
To introduce myself briefly, I am closing in on 34 years old. I am, more than anything, a security-minded developer, work-wise. I am also an aspiring strobist, which is a fancy way of saying that I enjoy photography and artificial lighting.
00:40
The picture on the right there is me in the shed outside my in-laws' cabin with two fog machines, I think four flashes and a remote trigger, and obviously welding goggles. I have been employed by VG.no since 2014, and I do a little bit of ops, security, and investigative journalism.
01:03
For those that don't know VG, it is the most visited news portal in Norway and also one of the biggest newspapers. To give you a grasp of the motivation behind the things I'll share with you today, here is my work résumé from the late 90s up until around 2012.
01:23
I was doing Turbo Pascal, C, C++, Linux, and ops. I was a security consultant for a while, then C++ and MFC for Windows UI, PHP, C++ .NET for whatever reason, Flash, some Java, back to C++, SharePoint consulting, C#, WinForms, and yada yada,
01:47
Silverlight, JavaScript, Node.js, Flash, and security consulting once again. It all mostly felt like made-up artificial solutions to not very important problems.
02:01
It felt like I was working to make money for myself and for my bosses, and none of it had any real purpose. By mid-2012, I was burnt out, bored, and generally just depressed with my job. I desperately needed change. I needed to do something that felt more purposeful than what I had done so far.
02:27
Around the same time, I read articles and blog posts about revenge porn and people being outed, shamed, and exposed on various forums.
02:41
There were sites that were dedicated to outing young girls and boys and sharing their explicit content. My impression at the time was that the victims of these abuses weren't getting any help from police, from organizations, from anybody. They were feeling helpless, and the offenders grew ever more ruthless in what they were doing.
03:07
They were going even further than they had before in exposing their victims. They were making these collages of Facebook profiles combined with nude pictures they had found, stolen, or otherwise obtained,
03:24
and publishing them online along with names that were indexed by search engines and so forth. They showed precious little mercy towards their victims. At the same time, the police, it seemed, mostly couldn't be bothered.
03:42
If there is anyone representing law enforcement here today, just grab hold of me later and I'll apologize, but that was my impression. Many of the cases that were reported were dismissed very quickly. This, I felt, led to a legitimization of the offenses.
04:07
The offenders seemed to come to see what they were doing as not really that wrong, not that bad. The general public, too, seemed to get the impression that this was all boys-will-be-boys kind of stuff.
04:26
The offenders were outing, shaming, and basically abusing people, but it wasn't that bad. That whole thing was caused by police and others not really pushing back and making examples of anyone,
04:42
not pressing charges against anyone, not making any proper investigations. The few vigilantes that were working against this problem seemed to mostly resort to DDoS attacks. It seemed to me that the DDoS tactic was especially childish and counterproductive
05:04
because they were mostly harming innocent sites, innocent businesses, and innocent private individuals. They were taking down entire hosting companies and network connections just to knock out, for a short period of time, this one site that might or might not be dedicated to abusive material.
05:25
All at the same time, the same content, the same pictures, the same texts would be available on dozens or hundreds or thousands of other sites. It seemed pointless. That whole approach to this problem from the content-hosting perspective seemed wrong to me.
05:48
Instead, I wanted to know, would it be possible to crack a Norwegian revenge porn case by using code? And specifically, could I, as a burnt-out developer, apply my knowledge to this problem?
06:05
I figured at the time that it couldn't hurt to try, at least. In retrospect, it could have hurt a lot of people in a lot of ways. I could have basically destroyed people's lives pretty severely.
06:20
But that only dawned on me when it was too late to stop. So I went head-on into the case that turned out to be the chase of an iCloud hacker. In late 2012, or January 2013, I built a system that monitored a few specific forums and image hosts online.
06:44
I built this system to regularly download partial files: all newly posted files on these forums, but only the first 5,000 bytes of each. I didn't want to download full, potentially illegal images or other illegal content.
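The talk doesn't show the downloader. A minimal sketch of the partial download, assuming the image hosts honour HTTP `Range` requests; `PREFIX_BYTES`, `build_prefix_request` and `fetch_prefix` are hypothetical names for illustration:

```python
import urllib.request

# Matches the 5,000-byte prefix described in the talk; often enough to reach
# the EXIF (APP1) segment near the start of a JPEG.
PREFIX_BYTES = 5000


def build_prefix_request(url: str, n_bytes: int = PREFIX_BYTES) -> urllib.request.Request:
    """Build a GET request asking the server for only the first n_bytes.

    HTTP byte ranges are inclusive, so "bytes=0-4999" covers exactly 5,000 bytes.
    """
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{n_bytes - 1}"})


def fetch_prefix(url: str, n_bytes: int = PREFIX_BYTES, timeout: float = 10.0) -> bytes:
    """Download at most n_bytes from the start of the file at url.

    A server may ignore the Range header and answer 200 with the whole file,
    so the read itself is capped too; the rest of the body is never read.
    """
    request = build_prefix_request(url, n_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read(n_bytes)
```

Capping the read as well as sending `Range` is the important design choice: it keeps the "never download the full image" guarantee even against servers that ignore range requests.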
07:03
I was only interested in the EXIF headers. So I processed millions of images over the course of a few weeks, and specifically I looked for GPS tags. Once I found images with GPS tags, I would resolve these through Google's APIs.
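A minimal sketch of the EXIF step, assuming JPEG input and using only the standard library (a library such as Pillow or exifread could also do this). It walks the JPEG segments to the Exif APP1 block, follows the GPS pointer in IFD0, and converts the degree/minute/second rationals to decimal degrees. The function names are hypothetical, and a prefix that is cut off mid-structure simply yields `None`:

```python
import struct
from typing import Dict, Optional, Tuple

GPS_IFD_POINTER = 0x8825  # IFD0 tag whose value is the offset of the GPS sub-IFD
# GPS sub-IFD tags: 1 = LatitudeRef ("N"/"S"), 2 = Latitude,
#                   3 = LongitudeRef ("E"/"W"), 4 = Longitude


def _find_tiff_start(data: bytes) -> Optional[int]:
    """Return the offset of the TIFF header inside the JPEG's Exif APP1 segment."""
    if data[:2] != b"\xff\xd8":  # no JPEG start-of-image marker
        return None
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes its own 2 bytes
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return pos + 10
        if marker == 0xDA:  # start of scan: image data follows, no more metadata
            return None
        pos += 2 + length
    return None


def _read_ifd(data: bytes, tiff: int, offset: int, e: str) -> Dict[int, bytes]:
    """Map tag -> raw 4-byte value field for one IFD (entry types are not checked)."""
    base = tiff + offset
    (count,) = struct.unpack(e + "H", data[base:base + 2])
    entries = {}
    for i in range(count):
        entry = base + 2 + 12 * i
        (tag,) = struct.unpack(e + "H", data[entry:entry + 2])
        entries[tag] = data[entry + 8:entry + 12]
    return entries


def _dms_to_degrees(data: bytes, tiff: int, value: bytes, e: str) -> float:
    """Three RATIONALs (degrees, minutes, seconds) stored at the offset in value."""
    (offset,) = struct.unpack(e + "I", value)
    start = tiff + offset
    n = struct.unpack(e + "6I", data[start:start + 24])
    d, m, s = n[0] / n[1], n[2] / n[3], n[4] / n[5]
    return d + m / 60 + s / 3600


def gps_from_prefix(data: bytes) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in decimal degrees, or None if unavailable."""
    try:
        tiff = _find_tiff_start(data)
        if tiff is None:
            return None
        e = {b"II": "<", b"MM": ">"}.get(data[tiff:tiff + 2])  # byte order
        if e is None:
            return None
        (ifd0,) = struct.unpack(e + "I", data[tiff + 4:tiff + 8])
        ifd = _read_ifd(data, tiff, ifd0, e)
        if GPS_IFD_POINTER not in ifd:
            return None
        (gps_offset,) = struct.unpack(e + "I", ifd[GPS_IFD_POINTER])
        gps = _read_ifd(data, tiff, gps_offset, e)
        if not all(tag in gps for tag in (1, 2, 3, 4)):
            return None
        lat = _dms_to_degrees(data, tiff, gps[2], e)
        lon = _dms_to_degrees(data, tiff, gps[4], e)
        if gps[1][:1] == b"S":
            lat = -lat
        if gps[3][:1] == b"W":
            lon = -lon
        return lat, lon
    except (struct.error, ZeroDivisionError):
        # Prefix cut off mid-structure, or a malformed rational.
        return None
```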
07:21
In the cases where the address resolved to Norway, I would archive all the text and send myself a notice: hey, take a look at this. Before long, I found a post which suggested the hack of at least five girls from Norway.
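The talk only says the coordinates were resolved "through Google's APIs". A minimal sketch assuming the Google Maps Geocoding API in its reverse-geocoding form; the API key, the helper names, and the omitted HTTP fetch (for example `urllib.request.urlopen` plus `json.load`) are assumptions, and the archiving and notification steps are left out:

```python
import urllib.parse

# Google Maps Geocoding API, reverse-geocoding form (latlng=...).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def reverse_geocode_url(lat: float, lon: float, api_key: str) -> str:
    """URL for a reverse-geocoding lookup of one coordinate pair."""
    query = urllib.parse.urlencode({"latlng": f"{lat},{lon}", "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def country_codes(response: dict) -> set:
    """ISO country codes found in a decoded Geocoding API JSON response."""
    if response.get("status") != "OK":
        return set()
    codes = set()
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "country" in component.get("types", []):
                codes.add(component.get("short_name"))
    return codes


def should_flag(response: dict, country: str = "NO") -> bool:
    """True if the coordinates resolved to the country of interest (Norway)."""
    return country in country_codes(response)
```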
07:42
The reason I got the impression that this was a hack was that one of the commenters in the thread asked, aren't you afraid that the girls will know who posted these pictures? And the poster responded with something along the lines of, who said they know I got them? iCloud...
08:01
So it seemed like it was one of the iCloud hacks that I had read about. At that point I went on to identify two of the girls in the pictures. There were at least five girls, and I couldn't identify the others at that point. But two of them were easily identifiable by looking up the GPS information in the images
08:24
and basically going through Google Maps and the white pages, yellow pages, Facebook, and so forth. And so I found out who they were. And then I went on to update my bot code to include the thumbprints of their images
08:41
when scraping all the various forums and sites I was monitoring. So I was looking for duplicates of these pictures, and I was also looking to learn as much as possible about all the contexts in which these pictures were posted.
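The talk doesn't say how the "thumbprints" were computed. One common choice, swapped in here for illustration, is a perceptual difference hash (dHash), which tends to survive re-encoding and resizing better than a byte-level hash. It needs pixel data, so unlike the EXIF check it would require more than the 5,000-byte prefix. This sketch assumes each image has already been decoded and scaled down to 8 rows by 9 columns of grayscale values with an imaging library (not shown); the function names and the 10-bit threshold are hypothetical:

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]  # rows of grayscale values, 0-255


def dhash(grid: Grid) -> int:
    """Difference hash of an 8-row x 9-column grayscale grid, as a 64-bit int.

    Each bit records whether a pixel is brighter than its right-hand neighbour,
    so a uniform brightness shift leaves the hash unchanged.
    """
    if len(grid) != 8 or any(len(row) != 9 for row in grid):
        raise ValueError("expected 8 rows of 9 grayscale values")
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def matches_known(candidate: int, known: List[int], max_distance: int = 10) -> bool:
    """True if the candidate is within max_distance bits of any known thumbprint."""
    return any(hamming(candidate, k) <= max_distance for k in known)
```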
09:03
And this eventually revealed a link to another forum which looked a lot like the original posting of these girls' pictures. Some of the pictures had EXIF information showing that the images, mostly taken with iPhones,
09:23
had been taken a very short time before they were posted online. So there seemed to be a time-wise correlation suggesting that this was the initial posting. And users in these threads were trying to identify the girls.
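The timing argument can be made mechanical by comparing the EXIF `DateTimeOriginal` value with the forum's post time. A sketch under assumptions: the post time is available as a naive `datetime`, and the two-day window and 14-hour time-zone slack are arbitrary illustrative values, not figures from the talk:

```python
from datetime import datetime, timedelta

# EXIF DateTimeOriginal format; note it carries no time zone.
EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_time(value: str) -> datetime:
    """Parse an EXIF timestamp, dropping the trailing NUL EXIF strings often carry."""
    return datetime.strptime(value.strip("\x00 "), EXIF_TIME_FORMAT)


def looks_like_first_posting(exif_taken: str, posted_at: datetime,
                             max_gap: timedelta = timedelta(days=2),
                             tz_slack: timedelta = timedelta(hours=14)) -> bool:
    """Was the photo posted shortly after it was taken?

    Both times are naive. Because the EXIF time is camera-local and the forum
    time may be in another zone, the window is widened by tz_slack on both sides.
    """
    gap = posted_at - parse_exif_time(exif_taken)
    return -tz_slack <= gap <= max_gap + tz_slack
```

A short gap is only a hint: the same photo reposted elsewhere later would fail the check, which is exactly what makes a passing post a candidate for the original.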
09:43
So I was rather stressed when I found out that one of them came close to naming them, and thus to potentially making search engines such as Google and Bing pick up and index their names
10:00
in context with the pictures. One of them also came close to naming the 14-year-old sister of one of the girls. It was already bad enough that the then 19-year-old girl would have her name associated with the pictures, but it seemed entirely wrong for the 14-year-old to be associated with pictures that were not dramatically explicit
10:26
but still private, and that could have been her but turned out to be her sister. In either case, I was stressed out enough by this that I contacted the administrator of the forum and massaged him for a while to get the IP address of the poster and to have the thread
10:44
and all the pictures deleted. I ended up actually paying him $200 from my own pocket to do this. In retrospect, that was a bad idea and something I wouldn't recommend others to do, because it basically means I'm funding revenge porn posting.
11:02
But I got what I came for. I got an IP address which resolved to a residential area in a town close to where these girls were located, and I got the content deleted. Obviously, the pictures were reposted in other threads and on a lot of other sites as well,
11:21
but at least it put a stop to the naming of the 14-year-old sister in that one thread, and luckily they didn't pick up on that again. While working on the technical side of things, I also contacted the police, two precincts actually, and asked them for help in notifying the girls.
11:42
I wanted the police to be the initial contact with the girls and explain to them what had happened and also to provide them with all the information I had gathered and put them in touch with me if they wanted to pursue things further. The police said, no, we won't help you, and they won't do anything unless a formal report is filed by the girls.
12:07
And they also questioned whether my motivation for contacting them was good. I don't know really what they were aiming at with their questioning back then
12:23
because I obviously wouldn't have contacted them if I was out to cause harm in any way. But they turned the whole thing back on me, we had a sort of angry exchange, and that was the end of my initial contact with the police. It left me sort of dumbfounded, and I was left with the option of either dropping the case entirely
12:48
and walking away, or contacting the girls myself and presenting them with the information. But I figured there was a risk that it would be uncomfortable for them to have some random stranger approach them with this information.
13:03
But I eventually did contact two of them, and one of them filed a police report. This police report was dropped immediately, even with the information I had gathered so far, including the IP address that I had gotten for the initial poster.
13:20
And the reasoning was that since they didn't know who the perpetrator was, they couldn't do anything, which seems really strange to me because I imagine that's why people contact the police in the first place in these cases, to get help to figure out who has done something against them,
13:41
who has stolen their pictures, who has published them online. To move forward, I started systematically monitoring the chatter about iCloud attacks on these forums, and I saw a lot of self-advertising hackers, a few of whom seemed to be from Norway as well.
14:00
So I decided to set a trap for them. I bought a domain called Spun XXX and set up a honeypot there. I contacted the different hackers I had found by email, started exchanging some casual chatter with them, and asked for help in hacking a made-up stepsister of mine.
14:23
I posed as this sort of dumb fellow who didn't exactly know what he was getting into, but who needed as much information as possible to potentially help with the actual hacking. That way I tried to learn as much as possible about their activities. The honeypot was a site that I claimed was this mythical stash of revenge porn stretching back many, many, many years,
14:47
which had been able to operate for many years without being investigated, and with no consequences for the admins or the users, thanks to absolute trust between users. So all the users had a stake in this site in some way,
15:03
and it was a really tightly knit community, which of course did not exist. I cautiously mentioned this site in emails with the hackers, told them that I was a long-time member and that I could potentially send them an invite, which I eventually did.
15:21
I told them, well, we've exchanged all of four emails now, so I trust you with the most mythical site in history. The invitation process was built as a workflow designed specifically to pique their curiosity and their interest in whatever was behind these walls.
15:42
It had many steps that went to great lengths to describe, without being too explicit, what was on the inside and what was expected of them as users, and blah, blah, blah. The final step asked for their phone number: obviously, as a user, you had to stake something in the site
16:01
to be let into this mythical, superior, fantastic, abusive, nasty site. And many of them actually supplied a phone number, and I was able to verify these phone numbers because I sent them a made-up two-factor code, which I could see them trying to send back to me,
16:21
but it all obviously resulted in an error, because there was nothing to access on the site. Still, several of them triggered the honeypot, and I got several residential IP addresses, a couple of proxies, and a few verified phone numbers, whose owners a couple of journalists and I were only able to contact much later
16:43
to get their side of the story as well. But the most interesting hit I got was from a guy who didn't leave his phone number, but who had the same IP address as the one the administrator had given me a few weeks earlier.
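The talk gives no code for the honeypot. A minimal in-memory sketch of the bookkeeping it describes: logging each visit's IP address and user agent, issuing a made-up "two-factor" code to check that a supplied phone number is real, and cross-referencing visitor IPs against addresses already known from elsewhere. The class and field names are hypothetical, and SMS delivery and the web layer are left out:

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Hit:
    """One visit to the honeypot's invitation workflow."""
    ip: str
    user_agent: str
    phone: Optional[str] = None
    phone_verified: bool = False


@dataclass
class Honeypot:
    known_ips: Set[str]  # e.g. the poster's IP disclosed by a forum admin
    hits: List[Hit] = field(default_factory=list)
    pending_codes: Dict[str, str] = field(default_factory=dict)  # phone -> code

    def record_visit(self, ip: str, user_agent: str) -> Hit:
        hit = Hit(ip, user_agent)
        self.hits.append(hit)
        return hit

    def issue_code(self, hit: Hit, phone: str) -> str:
        """Create a six-digit 'two-factor' code; sending the SMS is not shown."""
        code = f"{secrets.randbelow(10**6):06d}"
        hit.phone = phone
        self.pending_codes[phone] = code
        return code

    def submit_code(self, hit: Hit, code: str) -> None:
        """Mark the phone number verified if the code matches.

        The visitor is shown an error either way: there is nothing behind the wall.
        Each code is single-use, so a wrong guess consumes it.
        """
        expected = self.pending_codes.pop(hit.phone, None) if hit.phone else None
        hit.phone_verified = expected is not None and secrets.compare_digest(code, expected)

    def matches(self) -> List[Hit]:
        """Hits whose IP address was already known from another source."""
        return [h for h in self.hits if h.ip in self.known_ips]
```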
17:00
So at this point, I seemed to have a connection between the original poster and the hacker's anonymous Hotmail account; they could be the same person. I went on to investigate what I could find about this email account,
17:20
including the password reset information on Hotmail.com. At the time, it pointed to a Gmail account which started with T-O, followed by an unknown part, at Gmail.com. I also checked the markup of the password reset page
17:42
at that point, which has since been changed. Back then, there was a hash for each email address listed as one you could reset the password to. The thing is that this hash wasn't salted or specific to one source account. So if I could somehow add the same connected address
18:04
to one of my own fake Hotmail accounts, I'd end up with the same hash. That gave me a way of automating the search for his connected account. Around the same time, I was investigating other angles.
18:20
I was looking into the girls' networks, and they weren't directly connected in any way. They weren't Facebook friends, they weren't living in the same place, and they didn't seem to have anything to do with each other. But they did share an organizational connection, and they had some mutual interests and a few mutual friends on Facebook and other social media.
18:44
So I automated the hash lookup on Hotmail with a list of emails that I scraped from here and there, including those of their mutual Facebook friends. And I got a hit for one of the mutual friends.
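Hotmail's actual hashing scheme isn't described in the talk, so the sketch below uses a hypothetical stand-in (SHA-256 of the normalised address) purely to show why an unsalted, account-independent value leaks: anyone who can produce the value for a candidate address can compare it with the target's. In the talk, each candidate's value was obtained by linking the address to one of his own fake accounts; here it is simply computed. `mutual_contacts` models narrowing the candidates to the girls' mutual friends, and `account_scoped_id` shows one way a service could avoid the leak. All names are hypothetical:

```python
import hashlib
import hmac
from typing import Iterable, Optional, Set


def normalise(email: str) -> str:
    return email.strip().lower()


def unsalted_id(email: str) -> str:
    """Hypothetical stand-in for the page's per-address value: the same address
    always yields the same value, whichever account the page belongs to."""
    return hashlib.sha256(normalise(email).encode()).hexdigest()


def find_linked_email(target_value: str, candidates: Iterable[str]) -> Optional[str]:
    """Reproduce the value for each candidate and compare it with the target's."""
    for email in candidates:
        if unsalted_id(email) == target_value:
            return email
    return None


def mutual_contacts(*contact_lists: Iterable[str]) -> Set[str]:
    """Addresses present in every list, e.g. friends the victims had in common."""
    sets = [{normalise(e) for e in contacts} for contacts in contact_lists]
    return set.intersection(*sets) if sets else set()


def account_scoped_id(server_key: bytes, account_id: str, email: str) -> str:
    """Safer alternative: an HMAC over the owning account and the address, keyed
    with a server-side secret, so values can't be reproduced or compared across accounts."""
    message = f"{account_id}\x00{normalise(email)}".encode()
    return hmac.new(server_key, message, hashlib.sha256).hexdigest()
```

Better still, a reset page need not emit any derived value at all; a masked address plus an opaque per-session identifier avoids the problem entirely.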
19:03
So at this point I knew that the hacker's Hotmail account, somehacker@hotmail.com (not the real address, but still), was connected to a real-name Gmail account. And this guy was a politician with a vast social network
19:22
and a trusted position as well in the party's social media office. So he had a lot of connections that could potentially be abused to find his victims. At this point, what I had was a residential IP address
19:41
that had been delivered by the administrator of the forum for the first known posting of the pictures. I also had a matching IP address for the hacker I had communicated with, and I had a connected, real-name email account for this hacker.
20:03
I wanted to be absolutely certain that there were actually connections there before I went to the girls or the police or lawyers or anybody else and discussed this information. So I did the following. I sent the hacker a URL and got the hit from his IP address
20:22
and immediately sent another email from another fake account, in which I posed as a young girl sending an email to a friend of hers. The idea was to pique his interest, since he was basically into abusing people.
20:42
I figured that was a good way to get his interest in this email as well. But I got a hit on the URL I sent him from that fake email account from the same IP address. And it also showed the same operating system, the same browser, the same browser plugins, and basically the fingerprint of that visit was the same as the first one.
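The comparison of the two visits can be expressed as a field-by-field check. The fields mirror those listed in the talk (IP address, operating system, browser, plugins); how they were collected, for example from the User-Agent header and client-side script, is an assumption, and the names are hypothetical. An identical fingerprint is strong evidence rather than proof, since people behind the same router with common setups can look alike:

```python
from dataclasses import dataclass, fields
from typing import FrozenSet, List


@dataclass(frozen=True)
class Visit:
    """Attributes logged for one click on a tracked URL."""
    ip: str
    os: str
    browser: str
    plugins: FrozenSet[str]  # a set, so enumeration order doesn't matter


def differing_fields(a: Visit, b: Visit) -> List[str]:
    """Names of the attributes on which two visits disagree."""
    return [f.name for f in fields(Visit) if getattr(a, f.name) != getattr(b, f.name)]


def same_fingerprint(a: Visit, b: Visit) -> bool:
    return not differing_fields(a, b)
```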
21:05
So I was pretty sure at this point that I had him, that he was the same guy. And I went to the girls with the information and I convinced them that there's certainly something worth looking further into here. And we went to the police once again
21:21
and updated the report that was handed in with the first formal report. And they were still uninterested. And they were still uninterested after being worked on by a large law firm for five or so months. While those lawyers tried to lobby the police to take the case,
21:45
I went on to try to identify other victims. I figured that if I could get somebody else, or even several people, to file reports, that could add enough pressure for the police to actually launch an investigation.
22:02
I kept in touch with the hacker for many months, tried to siphon out as much information as I could, and also built a text corpus in case I wanted to check for any specific similarities
22:21
between the texts he was publishing on social media and so forth and my exchanges with his hacker persona. During a couple of those exchanges, he told me that he had found another girl, and he sent me a couple of pictures of a girl he claimed was from a Norwegian blogging portal. He wouldn't tell me who she was, no specifics at all, short of her being Norwegian.
22:47
My approach to finding out who she was was to automate the traversal of the tens of thousands of blogs on this blogging portal and index the email addresses and the rest of the text.
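A minimal sketch of the indexing step, assuming the blog pages have already been crawled and reduced to plain text (crawling is not shown). It extracts email addresses with a deliberately loose regular expression, builds an inverted word index for free-text search, and lets a query be restricted to posts containing addresses from a given set, standing in for the iCloud-connected matches described in the talk. `BlogIndex` and its methods are hypothetical names:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set

# Loose on purpose: fine for harvesting candidates, not for validating addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
WORD_RE = re.compile(r"\w+")  # Unicode-aware, so æ, ø and å count as letters


class BlogIndex:
    def __init__(self) -> None:
        self.emails_by_post: Dict[str, Set[str]] = {}
        self.posts_by_word: Dict[str, Set[str]] = defaultdict(set)

    def add_post(self, url: str, text: str) -> None:
        """Record the addresses in a post and add its words to the inverted index."""
        self.emails_by_post[url] = {m.lower() for m in EMAIL_RE.findall(text)}
        for word in WORD_RE.findall(text.lower()):
            self.posts_by_word[word].add(url)

    def all_emails(self) -> Set[str]:
        return set().union(*self.emails_by_post.values())

    def search(self, terms: Iterable[str],
               flagged_emails: Optional[Set[str]] = None) -> List[str]:
        """Posts containing every term, optionally only those with a flagged address."""
        hits: Optional[Set[str]] = None
        for term in terms:
            posts = self.posts_by_word.get(term.lower(), set())
            hits = set(posts) if hits is None else hits & posts
        if hits is None:  # no terms given: start from every post
            hits = set(self.emails_by_post)
        if flagged_emails is not None:
            hits = {url for url in hits if self.emails_by_post[url] & flagged_emails}
        return sorted(hits)
```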
23:06
I ran each of the emails I found in the blog text through Apple's "create a new Apple ID" web service, which returned an error if the address was already connected to an Apple ID account,
23:20
so I knew which of the emails were actually iCloud-connected, which was what he was looking for as well. I was able to do that with a huge number of accounts in a fairly short while. And I crossed these matches with a few free-text searches that I also ran in the index I had built
23:41
based on the information that he provided me with. And I narrowed it down to a couple of hundred posts. And I scraped images from those blog posts and manually compared the pictures to what he had sent me and eventually I found her. So I contacted this girl and she was shocked, obviously,
24:03
not the most pleasant sort of contact to have with a random stranger who basically tells you, hey, I'm really, really sorry, but I've seen a topless picture of you when you're holding your baby.
24:20
And nobody else had seen that picture. She hadn't shared it with anyone; it was on her phone. This guy had downloaded her phone backup and spread it all around. So I apologized profusely but told her that I really needed to meet her to explain the whole situation, and she agreed.
24:42
And she also pressed charges against the hacker. The police did nothing still, though. They dismissed that case as well. All in all, it took around six months for them to act at all. And it took press involvement in the end.
25:02
And I really wanted to avoid press involvement. I really wanted to stay anonymous in the press coverage. I basically wanted the case to go away. I gave the hacker ample opportunity to come forward, apologize to the girls, and just come clean and stop what he was doing.
25:23
But he didn't. He even went so far as to hack his girlfriend at the time to control what information she received regarding this whole case. So he was notorious in his doings. But eventually the police dealt with the hacker.
25:43
One year later he was sentenced to 60 days, of which 30 were to be served in jail. So we succeeded, I suppose, and it set an example to other hackers out there. I'm sad to realize now that the example hasn't been very formidable
26:08
because people are still being hacked and still being exposed. But it's something, and it's more than nothing in terms of dealing with revenge porn and hackers and so forth. The hacks, though, were basically really simple.
26:23
He was doing iCloud password resets on one end. And he had installed a keyboard logger on his own computer. And he was offering this computer to girls and contacts that he had if they wanted to log on to social media. So, hey, come use my PC.
26:40
And he swiped their passwords and tried to access their accounts later. Now, resetting an iCloud password required a date of birth as well as a secret piece of information at the time. I mean, a date of birth can't really be seen as a secret.
27:01
And with social media profiles, neither can the color of your first car, the name of your first pet, the name of your favorite teacher, and so forth. These aren't secrets. We, as developers at least, need to quickly move away from designs where something supposedly secret, but not actually secret, will let you in and basically take over a person's life.
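One common alternative to guessable reset questions, shown here as a sketch rather than as what any particular provider does, is a random, single-use, short-lived reset token delivered to a channel the user already controls, with only a hash of the token stored server-side. The class name and the 30-minute lifetime are assumptions:

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

TOKEN_TTL = timedelta(minutes=30)  # assumption: short-lived reset links


class ResetTokens:
    """Single-use, expiring reset tokens.

    Only a SHA-256 digest of each token is stored, so a leaked table cannot be
    replayed directly.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, datetime]] = {}  # digest -> (user, expiry)

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user: str, now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        token = secrets.token_urlsafe(32)  # about 256 bits of randomness
        self._store[self._digest(token)] = (user, now + TOKEN_TTL)
        return token  # deliver out of band, e.g. in an email link

    def redeem(self, token: str, now: Optional[datetime] = None) -> Optional[str]:
        """Return the user the token was issued for, or None if invalid or expired."""
        now = now or datetime.now(timezone.utc)
        entry = self._store.pop(self._digest(token), None)  # pop: single use
        if entry is None:
            return None
        user, expires = entry
        return user if now < expires else None
```

This moves the secret from facts anyone can look up to possession of the recovery channel, which in turn makes that channel the thing to protect.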
27:20
We need to stop doing that immediately. As for him, he downloaded backups from iCloud, and not only pictures and videos but also messages and texts. He was specifically looking for explicit content, obviously, but also for passwords, and he tried to use those passwords to access email accounts,
27:44
any other accounts, with the intent of linking each email account to a similar-sounding account that he created. So if he hacked lisalala.com, he would create lisalala.live.com and link the two. His goal was to essentially give himself
28:03
permanent access to their iCloud-connected accounts, to be able to reset their passwords at will, even if they changed the password reset questions and answers. He was able to do this with 30 girls. We still don't know how many girls he hit in total.
28:23
But since he admitted succeeding with 32, I think, there's reason to believe that the number he tried to hack, or hacked and didn't find anything, or hacked and didn't tell anyone about, is much higher. In retrospect, though, would I have done the same again?
28:41
Well, for better or worse, my life hasn't been the same since. I've had opportunities, I've met people, and I've done things that I couldn't otherwise have done. There have also been certain labels I'd rather have avoided, such as the whole super-hacker thing.
29:06
I feel that it doesn't make things easier for me should I go back to my regular consulting sort of job. But still, more important than me and my stuff is the fact that I could have easily destroyed a lot of people's lives in doing this.
29:22
I could have basically destroyed innocent people that I outed or made accusations about, even in passing, and I could have wrecked the victims' lives and their families' lives, especially after the press got involved. All of that was my responsibility.
29:43
Everything that would happen from the moment I decided to take this case would be my responsibility. And also, I continuously ran the risk of meddling with police affairs. So I could have hampered police investigations and future investigations by doing this.
30:01
But would I have done it again? Well, basically, it is a problem that's hard to ignore for me. And so all else being equal, were I put in the same situation again, I guess, yeah, probably. But knowing now what I know, I wouldn't have done so easily.
30:22
In March 2014, I started working for VG as a direct consequence of this case. I was there to do a consulting gig for a little while on security and some investigative journalism, and I landed a full-time job. My job and time at VG has been a tale of opportunities
30:46
that I really didn't think were possible. To work with a news organization like that on the projects I've been allowed to work on has been really rewarding and has felt meaningful and has had purpose,
31:01
which I felt was lacking earlier in my professional life. If anything, it proves that it is possible to get out of a rotten situation, which I feel I was in up until 2012. And there's no reason to just sulk and feel that,
31:21
hey, we're poor, poor developers and we can't get anything reasonable and purposeful done. There are things to be done. You can work for the press, you can work for the police, you can work for a lot of different organizations doing really meaningful things. In any case, my time at BG has been diverse. I've done a lot of things, but the most important,
31:42
I feel, is the project where we've been chasing consumers of child abuse material. When I was hired in March 2014, a colleague and I had already been investigating this for a while; we were basically tying up loose ends from the iCloud case
32:01
and found that there were sites, connected to the sites I was monitoring, that were used to spread child abuse material: pictures and movies documenting raw and brutal child abuse. And so, here is a video to introduce this to those of you
32:23
who haven't seen the end result.
34:13
[Video plays]
35:51
So, as I was saying, we discovered sites spreading these pictures and videos while
36:02
chasing loose ends from the iCloud case. And we spent more than a year researching these sites, their discussions, their users and so forth, and really got a sort of first-hand impression of many of these people.
36:24
And we found a network of hacked sites that were used to push people towards these file sharing hosts which served this content. And the users that we saw seemed to mostly find these forums through search engines,
36:42
such as Google and Bing and so forth. They would see a link with an explicit title referring to, for example, a five-year-old girl, and they would click on this link, actively wanting to pursue that content. Then they'd perhaps get to an advertisement page, in many cases even with a timeout they'd
37:02
have to wait out before continuing. And they'd click once again, actively choosing to chase this content down. This could be repeated multiple times over before they finally got to a file sharing site and had to either register or log in. In many cases, they'd also have to pay before downloading the content.
37:22
So my colleague and I figured that this is at least proof that there's some sort of economy here: by paying, you're at the very least funding the facilitation of spreading child abuse material. You're funding the sites that make this content available. And these are sites that, to a large extent, do not seem to care what's on there.
37:46
They seldom respond to removal requests, so the content just stays up. Some of the payment is also shared with the uploader. So there may well be an economic incentive for making the content available
38:02
in the first place. In any case, the downloader finally gets to download the file. This is logged in a public log file by this rotten piece of software, and that log is picked up by search engines. That's where we come in. After months of researching these sites and narrowing down our results,
38:21
we found these log files and started to download them systematically. In the end, we had gathered information about 36 million downloads, represented here as dots on the board. Not all of the files behind these 36 million downloads
38:43
contained child abuse material. So the question was: how could we tell one from the other without actually downloading all of the files and looking into them? Our initial consideration was that we had a lot of data. We had timestamps, file names, unique file codes.
39:02
We had referrers. We had email addresses and usernames for the people who had done the downloads. And we knew that several of the downloads were connected somehow. The goal was to reduce the chaos of 36 million downloads and many, many hundreds of thousands of files
39:21
to something manageable. Basically, we wanted to know the probability that a given file contained child abuse material. We knew for a fact that a few files contained child abuse material.
39:42
And we knew this because of the context on the many sites where they were linked: quite explicit file names and descriptions, thumbnails that leave you wanting to gouge your eyes out, and also the names of the studios that produced the content,
40:03
which you can read all about in wikis on both the dark web and the open web. So there were many files where there really was no doubt, early on. And many of these files were also verified by police later.
40:23
But we knew at that point for a fact that these contained child abuse material, and we wanted to use that to say something about the probability for the other files as well. So the log format looked like this. And I figured that if this file was for sure child abuse,
40:41
I'd look at which IP addresses and which email addresses had downloaded that file, and see which other files had been downloaded by the same email address, from the same IP address, or via the same referrer. So very basic stuff. But that allowed us to build a network between the files and the downloads.
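The network between files described here can be sketched in Python, the language used in the project. This is a minimal sketch under stated assumptions, not the code used in the investigation: the real log format was only shown on a slide, so the comma-separated layout and the field names (`timestamp`, `file_code`, `ip`, `email`, `referrer`) are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Hypothetical log layout (the real one was only shown on a slide):
# one comma-separated line per download.
FIELDS = ["timestamp", "file_code", "ip", "email", "referrer"]
LINK_FIELDS = ("ip", "email", "referrer")


def parse_log(text):
    """Parse the assumed log format into one dict per download."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDS))


def build_file_graph(downloads):
    """Link two files whenever they were downloaded from the same IP address,
    by the same email address, or via the same referrer.

    Returns a dict mapping each file code to the set of file codes linked to it.
    """
    # Collect, for every (field, value) pair, the files downloaded with it.
    files_by_key = defaultdict(set)
    for d in downloads:
        for field in LINK_FIELDS:
            if d[field]:  # skip empty or missing values, e.g. no referrer
                files_by_key[(field, d[field])].add(d["file_code"])

    # Every pair of files sharing a key becomes an undirected edge.
    graph = defaultdict(set)
    for files in files_by_key.values():
        for a, b in combinations(sorted(files), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)
```

For three log lines where `f1` and `f2` share an IP address and `f2` and `f3` share an email address, the result links `f1` to `f2` and `f2` to `f3`, while `f1` and `f3` stay unconnected. Files that never share a key with another file simply don't appear in the graph.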
41:03
So my goal at that point was to take this pretty basic theory and rapidly experiment: make filters, visualizations and tools to explore the data set further, all the while collaborating with colleagues who weren't as technical as me.
41:25
And so I needed a tool to help me in this and at the same time keep reload times as low as possible and just minimize all the drag. And I settled on Python and Jupyter Notebook,
41:42
not because Python is the fastest language in the world, but because it's reasonably simple to read even for people who aren't programmers. And Jupyter Notebook is a great tool for adding snippets as you go. Even in a meeting, you can have a running kernel with a large data set already loaded,
42:04
and you can feed snippets into it to make new graphs, new map plots and so forth, to present the content to people who don't necessarily understand raw text data the way many of us tend to.
42:22
So this was the key, I feel, to the collaboration between the coder, me, and the journalists on the team. And because of our technological choices, we were able to come pretty far in a reasonably short time. The full analysis process we came up with was,
42:44
for each cycle, to find the related files from the current set and limit all the results to Norwegian IPs, then exclude known proxies, Tor exit nodes, VPNs and so forth, as well as false positives, and apply other filters that came along
43:02
as we built a greater understanding of the files and the content. Finally, I'd do a weighted filtering of these files, and that was sort of the key to the whole process. I built this mock-up demonstration of the weighted filtering
43:22
for a journalism conference a while back, and it helped the journalists understand what I meant. The point is, if these are all files and you see the connections between them, I could flag certain files as bad, as definitely containing child abuse material.
43:42
And this would in turn affect the other files related to it, over a number of generations, and say something about the probability for those files as well. There were other factors involved too. If a user had provably downloaded mostly child abuse material, that would affect the probability more than a user
44:02
who had downloaded a little bit of everything. So there were more factors involved, but these were the basics. And after a while you could also go in and flag something as definitely not child abuse material.
44:20
And as you found false positives, confirmed files or files verified by police, you could tag them like this and reduce the data set to something manageable. So you'd go from hundreds of thousands of files to a list of just a few thousand in a fairly easy way.
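The weighted filtering described above can be illustrated as score propagation over a file graph. This is an illustrative sketch, not the project's actual algorithm: the max-with-decay rule, the `decay` factor of 0.5, the `threshold` of 0.2 and the per-link weight (a hypothetical stand-in for how strongly the shared downloader points towards abuse, e.g. the share of that user's downloads already confirmed) are all assumptions.

```python
def propagate_scores(graph, known_bad, known_good, generations=3, decay=0.5):
    """Spread suspicion from confirmed files to related files.

    graph: dict mapping each file to a dict {neighbour: weight}, with weights
        in [0, 1] (hypothetical per-link factor); list links in both directions.
    known_bad: files confirmed to contain abuse material (score pinned at 1.0).
    known_good: files confirmed not to (score pinned at 0.0, never spread).
    Each generation, every unconfirmed file takes the larger of its current
    score and decay * the strongest weighted score among its neighbours.
    """
    scores = {f: 1.0 for f in known_bad}
    for _ in range(generations):
        updated = dict(scores)  # update synchronously from the previous generation
        for f, links in graph.items():
            if f in known_bad or f in known_good:
                continue  # confirmed files are never recomputed
            spread = max(
                (scores.get(n, 0.0) * w for n, w in links.items()), default=0.0
            )
            updated[f] = max(scores.get(f, 0.0), decay * spread)
        scores = updated
    for f in known_good:
        scores[f] = 0.0  # report confirmed-clean files explicitly
    return scores


def shortlist(scores, threshold=0.2):
    """Files scoring at or above the threshold, most suspicious first."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, s in ranked if s >= threshold]
```

In a chain `a - b - c - d` with `a` confirmed and all weights 1.0, three generations give `b` 0.5, `c` 0.25 and `d` 0.125, so suspicion fades with distance. Flagging a file as known good both removes it from the shortlist and stops suspicion from flowing through it, which is one way to model the tagging of false positives.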
44:46
The result of the weighted filtering then fed back into a manual evaluation, where we made new filters and new exclusions and continuously updated the whole program, which was then run all over again.
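The per-cycle filters mentioned earlier (Norwegian IPs only, no known proxies, Tor exit nodes or VPNs, no known false positives) could be expressed with Python's standard `ipaddress` module. A sketch only: the networks below are RFC 5737 documentation ranges standing in for real Norwegian allocations, which would come from a geolocation or registry data source; the exclusion set would be filled from proxy, VPN and Tor exit lists; and the dict keys follow a hypothetical log layout.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges) standing in for the
# real Norwegian allocations.
NORWEGIAN_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def filter_cycle(downloads, candidate_files, excluded_ips, false_positives,
                 nets=NORWEGIAN_NETS):
    """Keep downloads of candidate files that come from the given networks,
    dropping known proxies/VPNs/Tor exits and files marked as false positives.
    """
    excluded = {ipaddress.ip_address(x) for x in excluded_ips}
    kept = []
    for d in downloads:
        code = d["file_code"]
        if code not in candidate_files or code in false_positives:
            continue
        try:
            ip = ipaddress.ip_address(d["ip"])
        except ValueError:
            continue  # malformed address in the log line
        if ip in excluded:
            continue  # known proxy, VPN endpoint or Tor exit node
        if not any(ip in net for net in nets):
            continue  # outside the (assumed) Norwegian ranges
        kept.append(d)
    return kept
```

Comparing parsed address objects rather than raw strings means differently written forms of the same address still match the exclusion list, and an IPv6 address is simply reported as outside an IPv4 network instead of raising an error.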
45:04
So this ran in circles for months. Eventually we'd take the results from there and attempt to identify some of the downloaders. Now, we weren't particularly interested in chasing down
45:21
those who tried the hardest and were the best at hiding. We were interested in simply finding downloaders, being relatively certain about our accusations or impressions of their activities, and then going out and confronting them.
45:42
The most basic identification method was to export large email lists from our logs, import those into address lists in an Outlook account, and then connect that Outlook account, with its hundreds of contacts, to Twitter, Facebook, LinkedIn and so forth and use their "find friends" features.
46:06
And in many, many cases, we got real profiles back. So these people who were downloading, or seemingly downloading, child abuse material were often using their own social media accounts to do this.
46:21
We also examined password reset pages on social media to see which profiles were connected to which pictures, then brought those pictures in to examine other social media profiles, trying to find as many ways as possible to get from an IP address
46:42
and an email address and username to an actual identity. Finally, in the cases where no known proxies or VPNs were involved, we'd also use the IP address to further establish the likelihood of these identities.
47:00
In the end, we found 5,500 downloads from Norway of files whose content we were certain about, made by 300 downloaders. We managed to identify 78 of them with reasonable certainty. Globally, the files were downloaded 430,000 times.
47:27
And this isn't necessarily the full data set. In the analysis process, I narrowed it down to Norwegian results very early on.
47:40
So had we done the same for other countries, this could easily look different. But 430,000 downloads spread out over, I guess, around 95,000 IP addresses is still a significant number when seen in relation to the time span covered by our log downloads.
48:03
We were monitoring these sites for a year, but the data points we had covered only around a month's worth of days, because the logs were rotating very quickly. So there are a lot of downloads hiding behind the scenes here.
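To put these figures in proportion, here is a rough back-of-the-envelope calculation using only the numbers quoted above, assuming that "around a month" of logged days means roughly 30 days:

$$
\frac{430\,000\ \text{downloads}}{95\,000\ \text{IP addresses}} \approx 4.5\ \text{downloads per address},
\qquad
\frac{430\,000\ \text{downloads}}{30\ \text{days}} \approx 14\,300\ \text{downloads per day}.
$$

Because the logs rotated quickly, these rates describe only what was captured, not the full activity over the year of monitoring.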
48:28
In the end, though, we confronted 10 of the 78 that we found, and 7 of them admitted to having downloaded, and in many cases paid for, child abuse material.
48:42
As a direct result of this case, Norwegian police got increased funding. A few million, if I recall correctly. By no means enough, in our impression. So we're still working on related projects; we want to pursue this problem further.
49:02
And we're looking into the possibilities of international collaborations, spin-offs and so forth. Having gone down this path, with the iCloud case, the child abuse project and other projects I've worked on, would I recommend it to others?
49:21
I'd say that working with police organizations, or even the press, is really preferable to being a lone wolf, as I was initially, because you run a huge risk of destroying people and destroying yourself. And if you do go solo, at least stay far away from child abuse material.
49:43
In Norway, it is illegal even to familiarize yourself with where child abuse material is located. So do not go into that domain at all. I would like to see an open source initiative for tooling: research tools, download tools,
50:05
anything to collect data on, and build an understanding of, forums, file sharing sites, image hosts, etc. My impression from dealing with police in Norway, and from a visit to and dealings with Belgian police,
50:28
is that the tools really aren't good enough in many cases. I've seen download tools, pointed from production machines
50:42
towards file hosts, crash brutally, in ways that could potentially be exploited to infiltrate the machines they were run from. So we have a really long way to go on tooling alone. If you want to invest your skills at the base level, an open source initiative would be fantastic.
51:05
I would also like to see an open source hash database of files from image hosts, I mean 4chan and eBay and so forth. A hash database covering all of these hosts would have been really nice for me when I got started way back.
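Such a hash database could start out very simply. A minimal sketch using Python's standard `hashlib` and `sqlite3` modules; the table layout, the function names and the host names in the example are hypothetical. Note that SHA-256 only matches byte-identical copies, so resized or re-encoded images would need a perceptual hash instead.

```python
import hashlib
import sqlite3


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the lookup key; matches byte-identical files only."""
    return hashlib.sha256(data).hexdigest()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical table of file sightings."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sightings ("
        " sha256 TEXT NOT NULL,"
        " host TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " PRIMARY KEY (sha256, host, url))"
    )
    return conn


def record(conn: sqlite3.Connection, data: bytes, host: str, url: str) -> str:
    """Store one sighting of a file; an identical sighting is ignored."""
    digest = sha256_hex(data)
    conn.execute(
        "INSERT OR IGNORE INTO sightings (sha256, host, url) VALUES (?, ?, ?)",
        (digest, host, url),
    )
    conn.commit()
    return digest


def lookup(conn: sqlite3.Connection, data: bytes) -> list:
    """All (host, url) pairs where identical bytes were seen before."""
    cur = conn.execute(
        "SELECT host, url FROM sightings WHERE sha256 = ? ORDER BY host, url",
        (sha256_hex(data),),
    )
    return cur.fetchall()
```

Keying on the digest rather than the file name means the same file shows up across hosts even when it has been renamed, which is the kind of cross-host overview the wish above is about.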
51:26
Potentially a profile database too, but things get hairy once you include profile information and try to actually profile people, so you probably couldn't publish that anywhere. In any case, do you have any questions or suggestions?
51:43
If you do, do scream. So the question was, how long did the politician case take from when I started? It took me two weeks to figure out who it was, and it took me more than six months to get something done.
52:03
So I think the publication date was around July 27, 2013, and I started in late January. Even though I had figured out who it was, my life was pretty much dominated by getting to the bottom of it
52:20
and getting people, police, whatever, to deal with it for those months. Anybody else? Just scream. I can't really see anything here. And if not, you're free to leave and pester me on Twitter or email or whatever later. Thank you.