
RECON VILLAGE - From Breach to Bust


Formal Metadata

Title
RECON VILLAGE - From Breach to Bust
Subtitle
A short story of graphing and grey data
Title of Series
Number of Parts
322
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Transcript: English(auto-generated)
Okay. So, I have the delight today of introducing our keynote speaker, Andrew MacPherson. We're delighted to have him here. He's the operations manager over at Paterva, with a degree in information systems and apparently an uncanny knowledge of cat memes, so I don't know how many we're going to see today. He's been with Paterva since 2007, so for over a decade. Aside from Maltego, he's also got a keen interest in hardware and security. So, I will now invite Andrew up. Hopefully everything will connect technically and work for his presentation, Breach to Bust.
Okay. Hi, everyone. Welcome to the first talk of the day. I'm surprised it's this full; 12 o'clock is very early, for Vegas time at least. Although I'm quite impressed that the village
has a 12 o'clock first slot. That's more my kind of style. Okay. So, my talk is called From Breach to Bust, a short story of graphing and grey data. We'll look at some breach data and some grey things, in terms of how we use them within Maltego or any other graphing application, and what we can do with them from
there. So, I'll do a little bit on who I am and Maltego. Then we'll look at some common OSINT tools, and I know some of them are being presented here later. Then we'll look at graphing, so how we make sense of relationships with layouts, sizing and things. And then we'll look at some of the OSINT tools without the OS part, that is,
using grey data. Okay. Is that better? And then I've got a nice example of the breach data and how we use it to find something based on one of the organizations. Okay. And then we'll look at what I think
OSINT looks like in the future. Is that okay? Hi. Yeah, that's right near
my face. Okay. Can everyone hear me now? Maybe. Oh, okay. I'm just going to switch to this one
then. Can everyone hear me now? Okay. This one's better. Okay. Unfortunately, now you in the back have to hear the whole talk. Okay. So, with that, my name is Andrew MacPherson. I'm AndrewMohawk on Twitter. I've been at Paterva for about 10 or 11 years. So, I have an
information science degree. I spend a lot of time on flights; as you can see there, I've solved the game 2048, because a lot of the time it's the only thing on my phone. Generally, I talk too fast; I'll try and fix that. Also, last year my hair was white, and my friends are terrible, so they called me Malfoy for a long time. They still won't let that go. And
then I'm part of this thing called Shame Con. Okay. So, just a little disclaimer at the beginning. This is not a sponsored talk, right? I know that as Paterva we do give some licenses for the prizes, but this is not related to that. Then, some of the information that we have might show details about individuals who work at
an organization, so just be reasonable human beings. Don't go and say, oh, we're going to find them, or tweet at them, or share that, or go look it up. I think that's quite an important part of OSINT. As soon as you get sensitive data on someone, it might be cool to show your friends, but it's not really that cool to say, here's everything about a person, without their consent.
Then, this is not going to be a vendor talk. I'm going to use Maltego a lot, but I've also been there for 11 years, so any sort of problem that I see, I try and solve with Maltego. Someone says, we should go eat, and I'm like, I got to go out for this. So, I'll be using the tool, but I'll also talk about how we can do this
in many different ways. And actually, some of the talks later use some of the stuff that I'm talking about. Then I do have some demos, so if those work, that would be great. Okay. So, if you don't know what Maltego is, you can go to our website; there are a ton of tutorials and videos. But basically, it's an application that looks like
this, right? You can drag in any sort of information and change it to whatever you want. And then you have the ability to run a set piece of code: a very small thing that takes one type of information to another, right? That's the key concept. Here, for example, I'm just looking up the MX records, and then I
could take that piece of information. These are just MX records, but it could be anything: someone's name, an ID number, anything else. And then I can take those to another level, right? So, we're just graphing stuff out. And each one of those small steps, what we call transforms, is a small piece of code or a small application that we can integrate with. Okay. So, I have this slide so that I talk slower.
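The transform idea described above (a small piece of code that takes one type of entity to another, with every result kept linked to its input so it can be pivoted on again) can be sketched in Python. This is not Maltego's transform API: `Entity`, `Graph`, `domain_to_mx` and `mx_to_ip` are hypothetical names, and the lookups are stubbed with made-up data (`example.com` and documentation-range IPs) instead of real DNS queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Entity:
    """A typed piece of information, e.g. a domain, an MX record or a name."""
    kind: str
    value: str


# A transform takes one entity and returns related entities of another kind.
Transform = Callable[[Entity], List[Entity]]

# Hypothetical stub data standing in for real DNS lookups.
FAKE_MX: Dict[str, List[str]] = {"example.com": ["mx1.example.com", "mx2.example.com"]}
FAKE_A: Dict[str, str] = {"mx1.example.com": "192.0.2.10", "mx2.example.com": "192.0.2.11"}


def domain_to_mx(entity: Entity) -> List[Entity]:
    """Hypothetical transform: domain -> MX records."""
    if entity.kind != "domain":
        return []
    return [Entity("mx_record", host) for host in FAKE_MX.get(entity.value, [])]


def mx_to_ip(entity: Entity) -> List[Entity]:
    """Hypothetical transform: MX host -> IP address."""
    ip = FAKE_A.get(entity.value)
    return [Entity("ip_address", ip)] if entity.kind == "mx_record" and ip else []


class Graph:
    """Keeps every transform result linked to the entity it came from."""

    def __init__(self) -> None:
        self.nodes: Set[Entity] = set()
        self.edges: Set[Tuple[Entity, Entity]] = set()

    def run(self, transform: Transform, entity: Entity) -> List[Entity]:
        self.nodes.add(entity)
        results = transform(entity)
        for result in results:
            self.nodes.add(result)
            self.edges.add((entity, result))
        return results


if __name__ == "__main__":
    graph = Graph()
    seed = Entity("domain", "example.com")
    # Step 1: domain -> MX records; step 2: run the next transform on each result.
    for mx in graph.run(domain_to_mx, seed):
        graph.run(mx_to_ip, mx)
    for src, dst in sorted(graph.edges, key=lambda e: (e[0].value, e[1].value)):
        print(f"{src.kind}:{src.value} -> {dst.kind}:{dst.value}")
```

Each call to `run` is one small step, and because the edges are stored rather than just printed, the same graph can later be walked in either direction.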
Okay. I'm talking slower. So, if we look at OSINT and some of the stuff that we do, one thing that I find really interesting is that if I talk to someone who does this, they've got a whole set of tools, right? If you need to get historical Twitter data, there are six or seven different repos that you can go to,
different applications that you can use, and everyone that I speak to has a ton of different ones, right? If I speak to someone, they'll say, have you heard of this tool? And I'll say no, which is weird, right? But there are thousands of them. Some of the ones that people use commonly are things like theHarvester, Recon-ng, Have I Been Pwned, IntelTechniques (so the web interface), and Skiptracer,
and some of these are being used later today in the tool demos. Basically, all of these have a function where you use the tool to get some sort of information, right? These are what I call single-layer tools: you provide some sort of input, whether it's an email address or a username,
an IP address, whatever kind of input they take, and then they go and mine it. So, here's an example of Recon-ng. I actually stole this from a video, but it's run against booking.com: it looked for host names and email addresses, and then it generated this output, right? So, it's single-layer output. It's one
page that says, here's all the stuff you've got, or the tool can produce its output on the command line or any other way. And these tools are really good. We need these tools, and we need more of them, but I'm going to talk about where we go from using something on a single layer, or what I call single-layer tools. So,
if we look at something on a single layer, so anything where we click through to a page: say, for example (this is just an example I did earlier), that you could look up a VIN number. You go to the DMV website, put in someone's name, and get the VIN number of their car, right? That makes sense. People write a tool, and they say,
okay, I go from this piece of information to the other. That's fantastic, that helps, but unless we can pivot on that information, we're going to start losing stuff. So, here I've got an example. The top left is Andrew MacPherson, that's me, I'm good. I can go from my name to a VIN number, or from my name to a social security number; maybe there's a site that leaks it, or something that I've used to get that. But
in the second example, I've taken the things at the bottom and run it again. I said, okay, I've got a VIN number, show me the people associated with it. And it just points back to me, which is how it should be, right? There shouldn't be other people associated with my social security number, or with my VIN number, or anything like that. So, that's
really nice for me; it makes sense here. But then I go to these people, who are obviously bad: that's Badger MacPherson, who's like me, but very bad. Here I can see there's a social security number and a VIN number, but as soon as, in the second step, I say, okay, let me see who else has these social
security numbers, I get: oh, there's another person who's sharing it, or there are two other people with that car. And obviously, in terms of this kind of data, that's something bad that we want to look at. So, think of it like a phone book. If I've got a phone book, I can go through it and say, here's Andrew, and he has his telephone number, right? So, I can call him, and that's
fantastic, it serves its purpose. But as soon as I want to find out who else has the same number that I do, I have to go through the whole phone book, right? And that's a pain. We need a way to say: if I can go in one direction, then I want to be able to pivot back and go in the other direction. Okay, so, I'm going to do a little bit of graphing and graph
theory. Of course, my slides are done with Maltego, but there's tons of stuff that's available, right? There's D3 and Gephi and so on, and especially with the hard work that the Google Chrome team put into JavaScript, those JavaScript libraries are now fast, and they are options you have for graphing different stuff. Obviously, since I've worked at Paterva for 11
years, there's no way I can even look at a graph that's not in Maltego. There are two basic things that I'm going to talk about here. The first is orders: a lot of the time, we're only going to find the good stuff after a certain number of orders. So, basically, the number of orders has to be bigger than two. If we look at the first order, we have
single-order links. An example I gave is: let's say you've got five individuals, or five aliases, or five names that you need to look up, right? You can go to Google, or whatever search engine you use, put in the names, and go through the results, right? You write them all down and put them in an Excel sheet, and that's awesome: you have it for one person, then you go
to the next person, and the next, and now you've got all of these, but you don't have any correlation between them, right? Maybe you'll notice that you've seen the same person's name on one page. But what you'd like to be able to say is: I've got five different people, show me all the results, so that I can see that, hey, of these five people, three are mentioned on the same website, maybe
on different pages. Now I can say, this is probably somewhere they're all connecting, even if it's not public, and I can look at the metadata around it. And then the things that do the first-order stuff are those single-layer tools, which let you quickly take one type of information to another. So, here at the top, what I've done
is use something called BuiltWith, which lets you look at what's in a page in terms of relationships. It shows you Google Analytics and things like that, because what I wanted to find in this example was whether anyone shares the same Google Analytics code, right? Because then I know they're all probably connected to the same account, and that's
something that's useful for me. But I'm not going to go to every single page; I'm going to run a script or an application. Over here, I've taken three different websites, paterva.com, absa.com, and nsa.gov, right? And I just said, show me these relationships, go through the pages and get them out, and here it's got those different results. Okay, but now, because I can pivot on it, I can get to a
second order, and this is where it becomes more useful. Over here, you can see that there are a couple of different sites: for paterva.com, there's one of our blogs, and for absa.com, I can see other sites owned by that bank. And for nsa.gov, there's a whole bunch of different sites. Some of these are the sites that I expect, like the one that says m.nsa
or nsa.army, but there's a whole bunch of other sites that I didn't know about that I can now link up, because I can do this correlation. So, I go from one step to another, and then from there I can go on. Alright? Then we're going to start looking at how we can graph these together. If we look at a basic graph, the big important thing is that we go in the smallest steps that we can, right? So,
here at the top, we've got a bunch of IP addresses, and we asked what's running on them, right? We ran nmap or whatever, got the ports back, and we can see there's a bunch of stuff running on port 80. That's useful, for sure, but what you actually want to do is break it down into smaller steps. So, in the bottom graph, I've
taken all the same IP addresses to the port and then to the service, so now I can see the correlation at a second order. I can see (ooh, here's my mouse) that there are multiple things running, say, for example, Apache on 80, or something on 80 and 81. Because I've broken it down into different steps, I can now
see that sort of information. Okay, then, obviously, graphing goes much further. This is a tweet that I saw online from the Phishing AI account, where they tracked all the different things that were involved in that iOS MDM attack, and they actually found something else: using graphing and
the ability to go from one small piece of information to another while keeping them connected, they found another piece of infrastructure that wasn't included in the original report. So, that's where you start using this. Then, there are just three sections that I'll talk about in graphing, and then I promise there's a good demo at the end. Just stay for that, right? Set your alarm. It'll be okay. So, the first thing is that
there are different layouts, which is how stuff is laid out, right? And this actually makes a difference. When I look at other graphing tools, a lot of the time they just have a fixed layout, usually the one at the bottom right, but what you want is to have all these layouts. The first one is called hierarchical: you just go from one step to another. So, if you're doing
a very structured investigation, or something at the network level, it makes sense to use, right? But if you're working on people, you're not going to use that layout. The second one is called circular, right? Because it puts things in a circle. Very clever, these names. In this one, basically, all the things that are connected are in that big circle in the
middle, and if they're not connected to each other, they're further outside, right? So, I can quickly say, hey, these things in the middle are the ones that count. That's what I'm looking for, right? And the last one is called organic. It places things as close together as they are related, which is super useful if you're looking at social networks or something like that. Let's say I've got my
account on Facebook, Twitter, any of the social networks: I can say, show me all my friends, and get all my friends out, right? Everyone that I know. And then I can say, show me all of their friends. And because the organic layout places them on the graph according to how closely they're connected, I'll see different clusters. I'll see things like:
oh, all the work people know each other, and some of them know me, so that's a cluster; or all the people who are family members; or all the people who are from the same town. You can use that to say, okay, cool, let's look at that. And actually, we did an investigation a while ago where someone was doing something ridiculous, selling meth on Facebook, and they'd become friends with all the people that they were
selling to, right? If you're going to do this, don't become friends with them. Just a 101 on that. So, we looked at the graph and at these relationships, and obviously there were a lot of people who were friends or family, a cluster that you could see. And then there was one huge cluster, further away, of people who didn't know anyone else in the graph and were only
connected to the target we were looking at. And of course, if you figure out that some of those people were buying, the rest of them were probably buying as well, right? So, you need the different layouts to figure out where to go from here. The next thing is sizing. Because we as people are really good at spotting
patterns and stuff, we want to use the layouts and the sizing as much as we can to help us identify this stuff. For example, here we'll say, show me the things that are important on the graph: things that have lots of connections coming in, right? That's the stuff I want to see. But with this one over here,
it doesn't make sense to size purely on the number of nodes connecting to it, right? You don't want to do that. This first one over here just has two things that come into one other piece of information, okay? We're not going to treat that the same as two things that come into one piece of information from separate parts of the graph, right? That's what's called diverse descent.
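The talk doesn't give Maltego's actual Diverse Descent formula, so the sketch below approximates the idea under an assumption of our own: size a node by how many distinct root entities (nodes with no incoming links) can reach it, instead of by its raw incoming-link count. With that rule, two spellings of one name that trace back to a single source score 1, while two unrelated branches meeting at one node score 2. The function names and example edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source node, target node)


def incoming_count(edges: Iterable[Edge]) -> Dict[str, int]:
    """Naive sizing: how many links point at each node."""
    counts: Dict[str, int] = defaultdict(int)
    for _, target in edges:
        counts[target] += 1
    return dict(counts)


def distinct_root_count(edges: Iterable[Edge]) -> Dict[str, int]:
    """Diversity-style sizing (an approximation, not Maltego's formula):
    for each node, count the distinct root nodes from which it is reachable."""
    edge_list: List[Edge] = list(edges)
    children: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    has_parent: Set[str] = set()
    for source, target in edge_list:
        children[source].add(target)
        nodes.update((source, target))
        has_parent.add(target)

    reached_from: Dict[str, Set[str]] = defaultdict(set)
    for root in nodes - has_parent:
        # Iterative depth-first search; `seen` also protects against cycles.
        stack, seen = [root], {root}
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        for node in seen - {root}:
            reached_from[node].add(root)
    return {node: len(reached_from[node]) for node in nodes}


EXAMPLE_EDGES: List[Edge] = [
    # One source, two spellings of the same name (a typo) -> one email.
    ("seed_a", "Andrew"), ("seed_a", "Andew"),
    ("Andrew", "shared@example.com"), ("Andew", "shared@example.com"),
    # Two independent sources that both point at the same email.
    ("seed_b", "Alice"), ("seed_c", "Bob"),
    ("Alice", "hub@example.com"), ("Bob", "hub@example.com"),
]

if __name__ == "__main__":
    naive = incoming_count(EXAMPLE_EDGES)
    diverse = distinct_root_count(EXAMPLE_EDGES)
    for email in ("shared@example.com", "hub@example.com"):
        print(email, "incoming:", naive[email], "distinct roots:", diverse[email])
```

Both emails have two incoming links, so naive sizing draws them the same; the root-based score separates the typo case from the genuinely convergent one.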
Here you have something like a typo in someone's name, where both spellings come to the same piece of information. I don't care that much about that, but I do care if I've got two completely discrete parts that go to one piece of information, right? Because then lots of independent stuff is pointing to it. So, you want to change the size and the shape
based on the information you've got. Then the last thing is collecting, or grouping, which is really important when I'm looking at information. If you look at that first graph, that's great for a screen saver: you're at the office, you need to look important, so have this, right? It's just Twitter followers, and it's a mess as a graph,
because there's so much information on it, and most of it you probably don't care about or need. In the second one, we've collected it slightly more: here are all the ones that are connected, and here are things like people with a million followers who aren't connected to the group. That makes a bit more sense. And in the last one, I've made it much smaller, which is easier to work with. I've just got three accounts here, and
because I put these in collections, I can quickly see the groups that I want: there are 23 shared between all three of these accounts, there are 200 between these two, and here I can see the ones on the outside. I don't have 3,000 extra nodes filling up my graph. Okay. So, I just want to show you a demo of this quickly.
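The collections described above amount to grouping nodes by exactly which seed accounts they connect to, so that the overlaps between seeds ("shared by all three", "shared by these two") become single countable groups instead of thousands of separate nodes. A minimal Python sketch, assuming the follower lists have already been fetched into plain sets; the account names are made up, and this is not how Maltego itself implements collections.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def collect_by_shared_seeds(followers: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group every connected account by the exact set of seed accounts it is linked to."""
    membership: Dict[str, Set[str]] = defaultdict(set)
    for seed, accounts in followers.items():
        for account in accounts:
            membership[account].add(seed)

    grouped: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for account, seeds in membership.items():
        grouped[frozenset(seeds)].add(account)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical follower lists for three seed accounts.
    followers = {
        "acct_a": {"u1", "u2", "u3", "u4"},
        "acct_b": {"u1", "u2", "u5"},
        "acct_c": {"u1", "u6"},
    }
    groups = collect_by_shared_seeds(followers)
    # Largest overlaps first: one collection per group instead of one node per account.
    for seeds, members in sorted(groups.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
        print(f"{' + '.join(sorted(seeds))}: {len(members)} account(s)")
```

Each key is a set of seeds and each value is the accounts that share exactly those seeds, so the graph only needs one node per key, labelled with the member count.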
So, what I did is go to a company called DFR in South Africa and say, hey, do you mind if I get some data captured from a phone? Say you went and picked up a phone and got all the call data, and I just want to view that call data. I'm only going to look at outgoing numbers: one number called another number. That's all the data I took from it, so that I can illustrate
why we'd want to use graphing for this. Okay. Don't worry about exactly what I'm doing; there's probably a video or tutorial on this. So, I can say Import Graph from Table, and I'm just going to select the file. Let me show you: it looks something like this. It has number, date, time, and call type, and I can see, okay,
this is the information that's available. Someone just pulled it off a phone that they picked up. I'm going to take the first one and put it into my graph here. Okay. If I look at the mapping, it just says, cool, we're going to go from the number that you've got
to any of the numbers that you've called. Next, I'm just going to put it in here. Okay. And this is fine; this is totally what I expect. There's a number in the middle, and it called a bunch of numbers, right? And actually, Excel would be amazing at this; you don't need a graphing tool for it. But now I want to take data from multiple different phones that were captured in the same investigation. So, I'm going to take two to nine here, and do the exact
same thing: go from the number to whatever is in the name column, which is a number, and put it on the same graph. Okay. Now, because I've done that, I can look at my graph and start exploring it. So,
in this case, I'm going to switch the layout, right, because this one makes a lot more sense to me. If I start looking at this data, let's look at the bottom here: here are two people who have made calls, and obviously they've got one shared number, right? Excel could probably do this as well; if you need that, there's probably some wizard who can figure it out. All right. So, it just says,
if I've got these two calls, there's a link between them; maybe that's how they communicate. But if you look at the graph at the top, over here, you'll see that there's this number over here and another number on this side, right? These two numbers have called different numbers. And because we've got all the data from those phones as well, I can see that in the middle over here, there's one number that's been called by three different
phones. So, I can see that the relationship between this node and this node is actually through another number. Now I start getting that data, and if I size it, I can see, hey, these are the connecting nodes that we spoke about earlier. So, this is where you use that sort of stuff. Okay, so now we're going to get
into the more exciting stuff now that we did the graph. Okay, so we look at some of the OSN data. Obviously, there's a ton of different stuff that you can get to. And if you look at something like, have I been born, like, I'm in tons of these. It's great. Every couple of weeks, I get one of those emails that says like, there's more breach data with your
information in. But data breaches happen a lot, right? So, like South Africa had a phenomenal one where like 60 million were our ID numbers, like social security numbers got leaked, including people's names, where they work, addresses, just basically everything on everyone in South Africa, right? And then obviously, the Reddit one happened last week,
the week before. So, when they do happen, people are usually like, okay, we can do a basic audit of the data, like almost right away. You see something like, you know, when Ashley Madison came out, everyone was like, oh, state.gov is in there. So, people say, okay, well, what we can do is we take our organization, we look at anyone who has this domain, and we see if it's in there, right? Because either we want to laugh at them, or
we need to actually protect our stuff. And we can look at like, people look at how weak the passwords are, and if there's password reuse and things like that. But generally, they only start from like a domain or an email address that they already know. So, there's like a ton of work done on this, but they almost always go in the direction that the data was intended for. So, you log in and it looks you up and sees if your
password is right. And then secondly, like, if we're looking at this kind of stuff, like everyone is like, oh, you know, like, where would you find them? Like, they're everywhere, right? You can go like, so here's one of the sites that I'm not going to say I went to, but say that there was a site that looked like this. And you can just download like all of these different breaches. They're there in their own formats. And like, either
it costs something like $20, or it's free, right? Depending on how old something is, like they can download the Snapchat DB. I mean, tons of this stuff is like accessible, like it's on the open web. It's not even like hidden away anywhere. And it usually contains super useful information, especially if you're doing OSN on like organizations,
and not just people (as in the example I'll show in a moment), but also at the network or infrastructure level of those organizations, because I can start looking those up. Usually you get things like first name, last name, IP addresses (either the sign-up IP or the one you last used), messages sent to other users, email addresses, so your
standard list of things will be in every one, and then more specific fields for each breach, right? So we can get these on the internet, and I think we should start using them a lot more. The one catch is that they need a lot of fixing. A lot of the time they're quite difficult, because they come in different formats and different databases, and you've got to set all of that up. And they're really good for
the way they were intended, but you've still got to fix some stuff to say, okay, I've got a domain and I need to get to the various profiles, because originally they wouldn't have indexed the domain. So you need to make a new column and get it indexed, which is quite a pain. Then for things like IP addresses, the actual IP address is written out as text, so you've got to convert it to a long (an integer); otherwise you can't search a big
dump; it just takes too long. If you wanted to search any of the recent really big ones with the IP address stored like that, you could do a LIKE statement, go home, make food, come back from lunch, and that query will still be running, right? And if we're looking at data, we need it really
quick; we need that stuff to come back so we can use it and validate things we've already got. So here, for example, you need to convert it to a long so you can just do a quick integer comparison. I actually asked these people, hey, should I put your name in the slides? Do you want to be known for it? But they haven't
got back to me, so for now I'm going to say they're friends of mine who have the data, and they wrote some transforms for me so that I can query it, because they've had to go and do all this hard work, and they've got tons of different breaches that I can use. The first stuff is usually a little bit interesting: I take my email address, or one of them, and I can see, oh, it's in Dropbox and
LinkedIn, Myspace and things like that. Usually I can interact with the data in a straightforward manner: email to profiles, domain to profiles, or IP address to profiles, and then I can start looking at that. And of course, if we're looking at this data, say Ashley Madison or the LinkedIn data, people will say, okay, well, people shouldn't use their
work email address, right? That would be insane. Why would you register on LinkedIn with your work email address? What if you need a new job? But of course that does happen, and we find it all the time. But if we're trying to target people at, for example, the CIA or FBI, then
if you registered with your FBI or CIA email address on Myspace or Dropbox, that's pretty bad, right? That's a bad idea. Or on Ashley Madison: you're sitting at work and you think, I'm going to use my work email address? That would be insane, right? So the ten or so people in there who have done that are not the people I want to target, or maybe they are the
people I want to target, right? Because they've done that from work. But actually I want to say, I don't care about them; I want to find the other people who are there. So if you work at the CIA, how could I possibly find out that you're in these dumps if you didn't register with your CIA email address, right? Because what I want is, oh, you're at Yahoo,
and then I can say, okay, you're probably not there anymore, you're at Gmail, and then I can email you there. So let's look at how we can explore that kind of data. What we want is a second-order approach: we correlate the data breach with external data. So I can do the standard OSINT stuff,
and then I can say, well, I've got this breach to enrich that data, and now the metadata, the less-used fields, become super, super important, because I can start looking for interesting stuff in there. I'm going to do an example with the Ashley Madison one. Firstly, I know I keep talking about Maltego, but it's really good for
footprinting. So, to show you an example, let's say we're targeting the CIA. Hopefully this works, right? In the tool, I put in a domain, cia.gov, right? And I can say things like, okay, I want to find all the DNS. This would be the same as using any of those other tools, right? It's just built into this one, but you could use anything. So I say,
just show me all the DNS, and a bunch of different information comes out. The name servers are on Akamai, but I can see some of the websites. I can see they've got relay 203, whatever that stuff is, right? So I've got a lot of this. Then I can say, okay, I take all of this and look at it in terms of IP addresses. And
then I can say IP address to net block. Basically, I can footprint this stuff. And actually, with the way we do it, you don't even need to know what you're doing in terms of footprinting, right? Because we automate all of this stuff. So I say I've got a domain here, cia.gov, and now, instead of having to go through all of that, I'll show
everyone here: I can just run this thing called Footprint L1, and it automates the process. Look, I don't have to touch the computer. It goes through the whole process, and eventually I'll have the basic network-level picture for this organization, right? In this case, the CIA. Because I
want to say, well, if I can figure out all of their network space, is that network space showing up anywhere else on the internet? And if I can figure out where they come from, are they in any of these breaches, so that I can start using this? One way I can do it: here, look,
we just did a footprint. Here's the picture over here, and there's that 198.81. Everyone remember that? Okay, it should be this one, I guess. So this one over here is 198.81.129. Right? This is the data; we're not making this up. So then what I can say is, okay, awesome, if I've got this network block,
this is the network, this is where the CIA comes from, then I can go look at things like Wikipedia, which is my favorite, right? Because I can ask, from this network block, what pages have been edited? And the CIA loves to edit my favorite of all time, a page called Lightsaber combat, right? So I just imagine them there at the office, like, can't believe someone made a change to my page again. And then they're
going back and editing that page. And of course, if they're not logged in, it logs the IP address, right? If they are logged in, it records a username, but if they just go and edit a page anonymously, we get the IP address. So already we've used some OSINT stuff: we've got the CIA's network, we've got the IP addresses, and now I can
see that if they've edited a Wikipedia page from anywhere in those net blocks, that's how they get onto the internet, right? Because if you work there, they're not going to say, oh, just browse the internet from your computer, it's fine. It has to go through some sort of device, some IP address that's checking that you're not stealing all the data, maybe (I don't know how it works), making sure you're not doing anything you shouldn't. But because I have that, I now know that those particular IP
addresses have access to both the internet and the inside network, right? Because people who work there are using them to edit the Lightsaber combat page. It's very important work they do over there. Okay, so if I've got that, I also know something about the people, right? If someone edits every single lightsaber-related page,
then I would run a lightsaber-related phishing campaign, right? Or I'd be like, hey, John, you really need to stop editing that page, it's getting weird. Okay, so from there: we found that particular net block, and from that net block I'm just going to pick on one IP address to show this stuff. We've got this IP address over here at the bottom, and we say that's the
CIA exit node. Obviously there are lots, but this is one of them, and here I can see all the stuff it's edited. All right, so there are intelligence-related things, and I'm like, okay, this looks right for my target; this is the kind of information we're using. Now we can start exploring: if we've got that exit node, can we go into the breach data? Because remember, if they logged into Ashley Madison, it logged
their IP address. So sure, they're not using cia.gov as the email address on the account, but they are using that IP address, because they're using it from work. If he's editing Lightsaber combat at work, he's probably doing a lot of stuff that's not work-related. Okay, so we have actually hidden the names, and I'm not going to be able to do the demo live, because then everyone will
see it, and it'll be weird. But you could go and look it up if you understand the steps; it's still a relatively simple process, we're just linking small pieces of data. Okay, so I've got the CIA exit node, and then I find an account (there are a lot, but I pick one): this is an account that was in
these data breaches, right, and that comes from that IP address. To come from that IP address, you've got to work at the CIA. Then, for that account, I can ask, well, what email address did you use? And now, of course, I have a Gmail account, right? So I have someone who works there, and I have their private email address. And now I can go back to normal OSINT stuff and ask, where do we find this Gmail address, right? And then there's
a CV. So here, he works at the CIA, and he's got it in there that he's been there since 2011. I don't know if he's still there; I'll talk about that in a moment. Okay, and then it gets really bad. He has a GitHub account, right? And he has this portfolio website, and in the portfolio website he uploaded his config. And in his config, there's this thing
called gmail.ini, and the gmail.ini has the password for his email address, right? You can also find a Twitter account where he says he's a typical techie, so I know he's a technical person. I don't know, really, looking at this. But there's also a Facebook account. So you can go and pivot off all this information like you
normally would. And you say, hey, all we did was find out that you edited a Wikipedia page. And because we can use external data, or data we have that maybe isn't traditionally used for OSINT, we now have a lot more information about the people we're targeting. Of course, this account could just be a honeypot, right? They, I
don't know how they would have added it, but they could have set this up as bait, to see if people start trying to email from it or something. But of course, the data does link together. Okay, so that's an example of how we use that sort of stuff. I just want to finish with what I think OSINT looks like in the future. Obviously we have a ton of GDPR stuff; everyone in the room
has probably heard about it, right? It's a little bit of a pain. It is really good for users, right? We get privacy on the internet because everyone is going to get sued or something if they store my information. But of course it means that, as investigators, that information is becoming more and more scarce. We're way past the golden era of OSINT now. Remember when
MySpace was around, and people were like, hey, fill in your details, and also put where you work and how much money you make? If I tell someone now, oh, I saw your Facebook, but you didn't put your current salary in your About box, people think I'm a lunatic, okay? But back then we could say, cool, we've got an email address, we find an account, it's got tons of data, and it's open. So it's
becoming a lot more difficult. Even my mom knows about GDPR. She can barely turn on the computer, and she's like, hey, have you heard about this thing? I was like, yeah, it's over now. And because of that, the things we took for granted before are gone: we could search platforms like
Facebook or LinkedIn or Amazon or whatever for email addresses and telephone numbers and go directly to accounts. Now we either have to use holes that are still open for the time being (and they will be closed), or we have to go from aliases or some other small piece of information to try to get to these accounts, things
that aren't as concrete as what we had before, right? Then obviously WHOIS is basically gone. I don't know what's happening with it; I think they're still debating it. So now I can't even look at WHOIS to say, all these people registered with the same fake name, which is exactly the kind of link that makes sense to me. And the historical information is probably going to go too, I guess, because
they're storing all that information on people, and no one gave their consent. So, at least in my opinion, we're going to start seeing stuff move to more underground sources. We already have paid-for APIs where you can pull that information from people who have it, but now it's just going to move further into
that space, right? So we have this whole clash between policy and availability, because I say, I need to get that information, and it's in this data breach; can I have it in the office? And everyone will say, there's no way you're allowed to keep a data breach in the office. So you'll have to go through a third-party data broker to get from this piece of information to that one. So I think that's
probably where it's going to go. We're going to see different sources that we have to use, and more, smaller pieces of less concrete data to get to the answer. We're going to rely a lot more on our correlations and the confidence ratings we have. And obviously we're going to do a lot more graphing, to say,
hey, we can take all these small pieces and link them to other small pieces until we get to something we can use, using the breach data, various paid-for APIs, or other loopholes we can find while they're still there. Okay, so that was me. If you have any questions, you can either tweet me or
email me. [...] is a picture of a small girl with a machine gun. Yeah, so that was it. Thank you very much.
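The talk mentions two mechanical steps without showing them: converting dotted-quad IP addresses in a breach dump to integers ("longs") so they can be searched with fast numeric comparisons instead of a LIKE query, and then pulling every breach record whose sign-up IP falls inside a known net block. The Python sketch below shows both under stated assumptions: the `BreachRecord` schema, its field names, and the sample rows are hypothetical (they use the IETF documentation address ranges, not any real organization's addresses), and a sorted list searched with `bisect` stands in for a database index on an integer column. It is not the transforms the speaker's friends wrote, and it handles IPv4 only.

```python
import bisect
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class BreachRecord:
    # Hypothetical schema: real dumps differ per breach and usually need cleaning first.
    email: str
    signup_ip: str  # stored as dotted-quad text, as in many raw dumps


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit integer ("long") value."""
    return int(ipaddress.IPv4Address(ip))


def build_ip_index(records):
    """Sort records by integer IP once, so any net block maps to a contiguous slice."""
    ordered = sorted(records, key=lambda r: ip_to_int(r.signup_ip))
    keys = [ip_to_int(r.signup_ip) for r in ordered]
    return keys, ordered


def records_in_netblock(keys, ordered, cidr):
    """Return records whose sign-up IP lies inside the CIDR block (inclusive bounds)."""
    net = ipaddress.IPv4Network(cidr)
    low = int(net.network_address)     # first address in the block
    high = int(net.broadcast_address)  # last address in the block
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return ordered[start:end]


# Hypothetical sample rows using documentation-only address ranges.
SAMPLE = [
    BreachRecord("alice@example.com", "203.0.113.17"),
    BreachRecord("bob@example.org", "198.51.100.4"),
    BreachRecord("carol@example.net", "203.0.113.200"),
    BreachRecord("dave@example.com", "192.0.2.9"),
]

if __name__ == "__main__":
    keys, ordered = build_ip_index(SAMPLE)
    # Prints the alice and carol rows, the two sign-ups inside 203.0.113.0/24.
    for record in records_in_netblock(keys, ordered, "203.0.113.0/24"):
        print(record.email, record.signup_ip)
```

In an actual database the same idea would usually be an integer column with an index, queried with a `BETWEEN low AND high` range; the sorted-list version just makes the range lookup visible. Swapping the sample net block for one found during footprinting is the correlation step the talk walks through.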