
Recon Village - A URL Shortened By Any Other Name video


Formal Metadata

Title: Recon Village - A URL Shortened By Any Other Name video
Number of Parts: 335
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract: Presentations from the DEF CON 27 Recon Village
Transcript: English (auto-generated)
Right, okay, so I'm gonna quickly introduce Master Chen while he is having a drink, or maybe two. This is a comprehensive talk that we've got going on now. The title is on the screen: A URL Shortened By Any Other Name. If you do not know Master Chen, even though the front two rows apparently do, he was here at DEF CON in 2015 doing a Skytalk, which basically taught people how to automate stalking.
He then came back to Recon Village last year to tell people how not to do that because of what he did the first time. And now he's back again. So he kind of loves coming back to our village, and we're really pleased to have him. And without further ado, I'm gonna hand over, and enjoy. Perfect, thanks.
Thank you guys, thank you everybody. And I just wanna say, you know, speaking at a smaller village, not that this is a small village, but speaking at a smaller village, it's really intimate, it's really awesome, and I like to see my audience, even though I brought like half of you guys here. So I just wanna say thank you guys very much.
You can follow me on Twitter there at the handle that you see there. That's also like all of my social media, pretty much. So there you go for doxxing myself. All right, and actually, here we go, starting with who I am or who am I. I know that's a funny little tagline there. But this is actually my favorite slide because it reminds me of Mushu the dragon from Mulan.
Right, who am I? Who am I? I am the co-founder and co-host of Gray Noise, which is a podcast if you don't know. I'm a BSides speaker, I'm a DEF CON speaker, I'm a Recon Village speaker, I'm a Telephreaker, I'm a 2600 writer, and I'm a board member of SYN Shop locally here in Las Vegas.
I try, I try to be a comedian, maybe it's good, maybe it's not, I don't know. Who was there? Was anybody there? You don't count. I'm kidding, I'm kidding, Muffins. Okay, so I wanna know who you guys are, and that's probably for my own OSINT. But any first-time DEF CONers here?
Okay, cool, any old-timers here? I'll use the same joke. Any egg-timers? No, okay, kitchen-timers. No, okay. All right, do we have any people in the audience who are like private investigators or just people who do this professionally, OSINT, these types of investigation? Okay, awesome. Awesome, I got people to prove myself to, all right.
Any data scientists out there? Okay, awesome. I'm just getting into that space, that's cool. Any reporters besides myself as press for DEF CON this year? No? No reporters? Okay, awesome. All right, so these are standard disclaimers. I don't wanna hurt anybody in my research. This is all open source.
Try not to be an asshole if you're gonna replicate my work. I'm doing this for educational purposes only, right? And here they are, actually. So, IANAL, IANAS. That's been a joke for the past couple years. I am not a lawyer, I am not a stalker. IANAL, IANAS. Sounds better that way, I guess. So again, I'm not here to make life difficult
for anybody, so use whatever you learn today wisely. And of course, again, this is for informational purposes only. All right, so some caveats before we get started with the actual meat and potatoes. I went into this research not knowing what I would find. I wasn't like looking for something specific. I just kind of aimed my code cannons at it and I wanted to see what was gonna happen.
So yeah, there was no particular target. I'm not trying to target anybody specific, no stalking this year. And this is just, there is much more to do in this space. This is just scratching the surface. So hopefully with the help of you guys, we get something done. But after the research, what can I tell you that we can expect here?
Well, there is porn. There's porn, obviously. There is username enumeration, password enumeration, and we'll get into that a little bit, sensitive docs and so on and so forth. And thousands and thousands of various media files. That's kind of the fun one. That's just fun to watch. Okay, so let's define a URL shortener.
So a URL shortener, of course, takes a very big, long, descriptive URL and shortens it into something shareable, so you can share it on, let's say, Twitter or in text messages. It's something quick and easy, something that's memorable or at least easier to remember. But all it really is, is an HTTP 301 redirect to a different URL.
So knowing that, we can follow it back to where the original link went. Now, I chose is.gd, pronounced "is good", as my target URL shortener for a couple of reasons. There's no membership required in order to shorten a URL.
There's a consistent slug length. What I mean by that is a lot of URL shorteners vary their slug length: some use only five characters, maybe some use eight. is.gd only uses six, no more, no less. I feel like there's a Monty Python joke in there somewhere. I won't attempt it, but.
So a six-character slug length. It's always gonna be six characters. There are also advanced shortening features, which make this easy, low-hanging fruit. So, things to consider: rate limits, of course. I did check the terms of service for is.gd, and you can scrape all you want
as long as it's only one link per second per machine. So every hour you're basically able to scrape 3,600 links per machine. That's, what is that, 86,400 per day? But yes, 3,600 an hour.
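Spelled out, assuming each machine runs at exactly the permitted one request per second around the clock (a real scrape will be slower because of timeouts and restarts), and counting the three computers used for the scrape:

```latex
\begin{aligned}
1~\text{request/s} \times 3600~\text{s/h} &= 3600~\text{requests per hour per machine} \\
3600 \times 24 &= 86{,}400~\text{requests per day per machine} \\
3 \times 86{,}400 &= 259{,}200~\text{requests per day on three machines}
\end{aligned}
```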
So, terms of service: if you're going to use is.gd as a URL shortening service, there are some things you'd want to mind. No spam, no child pornography, of course, thankfully. No malicious content, so you can't URL-shorten malware. If they detect it, they'll disable it, et cetera, et cetera.
So basically, don't be a dick. All right, the maths, because there's always maths. Now, no, I'm not British, but I do love the area. I've never been there, but maybe I love the culture. So okay, like I said earlier, all of the slugs are six characters in length, and they don't use special characters either, so no plus sign, no parentheses, nothing of that nature.
So you're looking at capital letters, lowercase letters, and then zero through nine. That's 62 character possibilities in each position, so it's 62 to the sixth power, which gives you about 56 billion links, closer to 57 billion links, actually.
Now, no, not all of these URL-shortened links are actually being used, and we'll get into that in a second, but this is the entire space. This is exactly what can be shortened. And of course, if you restrict it to lowercase only, a through z plus zero through nine, that shrinks the space to about two billion, okay?
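The two keyspace sizes quoted above work out as follows, with six positions and either 62 symbols (A-Z, a-z, 0-9) or 36 symbols (a-z, 0-9) per position:

```latex
\begin{aligned}
62^6 &= 56{,}800{,}235{,}584 \approx 5.68 \times 10^{10} && \text{(A-Z, a-z, 0-9)} \\
36^6 &= 2{,}176{,}782{,}336 \approx 2.18 \times 10^{9} && \text{(a-z, 0-9)}
\end{aligned}
```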
So we're whittling away the possibilities, trying to go from a population size to a sample size. Now, is.gd actually has some advanced URL-shortening features, such as lowercase word pronounceable, okay? What exactly does that mean?
Lowercase word pronounceable is just consonant, vowel, consonant, vowel, consonant, vowel. It's not actually a word, it's just what they consider word pronounceable. So, bah bah bah bah bah bah bah bah bah bah bah, bah bah bah bah bah, what about you? Sounds like a, sold that hog for a $35 bid. I wish I could do auctioneering better, but it made the joke better.
So the lowercase word pronounceable pattern either starts with a vowel or starts with a consonant and alternates from there. So it's either ABABAB or BABABA. And with that, this actually whittles our space down to about 2.3 million links, something that's a little bit more manageable.
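Where the 2.3 million comes from, assuming the pronounceable alphabet is the five vowels a, e, i, o, u and the remaining 21 lowercase letters as consonants (is.gd's exact alphabet is not stated in the talk):

```latex
\begin{aligned}
\text{CVCVCV:}\quad 21 \cdot 5 \cdot 21 \cdot 5 \cdot 21 \cdot 5 &= 21^3 \cdot 5^3 = 9261 \cdot 125 = 1{,}157{,}625 \\
\text{VCVCVC:}\quad 5^3 \cdot 21^3 &= 1{,}157{,}625 \\
\text{total} &= 2 \cdot 1{,}157{,}625 = 2{,}315{,}250 \approx 2.3 \times 10^{6}
\end{aligned}
```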
Now, so how do we do that? On the right side of the screen, you're gonna see a list of the slugs that I generated, and then I'm just gonna plug that back into a cURL request to grab that data and see where that link is going to. So the top code snippet is just how I generated the links. The bottom code snippet is just taking that,
putting it into a cURL request, and then seeing what comes back. That's very simple, that's all it is. The cool thing about this is that it's all native Linux tools, a little bit of Python, but it's nothing that's not on a computer already. So here's an example of the scrape as it's going on. I just took a screenshot. I let this go on for about a month and a half
on three computers. Could I have done it quicker? Yes, but it was kind of cool to watch it as it computed, right? The cool thing is you go get a coffee, you come back, oh, I found something else. You go take a nap for about eight hours, you come back and it's still running. So for a month and a half, it was better than watching a screensaver at least, right?
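The slug-generation step (the slide's "top code snippet", which isn't reproduced here) can be sketched in Python roughly as follows. It is not the speaker's code. It assumes the pronounceable alphabet is the vowels a, e, i, o, u plus the other 21 lowercase letters as consonants, which yields 2 x 21^3 x 5^3 = 2,315,250 slugs, in line with the roughly 2.3 million quoted in the talk. `pronounceable_slugs` and `write_slugs` are hypothetical names.

```python
import itertools
import string
from typing import Iterator

# Assumption: is.gd's "pronounceable" slugs alternate consonants and vowels,
# with vowels = a, e, i, o, u and every other lowercase letter a consonant.
VOWELS = "aeiou"
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)


def pronounceable_slugs(length: int = 6) -> Iterator[str]:
    """Yield every consonant-first slug, then every vowel-first slug."""
    for even, odd in ((CONSONANTS, VOWELS), (VOWELS, CONSONANTS)):
        # Positions 0, 2, 4... draw from `even`; positions 1, 3, 5... from `odd`.
        pools = [even if i % 2 == 0 else odd for i in range(length)]
        for letters in itertools.product(*pools):
            yield "".join(letters)


def write_slugs(path: str, length: int = 6) -> int:
    """Write one slug per line to `path` and return how many were written."""
    count = 0
    with open(path, "w") as fh:
        for slug in pronounceable_slugs(length):
            fh.write(slug + "\n")
            count += 1
    return count
```

Calling `write_slugs("slugs.txt")` streams the slugs to disk without holding them all in memory; each line can then be handed to whatever resolver does the HTTP requests.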
So what you'll see here is that if the link is resolvable, it'll show you the new URL, the URL that it actually points to. If it is not resolvable, it'll just kick back your original request, which is the HTTPS, you know, is.gd slash whatever the slug is. And so that's exactly how that works.
So there were a couple of errors along the way: bad requests, hosts that no longer exist, things like a 404 error. Another thing that would hold up the script was streaming media, right? Because the curl request just constantly grabs data, and it actually hung up my server a little bit
or the script. So I would have to stop it, remove that URL, and then continue on with the scrape. Imagine a URL being shortened to something like SomaFM: the curl request is just constantly pulling the next piece of media, the next piece of information. So that was a problem.
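The speaker's pipeline used curl; the Python sketch below is a swapped-in alternative, not his script. It assumes is.gd answers a HEAD request with the same 301 and `Location` header it sends for GET. Because it reads only that header and never follows the redirect, the destination (for example a radio stream) is never contacted, which sidesteps the hang described above. `HOST`, `TIMEOUT_SECONDS`, and the function names are illustrative choices.

```python
import http.client
import time
from typing import Iterable, Optional

HOST = "is.gd"          # the shortener studied in the talk
TIMEOUT_SECONDS = 10    # assumption: give up on any request after 10 s
DELAY_SECONDS = 1.0     # is.gd's terms allow one request per second per machine
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status: int, location: Optional[str]) -> Optional[str]:
    """Return the destination of a redirect response, or None otherwise."""
    if status in REDIRECT_CODES and location:
        return location
    return None


def resolve(slug: str) -> Optional[str]:
    """Send HEAD /<slug> to is.gd and return the Location it redirects to.

    Only is.gd is contacted; the destination is never fetched, so a slug
    pointing at a streaming radio station cannot stall the scan.
    """
    conn = http.client.HTTPSConnection(HOST, timeout=TIMEOUT_SECONDS)
    try:
        conn.request("HEAD", "/" + slug)
        response = conn.getresponse()
        return redirect_target(response.status, response.getheader("Location"))
    except (OSError, http.client.HTTPException):
        # DNS failures, TLS errors, timeouts and resets: count as unresolved.
        return None
    finally:
        conn.close()


def scan(slugs: Iterable[str], out_path: str) -> None:
    """Resolve slugs at about one per second, appending one line per slug."""
    with open(out_path, "a") as out:
        for slug in slugs:
            target = resolve(slug)
            # Like the talk's output: the destination if resolvable,
            # otherwise the original short URL.
            out.write((target or f"https://{HOST}/{slug}") + "\n")
            out.flush()  # keep progress on disk if the scan is interrupted
            time.sleep(DELAY_SECONDS)
```

For example, `scan(["bababa", "babebe"], "resolved.txt")` would make two requests about a second apart and append two lines to the file.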
But that's why it was a good idea to keep an eye on it. So here are some of the statistics. Now this is actually pretty interesting. Just to refresh your memory, you have 2.3 million links in this lowercase word pronounceable space. Of those, only about 230,
or sorry, 228,000 links were resolved. So that's about nine, almost 10%. And of the original 1.3 billion links that actually resolve on LinkedIn, or not LinkedIn, sorry, that's the whiskey talking, on is.gd, only about .01% was resolved, if we're looking at that statistic.
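A quick check of the "almost 10%" figure, assuming the pronounceable space holds exactly 2,315,250 slugs (21 consonants and 5 vowels, alternating):

```latex
\frac{228{,}000}{2{,}315{,}250} \approx 0.0985 \approx 9.85\%
```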
All right, now that I've explained exactly what I was doing to scrape, let's look a little bit at the data, which I'm sure is the interesting part, right? What you're seeing here on the far right side, the white text, is the count for how many of each particular link there were.
So what I mean by that is, at the very top, you see 141,444. Those are 141,444 unique links that are referenced in this scrape. At the very bottom, you're gonna see one link, but that's referenced 66,000 times.
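One way to reproduce a count table like this, assuming the slide shows how many distinct URLs were referenced a given number of times (a reading of the slide, not something the talk confirms), is a count of counts. The function name `reference_histogram` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def reference_histogram(urls: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (times_referenced, number_of_distinct_urls) pairs, rarest first.

    Roughly the shell pipeline
    `sort | uniq -c | awk '{print $1}' | sort -n | uniq -c`.
    """
    # How often each resolved destination URL appears in the scrape.
    per_url = Counter(u.strip() for u in urls if u.strip())
    # How many distinct URLs share each of those counts.
    per_count = Counter(per_url.values())
    return sorted(per_count.items())
```

Under that reading, the top row of the slide would correspond to `(1, 141444)` and the bottom row to a single URL with a count in the tens of thousands.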
So maybe that's an interesting point of data to take a look at, and it actually is. We'll get into that in just a second. And it varies from the top all the way down. So you have two links that are 35,000, et cetera, et cetera, et cetera, and that's how that data goes. I tried to graph it, didn't work so well. Gotta work on that. Now here's our media analysis
on the stuff that we've scraped. So we have JPEG, APKs, Zips, MP3s, C, that's C source code, PNG files, PDFs, EXEs. We found a lot of stuff here. When I say we, I mean me and the voices in my head. But yeah, so this is like a media analysis of exactly what we found.
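The talk did this breakdown with cut, sort and uniq. The sketch below is a Python equivalent, assuming the file type can be inferred from the extension at the end of each resolved URL's path; since a server can serve anything at any path, this is an approximation. `extension_of` and `media_counts` are hypothetical names.

```python
import posixpath
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def extension_of(url: str) -> str:
    """Return the lowercased file extension of a URL's path, or '' if none."""
    path = urlsplit(url.strip()).path  # drops the ?query and #fragment
    return posixpath.splitext(path)[1].lower().lstrip(".")


def media_counts(urls: Iterable[str]) -> Counter:
    """Count resolved URLs by file extension, like `... | sort | uniq -c`."""
    return Counter(ext for ext in map(extension_of, urls) if ext)
```

`media_counts(...).most_common()` then lists extensions from most to least frequent, similar to appending `sort -rn` in the shell version.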
Again, the text on the right side is the actual data counts. Now again, I've only used native Linux tools, so we're talking cut, sort, uniq, stuff that you'd normally find on a Linux or Unix computer. Now what we can do with this data, because I've saved it into a text file,
so every URL that was resolvable, I saved into its own text file. So now we can make that searchable. Now what do we search for? We can search for keywords, like username equals, password equals, invoice, and you'll find exactly what you're looking for, really.
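A minimal keyword search over the saved URLs might look like the sketch below. The watch list is hypothetical: it is built from the keywords mentioned in the talk (username=, password=, invoice) plus an assumed pattern for Office and PDF file endings, and `flag_urls` is an illustrative name.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Hypothetical watch list built from the keywords mentioned in the talk.
PATTERNS = {
    "username": re.compile(r"user(?:name)?=", re.IGNORECASE),
    "password": re.compile(r"pass(?:word)?=", re.IGNORECASE),
    "invoice": re.compile(r"invoice", re.IGNORECASE),
    # Assumed: document-like endings, optionally followed by ?query or #fragment.
    "document": re.compile(r"\.(?:docx?|xlsx?|pdf)(?:[?#]|$)", re.IGNORECASE),
}


def flag_urls(urls: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (url, matched_pattern_names) for every URL matching any pattern."""
    for raw in urls:
        url = raw.strip()
        hits = [name for name, pattern in PATTERNS.items() if pattern.search(url)]
        if hits:
            yield url, hits
```

Run over the file of resolved URLs, this returns only the lines worth a closer manual look, tagged with which patterns they hit.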
And also, of course, at the very end of the file name, you can search for Excel and Word file endings, like .doc. You can find anything specific that you're looking for, like the PDF file format, those types of file formats. And there's the screenshot of,
this is an invoice from February 12th, 2018. Of course, the username, the email was kind of blocked, kind of blocked out, at least you know it's from Gmail, but that did have an address along with first and last name. So it's a good starting point. Now, what does this actually infer for us?
Like, what does this tell us? Okay, well, is there stalking potential? Probably, of course. We could potentially uncover people of interest, and I can talk about that in just a second with that screenshot at the bottom. And of course, the link on the right side raises the question: why was this particular address or block referenced 46 unique times?
Now, do we remember that one link that was referenced 65,000 times earlier? That's the map. That's the map address that was linked 65,000 times. Now, why? Well, it's not a particular house. I've already checked. It's not one house. It's actually that block. So why would that particular block be referenced so many times?
Now, we could explain it away, such as a realtor using is.gd, or it could be something else. It's just very interesting to see that kind of data there. But let's talk a little bit about the person of interest on the bottom. First name is Natasha. Last name, of course, blocked out, but it's referenced five unique times,
and that's because it's five unique Natashas with that same last name. Is this a person of interest? Who is this? Is this a celebrity? Is this an unwanted celebrity or somebody who doesn't actually want to be in the space? This is the stuff that I found kind of just as I was going through the data. This is the interesting part.
It's almost like OSINT by accident. I'm not exactly looking for somebody specific, but there it is. So this is just some of the interesting data points that I found. So what are the next steps in what we're doing here or what I'm doing here? And when I say we, now I'm including everybody here.
We could scrape the entire character set of is.gd. We could apply this methodology to other URL shorteners, such as TinyURL, Bitly, all these other things. The only thing that I would say is these other services use memberships or ask you to sign up, and so I didn't get to see exactly, as far as,
do you have to use their API? Could you just use other means, et cetera, et cetera? But that's something that I'd like to explore. So any questions on this so far? God damn, that was quick. Yes, question.
On the map data? Map data specifically, yes. There's actually a lot of map data, like incredible amounts of map data. is.gd specifically is used for a lot of map references.
There were a lot of one-offs, though, where just one link referenced an address. But to answer that question, yes, they were pointing to houses, houses that you can easily then pivot on with something like an assessor's office search to see the first and last name for that address, and we can go from there. Now I didn't go down that rabbit hole,
but I see that the potential is there. Also, kind of a funny thing here: there's a lot of map data that references secret beaches, like beaches that nobody wants anyone to know about except their friends, but they're actually shared quite a few times. Another interesting piece of data: there were about 388 references to porn,
like xHamster, XXX, XNXX, Pornhub, and that's just because I searched for it, not because I was told to, so there's that. I mean, there's a treasure trove of information here. The most interesting thing I found, though, and I thought I had a better screenshot of it,
was the invoices. So the invoices that I saw there, you had first and last name, you had how much they paid for said service, you had their email address, and you had pretty much every piece of information you would need to maybe start some sort of campaign against that particular person. But again, the problem is, not the problem, but the issue here is that I wasn't targeting anybody specific.
So it kind of just, this is kind of just information that I happened upon. So take that as you will, but that is there. So yes, yes, I found it once, absolutely. So I found it once by accident. I found a password reset link,
and I was trying to find it again, but it seemed like it was going to work. Now I'm not gonna be an asshole and actually change their password for them, but yes, absolutely yes, that was linked. I'm sorry, for the people who are watching on TV, the question was, did you ever see a URL that was posted to a password reset link
or something that logged you directly into an account? And the answer is yes, that was there. Not directly to the account per se, but to a password reset for that account. That's something that I found one time going through, and I know I can find it again, I just haven't yet, but it is there. So another question?
Oh, yes, Mr. Muffins. Oh, absolutely. Well, so actually in the URL shortening service,
in is.gd's terms of service, they warn you, they tell you straight up, look, this is going to be public information. They tell you, do not share your personal information, do not share anything of that nature. Obviously, we all read the terms of service. So is.gd already knows that's an issue, and it's right there in their terms of service
saying don't do that, and it's still being done. Also, one thing that I did include on the slides here: I wrote an image loader. What I did was take all the URLs that had JPEG, PNG, all these different types of media files, and if you run it, it'll load a new picture every second. Some of the images are not safe for work.
So I will make that available to you guys. Do that on your own time, and have fun. Just, you've been warned. All right, I love answering questions, so we can keep going with this. Mr. Muffins. Is there a GitHub or something for this? Yes, absolutely, I just made it public
a couple of minutes before the talk, so if you want to take pictures there, there you go, that's the URL for GitHub. And my other stuff. In fact, I'm starting a new campaign, a new Twitter campaign, hashtag roast my code. Go ahead and do that, please. I sincerely mean it,
because I'm not a coder by profession. I just kind of throw code together and see if it works. So please, hashtag roast my code. Questions? The final size, as far as link count? Did you count all the links on the website?
Yes, I do. So, this is funny. So when I generated the links, I actually generated 56 billion links, and just that text file alone was about 15 gigs of just six characters every line, and that was the entire space. So I was like, ah, that's a lot. So I cut it down to the 2.3 million.
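For scale, assuming one six-character slug plus a newline (7 bytes) per line, the two candidate spaces give these file sizes:

```latex
\begin{aligned}
62^6 \times 7~\text{B} &= 397{,}601{,}649{,}088~\text{B} \approx 398~\text{GB} \\
36^6 \times 7~\text{B} &= 15{,}237{,}476{,}352~\text{B} \approx 15.2~\text{GB}
\end{aligned}
```

A file of about 15 GB therefore lines up with the lowercase-plus-digits space of about two billion slugs rather than the full 62-character space.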
And as far as size, again, earlier we had a statistic about only 228,000 links were actually resolvable or were resolved, and so I did save that data. And so, actually, thank you for this, but another interesting piece of information here with OSINT is we can see which companies
are using these URL shorteners, because I did see a lot of URL shorteners or URLs shortened from the same company. For instance, postimg.cc, as an example. They use it to show video games that they have for sale on their website as a shortened URL. So that was an interesting piece of information.
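Spotting which organizations lean on the shortener can be sketched as a hostname tally over the resolved URLs. This is an illustration, not the speaker's tooling. Treating `www.example.com` and `example.com` as one site is an assumed design choice, and `top_domains` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def top_domains(urls: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common destination hostnames among resolved URLs."""
    counts: Counter = Counter()
    for url in urls:
        host = urlsplit(url.strip()).hostname  # lowercased; None if absent
        if not host:
            continue
        # Assumption: treat www.example.com and example.com as one site.
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return counts.most_common(n)
```

A single domain dominating the top of this list is the kind of signal that points to a company using the shortener routinely.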
Mr. Code. So it spits back the unshortened URL, and then I would go in and just categorize the data by what it found.
It was just checking where it led? Yeah, so it was only checking the headers, to answer that question. So no, I didn't download all the information. I just saw where it went, and then it was up to me to take a look at that data. But, oh, another cool thing about this, too, is that I found a whole bunch of PDFs,
so if you guys wanna play, like, slide karaoke later, I have a whole bunch now. Or at least I can get them pretty quickly. So. All right, let's keep going. Questions? Yeah, we can keep going. Yes, over there.
This is another, thank you guys. I love this discussion. So the ransomware: again, if is.gd actually detects on their end that a link is malicious, they'll block it. So when I go to the URL, it has this big, you know, warning, warning, malicious, malicious, and so I didn't actually see that.
By the way, all this was resolved in a VM, so I was okay with going to these URLs. But yeah, so by the time I got to it, it's not like I was downloading ransomware or malware or anything of the sort, including viruses and whatnot. So I hope that answers your question. So I wanted to say something off that point,
but the whiskey made me forget. Question. Not that I've seen, but I mean, I didn't go through every single link. This information will be available to you upon request, actually. So the code is there for you, but if you want to talk, we can talk further
about exactly what I found, I mean, with specifics. I'm not trying to burn any sources here or whatnot. So, yeah. Any further questions? All right, so if there are any further questions,
I will be down at the heart bar for however long you guys want to talk to me. I like to talk to people. That's my favorite part of the conference. So if you want to keep continuing this conversation, we definitely can, and I'll get off stage now.