Investigating the Practicality and Cost of Abusing Memory Errors with DNS
Formal Metadata
Number of Parts | 109
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/36388 (DOI)
DEF CON 23, 44 / 109
Transcript: English (auto-generated)
00:02
undergraduate student at a big university that had nothing to do with this research. I'm a security engineer originally from Minnesota. I'm currently working in the Bay Area. Say that again? Is this better? Sorry about that. I'm also
00:27
the founder of Hydrant Labs LLC, which has graciously funded this research. It's funny how you can do that when you're the only employee. In case you guys didn't catch that, that means unlike a lot of the speakers out here, I'm not hiring, unless you want to work for a 19-year-old kid for
00:42
minimum wage. If you do, there's my email address. Also, if you have any questions about the research or you'd like to send me legal threats or both, that's my email, or you can snail mail things to the WHOIS info on the project domain, which will be listed at the end. So as usual, we're
01:01
going to start with a quick rundown of what I'll be talking about today. I'm going to talk about what a bit flip is and the history of their exploitation. After that I'll get into bitsquatting, which is a specific type of bit flip exploitation. Finally we'll move into my research on bit flips via bitsquatting, and I'll finish things up with a complete code release and a partial data dump, followed by
01:21
Q&A. So what's a bit flip? A bit flip occurs when a bit flips from a 1 to a 0 or a 0 to a 1. It's a pretty simple concept. It can happen for a variety of reasons: heat, electrical problems, radioactive contamination, cosmic rays, among others. I'm not going to focus very much on what causes a bit flip. I'm going to instead focus on how
01:42
we can actually exploit them. However, we will take a quick trek into the history of bit flips. In 2003 a paper was published by Princeton University titled "Using Memory Errors to Attack a Virtual Machine." In the study they literally took a 50-watt light bulb, put it over a memory module to intentionally induce bit flips, and then used it to
02:01
escape the JVM. Since then a variety of research has been done into bitsquatting, which I'll get into a bit later. However, it wasn't until 2014 that a paper from CMU was published investigating the use of DRAM flushing to intentionally induce bit flips. Many of you have probably heard of this by its more common name, Rowhammer. In 2015 Google's Project Zero team showed that it was
02:22
possible to exploit a bit flip to actually gain kernel privileges. Now what is bitsquatting? I keep saying this word. Bitsquatting was a name coined by Artem Dinaburg. It refers to a specific exploitation of bit flips via purchasing domains that are one bit away from the
02:41
legitimate domain, in the hopes that a bit flip will occur and the user's traffic will be directed to your domain instead of the original intended domain. So let's take an example. We take the domain cnn.com. You can see the binary representation of it right there. If the last zero in the second N or in the first N flips from a zero to a one,
03:02
you can see it changes to con.com. So normally these domains aren't actually registered, and so the request will fail silently or the user won't ever actually notice something happened. And so the idea with a bitsquat is to purchase these domains instead. Now how do you actually come up with all the possible bitsquats? You can't
03:20
just flip every single bit. So if we take a legitimate domain name, let's say www.defcon.org, and you can see the binary representation of one of the letters, E for example, we can actually throw away any bit flips that occur in the first bit, because in seven-bit ASCII it's always going to be zero. And in most cases we can throw away any flips that occur in the third bit, depending on where
03:41
you're indexing from. That's because this represents the case of the character, and in domain names this is irrelevant. Unfortunately, this doesn't leave us with six possible bit flips. That's because domain names only have certain valid characters: primarily A through Z, 0 through 9 and dashes. In
04:00
the case of E this gives us five possible flips: U, M, A, G and D. Now I mentioned there were some exceptions to that rule with case. And sometimes a character, if a flip were to occur in it, changes to a different kind of character. So for example, an N can flip to a dot or vice versa, or a slash can flip to an O. So these can actually happen where a flip
04:21
will occur and it will still complete to a valid domain. Or for example, a subdomain can change to wwwndefcon.org. This was actually researched at DEF CON 21 by Jaeson Schultz from Cisco. He explored the possibility of bit flips in the new gTLDs such as .exchange and .cloud. So using this
04:43
information we can find there are actually 43 possible top-level registerable bitsquats of www.defcon.org. I've written a tool called BF lookup that will generate a list of all possible bitsquats of a given domain. It will be linked on the project site, which I'll show at the end. Now I want to quickly note some of the previous bitsquatting work before my
05:02
research. During DEF CON 19, Artem Dinaburg first coined the term bitsquatting, and more recently Robert Stucke and Jaeson Schultz conducted more extensive research into bitsquatting, which was actually the inspiration behind Project Bitflip. And what is Project Bitflip? Project Bitflip is my research into how you actually exploit a bitsquat. It
05:21
makes a lot of sense how a bitsquat can occur and how traffic gets sent to potentially the wrong person, but how do you actually exploit that to pivot somewhere? So let's take an example. Let's imagine an internal site, say a corporate wiki or continuous integration system, that includes jQuery from the jQuery CDN, code.jquery.com. Since your
05:43
browser is loading this page for the first time, it doesn't know who code.jquery.com is, so it's going to initiate a request to the user's DNS resolver. This would be Google DNS, Comcast, OpenDNS, etc. Now suddenly a bit flip occurs. Code.jquery.com has become code.jqueezy.com
06:02
instead. This could have happened in memory on the device running the browser. It could have happened in the memory of the NIC. It could have happened in transit when a checksum was recalculated, or even in the memory of the DNS resolver itself. Now the DNS resolver, if we assume this is a cold request, doesn't actually know who's in charge of jqueezy.com, so it's going to ask the DNS root via an
06:22
authoritative name server lookup. It's going to send that. The DNS root is going to answer with ns1 and ns2.bitfl1p.com. Now what is this domain suddenly? Bitfl1p.com is the domain I purchased to act as the control domain for the entire project. And it has a 1 in it because someone else already
06:42
owned bitflip.com. So these answers get sent back to the DNS resolver, which the DNS resolver is then going to use to send a query to the Project Bitflip server. It's going to ask for code.jqueezy.com. Now this is also where this is going to begin to deviate a bit from a standard DNS question
07:01
and answer. We're actually going to send back two answers to the DNS resolver: one for code.jquery.com and one for code.jqueezy.com. Now the reason we do this is we don't actually know where in the process the bit flip has occurred, so we don't know what answer the DNS resolver is expecting. So we send both back to them and let the DNS resolver pick the correct one and ignore the other packet.
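As an illustration of the two-answer reply just described, here is a minimal Python sketch that builds one DNS response carrying an A record for each spelling. It is not the speaker's BF DNS code. The function names, the query ID, the TTL and the 192.0.2.x documentation addresses are all assumptions made for the example, and the talk does not say whether the real server put both answers in one packet or sent two.

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as DNS labels: length-prefixed, zero-terminated."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def a_record(name: str, ip: str, ttl: int = 60) -> bytes:
    """Build one answer record of type A (1), class IN (1)."""
    rdata = socket.inet_aton(ip)
    return encode_name(name) + struct.pack("!HHIH", 1, 1, ttl, len(rdata)) + rdata


def build_response(query_id: int, asked_name: str, answers) -> bytes:
    """Build a response with one question and one A record per (name, ip) pair."""
    flags = 0x8400  # QR=1 (response), AA=1 (authoritative)
    header = struct.pack("!HHHHHH", query_id, flags, 1, len(answers), 0, 0)
    question = encode_name(asked_name) + struct.pack("!HH", 1, 1)
    return header + question + b"".join(a_record(n, ip) for n, ip in answers)


# Hypothetical values: a distinct address per spelling, so the follow-up
# HTTP request reveals which answer the resolver accepted.
packet = build_response(
    0x1234,
    "code.jqueezy.com",
    [("code.jquery.com", "192.0.2.10"), ("code.jqueezy.com", "192.0.2.20")],
)
```

Giving each spelling its own address is what makes the later HTTP request informative: the address the browser connects to shows which of the two names the resolver was actually waiting for.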
07:23
The DNS resolver is then going to send this back to the browser, which it will then use to issue an HTTP GET request. Now if you were attentive, a few slides back you may have noticed those two different answers actually had different IP addresses. And this is because it allows us to determine which type of bit flip occurs most commonly, because this HTTP GET
07:42
request will be triggered for a different IP address depending on which flip was accepted by the DNS resolver. Now this HTTP GET request is going to get sent to the Project Bitflip server. Now this is also where we get to deviate a bit from previous research. Instead of just answering this with a 404 or 200, we're going to be a little mean. We're going to
08:02
send back a 301 Moved Permanently. This has the effect of permanently caching the bit flip in the browser's cache so that subsequent page loads are then directed to the Project Bitflip server even when a bit flip hasn't occurred. And we're also going to send it to a unique subdomain of bitfl1p.com. Now the reason for this is the browser is not
08:23
going to know who's in charge of that subdomain. So when it gets the response, it's then going to have to issue another DNS question. It's going to send that to the DNS resolver, which will send it through to Project Bitflip. This allows me to directly tie a specific user's browser to their DNS resolver, because you may have noticed at this point there's no
08:41
way to tell which browser initiated a specific query, because they all originate from the DNS resolver. Following this we have a pretty standard path. The answer gets back to the browser. The browser issues an HTTP GET for jquery.js. Project Bitflip receives it and we answer with a 200. And we send back this tracking
09:01
JavaScript. Wait a second. I asked for jquery.js, not tracking JavaScript. Too bad. I'm not jQuery. So instead we're going to send back this, which the browser, believing it originated from code.jquery.com, is going to faithfully execute in the context of your internal site, whether it be a continuous integration system, internal wiki, basically anything
09:21
with any important data on it, or any site that a user would be tricked into entering credentials into, believing that they are on the original site. Now how do you actually build all of this? This is great in theory, but actually answering all these queries is complicated. The second tool I'm releasing is BF DNS. It's a Golang DNS server
09:41
specifically designed to answer bitsquat DNS queries, along with BF WWW, which is the configuration that I used to answer all these queries, along with a bunch of PHP scripts (gross) that are used to serve the actual tracking JavaScript. I keep mentioning this tracking JavaScript. What
10:00
does it actually track? Basically everything that you can track with JavaScript. It pulls the user's installed plug-ins, user agent, time zone, language, referrer, the document title, the screen size, the resolution, the current URL, the do-not-track cookie, and the installed fonts via Flash, and then it also pulls the local IP addresses on the system via WebRTC. Some of you guys may
10:23
have seen this from the BeEF talk. There's a way to pull them with WebRTC: the Session Description Protocol actually contains all the local IP addresses on the system. And so JavaScript has access to those even if they aren't the route that was used to access the site. And these are internal LAN IPs, in addition to external IPs if
10:42
your computer is directly connected to the internet. It also pulls the cookie names and a SHA-256 of their values. You could actually pull the cookie values. I don't want to get sued, so I just pulled the SHA-256 hash of them. Now we need somewhere to host all of this, so we need to select a host. We need somewhere that supports multiple IP addresses so
11:03
that we can answer each question with a different IP address. We also want IPv6 support so that we can evaluate IPv6 usage and also somewhere that bandwidth is really cheap in case a bandwidth spike occurs if a big DNS resolver were to cache one of our results such as Google DNS or Comcast because then it would be serving it to their millions
11:22
of customers and that would result in a lot of traffic. Finally I wanted somewhere that was a smaller company in case there were letters or legal threats sent to them, somewhere that would actually look out for their customers and wouldn't just say go away, pick another host. I ended up settling on a host called RamNode: a small VPS, 3 terabytes
11:41
of bandwidth a month. It cost about 15 bucks a month. Finally we need some domains other than the project domain. So I don't know how many of you know a college student, but we're really lazy. Whatever option requires the least amount of work is the option we're probably going to take. So rather than building a list of domains from a bunch of data sets like the Alexa 500 or other similar sets, I fired up a
12:04
web proxy and browsed the internet for a day. At the end of the day I looked at all the top sites that I'd hit and looked for ones that would have interesting data. We got the mic falling over. Sorry. Hopefully. Nope. I took care to
12:26
only grab sites that would yield interesting data and tried to explicitly avoid any sites that would have data covered by things such as HIPAA or PCI. The first site I bought was googleusercontent.com. It serves images for Google sites. The main reason I purchased
12:42
it is because it's a really, really long domain name. So there's a lot of opportunities for a bit flip to occur in memory especially when browsers copy the domain name multiple times. In fact here's all of the possible bit flips of it or rather here's the 72 of them I was able to buy. There are 79 possible valid ones. Now, I hadn't actually set up a proper
13:04
server at this point. So I pointed it at the VPS and ran netcat on port 80. Those of you that are in IT have probably had what I like to call an oh shit moment. You accidentally run rm -rf on the wrong server. You accidentally shut down all but the server you wanted to. Something like that. This is one of those moments. For those
13:23
of you that can't read the tiny text up there, that's a request for mail-attachment.googleusercontent.com. As it turns out, googleusercontent.com serves all mail attachment downloads for Gmail and Google Apps. Oops. To make
13:40
it even better, by their very nature those links are valid without session cookies, meaning that for each misdirected request I could go grab that attachment myself if I wanted to. So I decided to actually look a little bit more at the authentication. It turns out it serves not just mail attachments but the authentication for Google,
14:02
Google Fonts, Google cache pages and Google translated pages. I'm sure there's no valuable data in any of that. Moving on from there, I decided to take a look at Amazon, specifically cloudfront.net. If you're familiar with CDNs, it serves a lot of popular sites such as ESPN and Amazon.com itself, among a whole bunch more. There were
14:24
43 possible bitsquats, four of which were already registered. So I registered the rest of them. Moving on from there, continuing with the Amazon theme, I took a look at amazonaws.com. It serves pretty much all AWS services as subdomains of amazonaws.com, with the exclusion of CloudFront.
14:42
This includes Amazon S3, Elastic Load Balancer and EC2. Interestingly, this is one of the few domains I came across in my research that a lot of the bit flips were already owned for, the other one being Akamai, who actually owned all of their bit flips. In this case Amazon owned 33 out of the 38
15:01
possible bit flips. However, the rest were registered by someone else, except for one. Of course I wasn't actually satisfied with a single bit flip, so I decided to buy subdomain bitsquats, where the dot changes to an N, of Amazon S3, EC2 and Elastic Load Balancer. Moving on from there
15:25
I decided to take on doubleclick.net. Those of you familiar with ad networks know that this serves Google's ad network, and it primarily serves ads over JavaScript, which makes it a great target. There were 45 possible bitsquats, 19 of which were already registered. So I
15:41
registered the other 26 of them. Moving on from there I hit apple.com. Being a short domain name, a lot of these were actually valid sites and not actually owned as bit flips, so I was actually only able to grab one of them, AppleG.com. But continuing with the Apple theme, I took a look at icloud.com next. If any of you have an iOS or OS X device,
16:04
as I'm sure a lot of you do, your device will check in with icloud.com regularly for backups, contacts and a lot more. Additionally, most Apple accounts have an icloud.com email address, which is delivered to this domain. Moving on from there I decided to look at web dev. I don't know how many of
16:22
you have ever heard of jQuery. jQuery is a compatibility layer that makes IE suck less, among other things. It's used by over 70% of the top 10,000 sites, making it a really good target. I registered the 15 available ones of that. Continuing with the web dev theme, I hit disqus.com. It serves comments for about three quarters of a million sites.
16:44
There's 24 purchased. And finally, the peak of my web dev phase: google-analytics.com. I'm guessing a lot of you know what this is. It serves analytics for pretty much every site ever. It's the most widely used website statistics service in the world.
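The "possible bitsquats" figures quoted for each domain come from enumerating single-bit flips under the rules given earlier in the talk (ignore case flips, keep only A through Z, 0 through 9 and dashes). Below is a minimal Python sketch of that enumeration. It is not the speaker's BF lookup tool, and the names `VALID`, `char_flips` and `bitsquats` are hypothetical. It also makes simplifying assumptions: it leaves the TLD untouched, skips the dot/N and slash/O cases, and does not check registration status, so its counts will not necessarily match the numbers on the slides.

```python
import string

# Characters allowed in a hostname label (DNS is case-insensitive).
VALID = set(string.ascii_lowercase + string.digits + "-")


def char_flips(ch: str) -> set:
    """Characters exactly one bit away from ch that are still valid in a label."""
    out = set()
    for bit in range(8):
        flipped = chr(ord(ch) ^ (1 << bit))
        if flipped in VALID:
            out.add(flipped)
    return out


def bitsquats(domain: str) -> set:
    """Single-bit-flip variants of domain, leaving the TLD and dots untouched."""
    name, _, tld = domain.lower().rpartition(".")
    results = set()
    for i, ch in enumerate(name):
        if ch == ".":
            continue  # dot/N flips are a separate case, not handled here
        for flipped in char_flips(ch):
            candidate = name[:i] + flipped + name[i + 1:]
            # A label may not start or end with a dash.
            if any(l.startswith("-") or l.endswith("-") for l in candidate.split(".")):
                continue
            results.add(candidate + "." + tld)
    return results


# The letter E yields the five flips named in the talk: U, M, A, G and D.
print(sorted(char_flips("e")))
print("con.com" in bitsquats("cnn.com"))
```

Filtering against a fixed character set rather than special-casing individual bit positions keeps the sketch short: case flips and high-bit flips fall out automatically because their results are not in `VALID`.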
17:03
Interestingly, I wasn't the first person to have this idea for google-analytics.com: 53 of the bitsquats were already registered by another party. However, I was able to grab the remaining 10 shown there. Hopefully you're beginning to see a theme here. Moving on, sfdcstatic.com is the Salesforce CDN. There were 42
17:20
possible bitsquats. I bought all of them. I'm going to pick up the pace a little bit here. aspnetcdn.com: Microsoft's AJAX CDN. It serves a lot of Microsoft sites and a lot of jQuery plug-ins. Another 38 domains registered. googleapis.com: it's Google's JavaScript CDN. Another 22 out of 39 registered. gstatic.com: this
17:44
one's a fun one. It's Google's static content hosting. It serves things like the Chrome internet connectivity test. So when you plug your computer into the network and suddenly Chrome says oh, you're online now, it's hitting gstatic.com. Additionally it serves things like the Chromecast login page along with a lot of other stuff. The other thing
18:03
that makes this interesting is that this is one of the domains that was purchased by both Artem Dinaburg and Robert Stucke in their research, yet the domain was freely available, not purchased by any other entities or by Google itself. I was able to grab 19 out of the 30 possible bitsquats. Finally, to finish things up: Facebook's CDN,
18:20
YouTube's CDN, Twitter's CDN and there we go. Now, that was 337 domains. I know you all were keeping count. You're probably asking yourself how did I pay for all of those? Coupons. I don't know if any of you have seen the TLC show Extreme Couponing. That was a documentary of my life for
18:42
about a month. Eventually I actually ran out of coupons and I found 1&1. 1&1 has a nice advertisement on the site for 99-cent domain names. If you go and try to buy one, the first one is 99 cents, and then when you try to buy the second one, it charges you $11, which doesn't seem quite
19:00
fair. So if you start the process in an incognito window and then log in halfway through, it's still 99 cents. So after doing that, at about five minutes per domain, I got this email: "Dear Luke Young, you have exceeded the limit of our current special offer. Further orders placed under this offer will be canceled. Sincerely, security team." And that's when
19:23
I stopped buying domain names. The final statistics: I bought 89 domains from GoDaddy and 255 from 1&1, at an average cost of $1.62 per domain, coming to a total of $545. Now
19:44
the next thing I realized is I was actually missing out on a lot of data. Most of this traffic was coming in over SSL. So I decided to buy SSL certificates too. Those of you that are familiar with SSL probably know that I need what's called a wildcard SSL certificate. This is because I
20:02
don't know the subdomains that are going to be requested for each of these. You also probably know that wildcard SSL certificates can be expensive. A wildcard SSL certificate from DigiCert is $595. Of course you get bulk discounts and other things. But if you just do easy math there, that's over 200
20:20
grand, which I don't exactly have lying around for a fun little side project. I think I was able to find a solution. Some of you guys may have heard of them from other talks. StartSSL has a kind of unique model. You can pay a one-time fee and then get SSL certificates from them at no additional cost, except for EV certs. So at a cost of $60 I was
20:42
able to get wildcard SSL certificates. Now, because this is such a unique model, StartSSL has a very manual process. They don't have an API for requesting certificates or anything like that, which makes requesting the certificates really, really annoying for 337 domains because you have to do domain verification on all of them.
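The "easy math" behind this choice can be checked in a few lines of Python. This is only a sanity check using figures quoted in the talk (337 domains, $595 per wildcard certificate, a $60 one-time fee, $545 spent on domains); the variable names are made up for the example, and the list price ignores the bulk discounts the speaker mentions.

```python
domains = 337              # domains purchased, as stated in the talk
wildcard_list_price = 595  # quoted price of one wildcard certificate, in dollars
one_time_fee = 60          # quoted StartSSL one-time fee, in dollars
domain_spend = 545         # quoted total spent on domain registrations, in dollars

# One wildcard certificate per domain at list price.
wildcard_total = domains * wildcard_list_price

# Average registration cost per domain, rounded to cents.
average_per_domain = round(domain_spend / domains, 2)

print(wildcard_total)      # 200515, i.e. "over 200 grand"
print(average_per_domain)  # 1.62
print(wildcard_total - one_time_fee)
```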
21:03
But it also makes it more likely that someone's going to notice what you're doing. In fact, 17 of the domains were flagged for manual review by StartSSL, who then approved all of them. By four different employees of the company, I might add. Here were the 103 certificates that I
21:21
was issued. Now, those of you that are attentive might have noticed that I bought 337 domains yet I only got 103 certificates. There's a good reason for that. I'll try to answer this email. My login certificate was revoked for, quote, abuse. Please contact us. Reaching out to StartSSL, I
21:45
received the following quotes: "I'm sorry, but for high-profile names only the name owner should be able to get certificates for it, and those resembling them closely never issued." Followed shortly by, quote, "most certificates really shouldn't have been issued to start with." Oops. Here's the
22:02
actual excerpt from the StartCom certificate policy, and I'm paraphrasing here: StartCom will not issue certificates whose domain names might be misleading or of well-known brands or names that are part of requested host names, such as googleme.com. So I'm guessing the fact that I owned domains with their example in it was probably not so great. I went
22:23
back and forth with Eddy Nigg, the CTO of StartCom, letting him know about the project, and suggested revoking only the Google certificates. Here was the timeline on that. Yep. So I don't know how many of you have ever actually been to a standup comedy show, but one of the golden rules is that you
22:41
don't take your phone out during someone's set. Mostly because you get made fun of by the comedian but also because it's really rude. I was at a comedy show with some friends and my phone started vibrating like I was getting a phone call. And then it kept vibrating for five minutes straight. When the set ended I took out my phone and I saw this. And in the end they revoked 81 certificates with two
23:03
emails for each revocation. Now I'd like to note that I had full wildcard certificates for all these domains for two months before they were added to the certificate revocation list. Not that that even matters. Some of you may remember this from Heartbleed, as a lot of browsers like Google Chrome don't even check the
23:21
certificate revocation list. Meaning that if I were malicious, I could continue to use these certificates at the risk of being sued by StartCom. I chose not to do that because I don't want to get sued. However, they did leave 22 certificates unrevoked. When I inquired about why they didn't revoke these, I received the response that "everything we haven't revoked so far was considered not so problematic and hence we left
23:42
them to expire naturally." Now you'll notice all the domains that are remaining seem to not contain the trademarked name of whatever company it is. That's likely the reason they were left behind; however, I don't know that for a fact. Now moving on to the future. The EFF has just announced their free automated certificate authority called
24:03
Let's Encrypt. Using something like that, it may be a lot easier to do exploitation, as it's a completely automated process and you don't have to worry about your domains getting flagged for manual review. They currently don't support wildcard certificates but are looking into it. You could also use a much larger provider
24:21
like DigiCert at a much higher cost, depending on how much money you have to throw away at this. Now I suppose those certificate revocations kind of beg the question: did anyone else even notice? Well, the first and only public case of someone noticing was by a third party, x8x.net. He noted that all the bit flips of gstatic.com were suddenly
24:42
registered along with some other domains by quote the same individual with name servers at bitflip.com so at least someone is having fun. I thought that was it until I went to a friend's house and I went to check on the site and I saw this. I quickly logged into my server console and found that the server was alive and well. It turns out the
25:01
error that I was seeing was with DNS resolution. I figured I had just broken my DNS server, or that it was handling packets wrong, so I hit the ever-trusty Google DNS to check, and it worked. This is where things got a little bit weird. After a bit of further investigation and a few panicked calls to friends in other states, I verified that I wasn't
25:22
actually crazy. Comcast's DNS server was refusing to answer requests for bitflip.com. Further investigation revealed that it was only for A and AAAA records of the subdomains of bitflip along with the root domain. Now here's the weirdest part. The requests were still getting forwarded to my server; however, my answers were not getting forwarded back to
25:40
clients. This makes it really hard to pinpoint exactly when they put this block in place, and I still don't know to this day. I tried to reach out to the business class support line but never really got an answer. A short while after noticing the Comcast incident, I got an email that my credit card payment for the server was declined for the month. A quick call to my
26:01
bank and I was even more confused. They said they were approving the transaction and that I had adequate funds. I reached out to RamNode via a ticket that was escalated directly to the CEO (that's the advantage of picking a small company), who reached out to Stripe, their payment processor. He received a response that, quote, "we have reason to believe that card has been associated with fraudulent activity." What
26:22
makes this odd is that the card was only used with about five vendors. I can count them all on one hand, and only one of them, this server, was purchased through Stripe. What makes it really odd is that Stripe, not my bank, was refusing this transaction. I actually reached out to Stripe on my own, independent of RamNode, and they said, quote, "we are indeed
26:41
blocking it on our end due to a level of risk on this card that we're not willing to take. I know this is a very vague reason, but for security purposes I'm limited in how much information I'm able to give out." I still don't have a solid answer on this one. Needless to say, I paid for the month on another card and then paid through the end of August ahead of time in case it were to ever occur again. Now I want
27:03
to take a second and have everyone look at this slide here and see if you can tell me what's wrong with this slide. I put my bank on there. About two hours after the DEF CON CD slides came out, I started getting password resets. Whoever
27:22
is doing this, I don't bank with them anymore. Nice try though. Now moving on to the part that you guys are all actually probably here for, the data. The first question people ever ask when I tell them about this is, is this actually even a problem? Is there even traffic to these bit flips? Let's start with a simple graph. This is the
27:43
traffic I received during a section of the time the server was running. Now some of you probably can't see that because it's a little small so I'll give you a number. I received over a million DNS queries every 24 hours for over a month. Now you'll notice that that graph is kind of all
28:04
over the place. There are a lot of reasons for that: different servers caching my results for longer, and I actually transferred some of these domains away over time, so it's not the cleanest data set. Now of those million DNS queries, about 4.8% of them resulted in corresponding TCP
28:23
connections. Of those TCP connections, 85% of the initiated SSL connections completed the handshake and issued an HTTP request. Now those of you that were paying attention earlier might have noticed that I had 22 certificates out of the 337 domains. Those numbers just
28:41
don't add up. The best answer I can give you on that is a lot of people have misconfigured systems that are actually accepting SSL certificates that are for the invalid name. Because these are valid SSL certificates that I was serving they're just not for the domain that the user requested. Now what about those HTTP access logs? The server handled about
29:04
2.4 million requests, and I was able to determine that repeat users, so users that had cached the result because of that 301 redirect, remained cached for an average of 4.33 requests. So this means that after a flip occurred, wherever it may have
29:21
occurred in the request process, you continue to try to access jquery.js from me for the next 4.3 times that you visited that site. I can also tell you all sorts of interesting things from this traffic. I can tell you the most common languages that users had on their system. The graph on the left is the HTTP accept language header and the
29:45
graph on the right is the languages returned from the JavaScript. Interestingly, it seems like the JavaScript executed on Chinese machines much more than on machines with any other language. Not really sure why that is. It's also interesting that most of the traffic had a Chinese
30:02
language set, and this actually doesn't match up with standard data sets of what the most common Accept-Language headers are. So for some reason bit flips were occurring more commonly on Chinese computers. I can tell you things like the most common screen resolution or IPv6 adoption, which was
30:21
abysmal, about 1.67% of queries. I do have to note that I wasn't actually forcing users to try IPv6, so this was just clients that preferred IPv6 over IPv4. I can tell you things like browser usage. Wikipedia's traffic over February is on the left. Their
30:42
breakdown of browsers versus the traffic I received is on the right. You'll see a significantly larger chunk is coming from Chrome. It could be because Chrome is doing more memory copies in each of its requests. It could just be because more of the users happen to be running Chrome. Along with a larger proportion of IE. I can tell you things
31:01
like OS usage. Same statistics, Wikimedia on the left and my traffic on the right. A large portion of traffic is coming from Windows 7 and Windows XP, with a much smaller portion coming from OS X and iOS. I can tell you things like cookies. I
31:21
have acquired 240,000 cookie names and values. Now these cookies could be anything from Google Analytics tracking data to actual session tokens for, say, your Gmail login. The top cookies were from Google Analytics, Baidu and, for some reason, weather.com. So I have literally thousands of users'
31:45
zip codes that they always look up weather for saved in my data set. I don't know why. I can tell you interesting things, like here is someone browsing Amazon.com looking for an iPad, and their session cookies were sent happily to me. Anyone want
32:01
to buy an iPad? We can look at things like here is someone trying to log into OAuth and their session cookies being sent to me along with their OAuth token. Or things that are more interesting like iPhones checking in to download, in this case just checking into the app store or trying to download an actual app from me. Some of you may
32:23
have seen some of the Hacking Team material. Their research was into serving malicious applications to users, which could be utilized in something like this, where a malicious application is sent back instead of the application the user intended to download. As long as it's signed with a valid certificate, iOS will accept it and install it. Or we could
32:44
just have iOS devices asking me for software updates. I actually did some research into this, and you could serve them different software updates instead. I can also do things like the HTTP Referer value. Here were the top Google searches that I pulled from the Referer values. For some
33:04
reason a lot of people really want to know about birthday gifts for their wife. I have no idea why. The other thing I pulled was local IP addresses. I was able to pull 158,000 IP addresses off various systems. 12% of
33:20
those were non-private IP addresses. This means that those systems were likely directly connected to the internet or, for a lot of them, using IPv6 addresses, which by their very nature don't go through a NAT. I can tell you the most common local IP addresses. Super fascinating graph: they're pretty much all 192.168, primarily consumers.
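A minimal sketch of how a private versus non-private split like the 12% figure above could be computed, using Python's standard `ipaddress` module. The sample addresses and the `classify` and `summarize` helper names are illustrative assumptions, not part of the tooling or data released with the talk.

```python
import ipaddress
from collections import Counter


def classify(addr: str) -> str:
    """Bucket one address string (IPv4 or IPv6) into a coarse category."""
    ip = ipaddress.ip_address(addr)
    # Loopback and link-local addresses never leave the host or link,
    # so they are kept apart from ordinary private (NAT-style) ranges.
    if ip.is_loopback or ip.is_link_local:
        return "local-only"
    # Covers RFC 1918 ranges such as 192.168.0.0/16 and IPv6 unique-local fc00::/7.
    if ip.is_private:
        return "private"
    return "public"


def summarize(addrs):
    """Return the fraction of addresses that fall into each category."""
    counts = Counter(classify(a) for a in addrs)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample, not taken from the talk's data set.
    sample = ["192.168.1.10", "10.0.0.5", "8.8.8.8", "2001:4860:4860::8888"]
    print(summarize(sample))
```

A globally routable IPv6 address lands in the "public" bucket here, which matches the point that many of the non-private addresses were IPv6 hosts not sitting behind a NAT.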
33:41
Where it gets more interesting is when we look at other traffic. For example, here's the SMTP traffic that my server received. And you can't really see it out there. It's kind of small text. The numbers on the side are 20,000, 40,000 and 60,000. That's how many requests I was receiving each day. That's
34:02
someone trying to send an email, a bit flip occurs and it gets sent to me instead. Now if any of you have ever worked on mail systems, you probably know that SMTP doesn't really work well with TLS. Pretty much everyone uses self-signed certificates and even if you weren't using a self-signed certificate, I actually own the certificates for these
34:22
sites as well. So it doesn't even make a difference. And you'll notice that big jump there at the end, up to about 60,000. That was June 19th, when Hotmail cached my entry for iCloud.com and sent all emails that were intended to go to iCloud users to me instead. Oops. In continuing with
34:42
the SMTP theme, I was looking through the traffic and I set up a query that would show me where the origin BGP route came from. And I found that 38% of my traffic was coming from AS13414, which is owned by Twitter. Primarily those two
35:01
subnet ranges. And you'll notice that the pie chart there shows what the DNS requests coming from them were for. They're all for iCloud.com. As it turns out, these are mainly MX and A record lookups, and they were resulting in SMTP connections. These DNS queries resulted in about 390 connection attempts per day, and based on this info, I'm
35:23
guessing that what was happening was a bit flip was occurring and Twitter was trying to send password reset emails, promotional emails and other things to users with an iCloud email address and sending them to me instead. I reached out to Twitter, who said, "After some discussion, it looks like we're trying to restrict outbound traffic from our network to bit flip domains. This should address the specific
35:41
problems you outlined without having to worry about the domains and who owns them." Now, how did I actually run all these queries? I wrote a tool called BF Splunk. It's source types for Splunk and a bunch of queries. It will be listed on the site at the end. I'd love to say Splunk sponsored this talk, but I never actually called the sales
36:02
person back. Oops. Now, how do you actually remediate this? Buy your bit flips. You probably already buy the typos of your domain. Buy your bit flips as well. It's really not that much more expensive and it can actually save your users a lot of problems. You don't have to actually answer them. You don't have to do anything. Just own them so that someone else
36:22
doesn't. The other thing you can do is use ECC memory and set up an RPZ for internal flips. However, that doesn't protect your users. That just protects users within your network. Now, before I get into the data release, I want to talk about what I did with the domains. I reached out to all the companies whose domains I purchased for this talk and offered to transfer the domains to them for free. The
36:43
first company to get back to me was Salesforce. I wanted to take a second to really give props to their team. They had a response with authorization to transfer the domains in under two hours, and the transfer process was initiated in under 24. The next company was Apple. The domains were transferred in about two days. Following them was Amazon AWS, with the domains
37:03
transferred in a little over two weeks due to some scheduling issues on my end. Facebook followed closely at a tad over two weeks. Microsoft's are being transferred right now. Now, those are the companies that accepted the transfers. Some companies like Twitter actually said that they
37:21
didn't want the domains. Here was the exact quote: "We don't actually try to prevent bit flipping attacks by registering all the nearby domains, due to the fact these attacks are relatively rare and that we own a lot of domains, so this would be quite an undertaking. So we're not interested in acquiring the domains you have. Please just maintain possession of them until they expire," which they
37:40
have. The next company was Google, which after three weeks had a very similar response, saying their domains team was not interested in purchasing them. Those 154 domains are now up for purchase. Now, of the companies that did actually transfer them, some didn't actually do it right. Here's the WHOIS info for two of the domains. They're
38:03
still pointing the name servers to me even though they own the domain. So I'm still getting all the user traffic even though they own the domain. I'm not going to call out which companies they are, because I just noticed this about ten minutes before the talk, so I haven't given them a heads up. Now, on to the data release. I'm happy to
38:20
say I'm releasing complete logs in JSON format. So complete DNS logs for every single query I received. They contain the source, the port, the qname, the qtype, pretty much all that. The info is self-explanatory on the site, along with anonymized web server logs, which contain things like the user agent, the Accept-Language header, the
38:42
HTTP host they were intending for, the method. They do not contain the URL or the URL query and they contain a hash of the source address so you cannot tie it back to a particular user. In the same vein, I'm releasing anonymized SSL and SMTP logs which both contain the same hashed source IP. And finally, here's my contact info along with the
39:02
project website which contains all these data dumps. At this point, I think I have a very short period of time for some questions if anyone has any. Did anyone offer a bounty
39:22
or anything to help you cover the cost? No. And I wouldn't have accepted it if they did, because it's already a gray area legally, and accepting money for it or making a profit off of it can push you a bit further into that. I did get some care packages from companies though. Question for you. Did you do anything
39:42
to look at double bit flips to get an estimate for just how frequently these happen? Is it one out of every billion queries, trillion queries, what order of magnitude? So you're referring to when two bit flips occur? Yeah, using the ratio between two bit flips and one bit flip, you could figure out the ratio between one bit flip and zero. I did not, however, I did receive, if you look at the
40:02
data, in a lot of the queries I got where a bit flip had occurred, lots of other bit flips occurred in the actual query. So the URL would be malformed as well, indicating probably that those users have really messed up memory. So I don't have statistics on that. It would be very interesting to look into that. Any other questions? Does
40:33
enforcing name server bailiwick have any effect on your responses? Can you repeat the question? Does enforcing
40:40
name server bailiwick, does that have any impact on your responses? I do not know. Any other questions? How did you
41:01
figure out who to contact in these companies to get a quick response? How do you know who to e-mail in Google or Microsoft about this, just being someone on the outside? So all of these contacts out to the companies were sent to the security@ alias or to whatever information they had
41:24
listed if I just Googled the company name and security. A few of them did have to get pinged through personal connections to make sure they actually got handled. All right. Thank you very much.
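For anyone acting on the "buy your bit flips" advice from the remediation part of the talk, here is a minimal sketch of how the single-bit-flip variants of a domain could be enumerated. It assumes only the labels before the final dot are flipped, that DNS names are compared case-insensitively, and that only letters, digits and hyphens are valid; the `bit_flips` function name is hypothetical and this is not the speaker's tooling.

```python
import string

# Characters allowed inside a host name label (letters, digits, hyphen).
ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def bit_flips(domain: str) -> list[str]:
    """Return every valid name reachable by flipping one bit in one character."""
    # Assumption: the TLD is left untouched; only the part before the last dot is flipped.
    name, _, tld = domain.lower().rpartition(".")
    results = set()
    for i, ch in enumerate(name):
        if ch == ".":
            continue  # keep label separators in place
        for bit in range(8):
            # Flip a single bit; lowercasing discards pure case changes,
            # since DNS treats "G" and "g" as the same character.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped == ch or flipped not in ALLOWED:
                continue
            candidate = name[:i] + flipped + name[i + 1:]
            # Labels may not begin or end with a hyphen.
            if any(l.startswith("-") or l.endswith("-") for l in candidate.split(".")):
                continue
            results.add(candidate + "." + tld)
    return sorted(results)


if __name__ == "__main__":
    for variant in bit_flips("google.com"):
        print(variant)
```

The resulting list is the set of names a defender would consider registering, or blocking internally with an RPZ as mentioned in the talk.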