CRYPTO AND PRIVACY VILLAGE - Hiding in plain sight: Disguising HTTPS traffic with domain-fronting

Video in TIB AV-Portal: CRYPTO AND PRIVACY VILLAGE - Hiding in plain sight: Disguising HTTPS traffic with domain-fronting

Formal Metadata

Title
CRYPTO AND PRIVACY VILLAGE - Hiding in plain sight: Disguising HTTPS traffic with domain-fronting
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

All right, I'd like to welcome our next speaker. Matt's going to be speaking to us today about hiding in plain sight: disguising HTTPS traffic with domain fronting. Take it away, Matt.

Thank you. Hi,
my name is Matt. I'm a software developer, and this is my first time here at DEF CON. Can
everyone hear me up the back? I heard there were some sound issues, but lots of thumbs in the air. Thanks very much. Cool.

A lot of us have probably heard about domain fronting. It got a fair bit of attention recently when Signal messenger made some noise after, I think, Google said, "Hey, we're going to put an end to this." So there were lots of questions about what it is. There was an academic paper floating around, and I don't really understand how to read papers like that, so I asked my friend to help me out. He explained it to me and I sort of figured it out, so I thought: let's bring it here and explain it in simple terms, maybe to people who don't even necessarily know what TLS is or just have a rough understanding. For some people this will be going to a very basic level, but that way more people can understand, so hopefully we'll get the right balance.

So what is it? I put on my dictionary-writer hat and tried to come up with a dictionary definition: it's abusing an implementation detail of shared infrastructure to disguise the true destination of an HTTPS transaction. There are a few keywords in here. "Implementation detail" is one big one and "shared infrastructure" is another, and we'll go into why these parts are important later. And HTTPS: as far as I know, there aren't really any other protocols that route to different things from the same place the connections get terminated at.

Domain fronting is not new. It's been around for quite a while. I didn't do an exhaustive search, but this is the oldest thing I could find that uses it. It's some kind of proxy written in Python, and the code is on GitHub. The commit message says "abstract host substitution"; maybe that's the old name for it. Cool.

So let's look at some of the advantages of domain fronting. Sorry, first: who uses domain fronting? I mentioned Signal messenger; they primarily seem to use it for bypassing censorship. Lantern seems to do
the same; I think that's some kind of a VPN. And then there's the meek client, which is an obfs proxy for Tor. I presume pretty much everybody here has heard of Tor, which is usually used for, you guessed it, bypassing censorship, but also for evading detection. With the obfs proxies, there are different plug-ins for Tor that basically let you pretend you're different types of traffic. There are all kinds of interesting ones. I wish I could go into them today, but that's a huge rabbit warren. And malware, which will also be bypassing censorship. Obviously censorship isn't just a country saying, "No, you can't go to this website." It's also a corporation saying, "No, you're not going to Facebook at work." And also evading detection: malware doesn't want to be seen going to hey-you-just-got-hacked.com.

So, advantages: why is it useful? Well, it can't be detected without breaking TLS. Someone has to actually perform some kind of man-in-the-middle attack in order to see that you're doing it. One potential case here: if I'm running a corporate network and I don't want to invade the privacy of my staff by intercepting their TLS communications, I won't be able to tell that this is happening. So maybe a piece of malware could use it to evade detection by my firewall. It uses existing infrastructure on the server side. When I've been playing around with this, it's mostly been CDNs like CloudFront; not Cloudflare anymore, and not Google Cloud anymore either, more on that later. And it's compatible with anything that can be tunneled over HTTPS, so if you can think of a way of tunneling your traffic over HTTPS, you could use domain fronting to send that traffic around. And it's very simple and easy. It sounds complicated, but once you realize how simple it is, you'll just walk away and go, "Cool, I'm going to go do that at home." It'll take you five minutes.

OK, where is it not useful? Bypassing censorship where the TLS is man-in-the-middled: if I break
your TLS session, I can see you're doing this, as I said before. And it's not useful for hiding web traffic in a normal web browser. You can't do it in a normal web browser: each web page you go to is going to load things from all kinds of different places, you'd need to find suitable domains to hide behind, and not every website would have domains that match. So yeah, it needs to be on shared infrastructure, and the front domain for that destination needs to be known. I'll explain what I mean by front domain a bit later when I go into the demo. And the security is void. The reason for that is you basically need to trust a certificate that belongs to someone other than the site you're actually talking to. Again, we'll see more in the demo.

So how does it work? Well, let's go back to the days of HTTP 1.0. When your web browser was trying to browse to www.whatever.com, you'd do a DNS lookup, your DNS server would return the server address, and you'd connect to the web server and say, "Hey, get me the home page," and it returned it. The problem here, of course, is that we needed one IP address per domain we were hosting. So if you wanted to host both websitea.com and websiteb.com, you needed two different IP addresses, and some of us couldn't do that. And obviously, with many millions of websites, we would have run out of IPv4 even faster than we currently are. So they introduced the Host header. That changed this sequence diagram only slightly, not much at all really. It just added this extra piece of information on the second green arrow here, where
we just specify, "Hey, I want to go to website A," or website B. That way we could share that IP address across both of them.

But it's 2018 and we encrypt web traffic, or we're always supposed to. So we have this thing called TLS. Sorry, SSL was its original name, and it's now known as TLS. It basically just takes HTTP as it was and wraps it in a secure layer that validates who you're talking to. In some cases it can validate you yourself, if you decide to do bi-directional authentication. Am I talking to my bank, or am I talking to someone that claims to be my bank? It takes care of that for you. And it encrypts the traffic between you and the server, as we know, without changing the protocol. So as the application developer, it doesn't matter if I'm using HTTP or if I'm doing IMAP or POP or whatever; TLS was designed to just go over the top. However... sorry, I'm getting ahead of myself here.

The TLS handshake is pretty complicated, and I don't understand anything but the first two messages that are happening there, and the rest is not relevant for domain fronting. Basically what happens in the TLS handshake is your client says, "Hey, I want to talk TLS," and then the server goes, "Sure, I am your bank, definitely your bank, not someone else." They do some kind of special packet dance and arrive at some encryption keys that they both agree on, and from there everything gets encrypted. So it's a layer that's hidden from me as the application developer, but everything from that point is encrypted, and as the application developer it just looks like the same protocol that I came up with, in this case HTTP. So forget about all that stuff that I don't know, pretend it's not there, and just focus on the first couple of messages.

Now, if I'm sitting on a network watching your traffic, so maybe I'm your Starbucks Wi-Fi, maybe I'm your ISP, maybe I'm the government, maybe I'm the sysadmin at
your company, I can't see what's happening down the bottom here. That part's encrypted, obviously. But what I can see is that these first few messages are unencrypted. The ClientHello actually has a part at the beginning that says, "Hey, I want to go to google.com." And the reason for that is, when we look back at our HTTP 1.0 versus
HTTP 1.1... I've totally messed this up, I'm sorry. So the reason for this is, when
I say, "Hey, I want to talk TLS," the server immediately needs to know who it is representing. So if I have website A and website B both on the same server and my client says, "Hey, I want to talk TLS," the server needs to send back a certificate to say, "Hey, I'm website A," or website B. That meant TLS had to be extended slightly to identify which server we were talking about: am I talking to google.com, or am I talking to yahoo.com? That's the SNI extension. I think, when did the Xbox 360 come out? That didn't support it. I just know that if you want to support the Xbox 360, you need to have one IP per host. So that's roughly how long ago it got introduced, but it wasn't a standard thing at first.

So say I'm on your network, sniffing your traffic and watching you use HTTPS. I can see that you're going to google.com, because it's sitting there in that very first message that you send in your TCP session to Google. Just to recap: the entire contents of the HTTP session are encrypted, which is all that stuff for web developers. Early on in my web development career, none of this was clear to me, like where all this stuff went. It just somehow magically arrived and then you got it from $_POST in PHP somewhere. What's not encrypted is the domain name of the server we're talking to; that's exposed by SNI. So that's how, even though you're using HTTPS, your ISP can see what you're looking at. And the server certificate also isn't encrypted. I can't remember, but I think TLS 1.3 is supposed to encrypt the server certificate. Does anyone here know? It does? Thanks. Cool.

So let's take a look at HTTPS stacks. With domain fronting, what makes it work is the implementation of the HTTPS stacks and how they usually get shared across customers, say if you're using Amazon CloudFront. So take a very simple single-server stack. When I was 14, I used to run a web server
under my desk, which was just Apache, and it was basically this first thing on the left. Sorry, the first thing on the left is the client, and then we've got the web server, which is something like Apache, nginx, IIS, whatever, and that pulls stuff from an origin. That's generic because the origin can be a filesystem, it could be a Rails application, it could be a Django application; it doesn't matter. This is just to show the separation of concerns. So you see this web server is responsible for HTTP, TLS and caching, and when you're running one website it's all pretty simple.

What we usually like to do, though, for performance reasons, is delegate the TLS. TLS is reasonably heavy on the CPU, and perhaps what you're doing in your application might be very heavy on the memory, so often we split this out. This is often what's happening at somewhere like CloudFront or Google Cloud, where they have lots of customers and they want a very fast endpoint, with an IP that lots and lots of customers use, that just does the job of terminating those TLS connections and getting them to the next link in the chain, which is the next web server that goes and fetches the content. And when we've delegated it, this is where I can steal that little NSA graphic I love.

This is how I think Heroku kind of works; that's a shared infrastructure provider. I haven't tried whether domain fronting works through Heroku yet; that's probably something I'll do right after this. Basically, you talk to that reverse proxy, whose job is, "Hey, let's terminate TLS here," and then it goes to something else, whose job is to look at the Host header in the request and route it to whichever origin the Host header matches. So what often happens in the CDN networks is that the reverse proxy decides which certificate to send back to the client based on the SNI header, that thing you saw in my Wireshark screenshot, and then the next link of the
chain decides where to send it based on the Host header. They don't always have to match, and in domain fronting we deliberately make sure they don't. I'm sorry, this is way too small to see on the screen, but if we were to make a request to www.good.com with a Host header of evil.com, what would end up happening is that the reverse proxy would satisfy the TLS session as good.com, then the router would look at the Host header and go, "Ah, I want to go to evil.com." This seems to be roughly how it works behind the scenes at Amazon. Unfortunately I don't work there and I don't know anyone that does, and I would love to know exactly why it works there. So I've already ruined that one. And again, remembering what we saw in Wireshark, if I'm sniffing the network I would only see the packets going to good.com.

Why does it work? An anonymous Google software engineer said it worked "because of a quirk of our software stack," and Google has since done what they can to get rid of domain fronting. So it highly depends on implementation. In order for it to work, the shared infrastructure must not check for a mismatch between the SNI header and the Host header. Apparently that's what Cloudflare does to stop it, and it kind of makes sense to do that. Also, HTTP requests must be routed separately from the TLS, so both those layers need to be dealt with separately.

So to put it together, what do we actually need to do? All we need to do is connect like normal to one host and set the Host header to another, and they've got to be on the same infrastructure that can route between them. So evil.com needs to be accessible via the same infrastructure as something innocent-looking, and the infrastructure needs to have the right implementation quirks. There are plenty of websites on popular CDNs; jQuery's on a CDN, for example. I could go and sign up for that same CDN myself and put my malware command and
control server behind that CDN, which would allow
me to use the domain of somebody else who uses that same CDN to get there. I keep giving things away early. As for finding them, there are loads of websites out there. If we go to, say, the Alexa top 500, we should be able to just do reverse DNS lookups on, you know, google.com, to see whatever CDN that customer is using. Akamai, which I think Facebook uses, would be another good one to try. The easier ones to find are the ones you can sign up for.

So what makes some domains better than others? It really depends on what we're trying to do. If we're evading detection, like malware, we want something that looks pretty business-as-usual. If I'm in a company that sells apples, it probably doesn't look very suspicious if I'm going to fruit.com. Or maybe something innocuous: if I've infiltrated a company and I'm trying to exfiltrate some data, and that company is a marketing company that uploads a lot to YouTube, maybe I could hide my stuff behind youtube.com. Or maybe I'm in a country that blocks access to sites, which was Signal's problem: they were in a country which was blocking their messenger, and which apparently also blocks websites. They chose an e-commerce website which they thought would have collateral damage: if that country were to block that site, it would negatively impact the country as a whole, so perhaps that would be a reason why it wouldn't get blocked. And maybe you could do a combination: find something that looks like business as usual, is innocuous, and has collateral damage. That's probably a really good one to go with.

So I've talked a lot, and I'm going to keep talking, but instead of boring slides let's fire up this SSH session, and I'm going to try to talk and type at the same time once again.
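Before the demo, the two-layer behaviour described above (certificate chosen by SNI, origin chosen by Host header) can be sketched as a toy model. This is only an illustration under stated assumptions, not how any real CDN is implemented: the domains `good.example` and `evil.example`, the `ORIGINS` table and all function names are hypothetical.

```python
"""Toy model of a shared CDN edge. All names and domains are hypothetical."""

# Hypothetical customers of one shared CDN: domain -> origin label.
ORIGINS = {
    "good.example": "origin-of-good",
    "evil.example": "origin-of-evil",
}


def pick_certificate(sni):
    """TLS terminator: choose which certificate to present, using only the SNI."""
    name = sni.lower()
    if name not in ORIGINS:
        raise LookupError("no certificate for " + name)
    return "cert:" + name


def route_request(host_header):
    """HTTP router: choose the origin, using only the Host header."""
    return ORIGINS[host_header.lower()]


def handle(sni, host_header, reject_mismatch=False):
    """Return (certificate shown to the client, origin that serves the request)."""
    cert = pick_certificate(sni)
    # The countermeasure mentioned in the talk: refuse when the two names differ.
    if reject_mismatch and sni.lower() != host_header.lower():
        raise PermissionError("SNI and Host header differ")
    return cert, route_request(host_header)


def visible_on_the_wire(sni, host_header):
    """What a passive observer learns: the SNI only. The Host header is inside TLS."""
    return sni
```

With `reject_mismatch` left off, `handle("good.example", "evil.example")` presents the certificate for the front domain while serving content from the other customer's origin, which is the whole trick; turning the flag on models the mismatch check.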
All right, so what we're going to start with is: I have a root shell here, and I'm going to start off a tcpdump so you can see the traffic that's actually leaving this machine. This is just an empty EC2 instance in Amazon, so there should hopefully be no other traffic on 443. For those not familiar with tcpdump: it takes network traffic that's going through your network card and logs it somewhere. I'll briefly go over the arguments here. -c 4 means stop after I've seen four packets: there are three for the TCP handshake, and then the fourth packet is our first message from client to server, which is the ClientHello, which has the SNI header in it. -A means give me ASCII output, shown straight to the terminal. -n is a performance optimization,
like don't do a reverse lookup on the IP address. -i is our interface, eth0, and then we're looking for TCP port 443. So let's run that and then switch to another screen. Then we're just going to run a curl to picsum.photos; it's like a lorem ipsum for pictures. curl -s just means shut up, don't do anything except get that and give the output to standard out. Then we're going to pipe that to md5sum so we can look at what content I actually saw. The content I actually saw in this case, or the hash of it, starts with c8eead. So the purpose of this is basically: go get me this web page and make a hash of it, so I've got an idea of what I saw. Then let's get another web page, in this case protectyourprivacynow.com, and obviously it has a different hash. So these are the two hashes, and what we're going to do is take a look at which one I get when I do domain fronting. Hopefully this makes sense now.

I've just realized I've messed up my tcpdump. OK, so only the first curl request got dumped here, because it stopped after four packets. You can see right here, this is my SNI header that says, "Hey, I'm going to picsum.photos." Let's run the tcpdump again, switch to the other screen, and run this curl command here. When I'm going to protectyourprivacynow.com, you can see on the other side, in roughly the same place, "I'm going to protectyourprivacynow.com." All right, and it's in my history because I was testing it out. Cool.

So let's do some domain fronting. Let's get tcpdump running again. OK, so all we've done here is we're still establishing our TLS connection with protectyourprivacynow.com, so you can see that most of this curl command is just the same as the one above it. The only difference here is we're telling curl, with the -H flag, to change the Host header to picsum.photos. And when I run this, the md5sum of what came back is actually that of picsum.photos, but if I go and look at the other side where I've dumped the packets, it actually just shows protectyourprivacynow.com. So when I said it was really easy and you can go and do it in five minutes at home, there it is.
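The curl step in the demo (TLS to one name, Host header naming another) can be sketched in Python with only the standard library. This is a minimal sketch, not the speaker's tooling: the function names and the `front_domain` and `hidden_host` parameters are made up for illustration, and whether it works at all depends on both names being served by the same infrastructure with no mismatch check.

```python
import socket
import ssl


def build_request(host_header, path="/"):
    """Build a minimal HTTP/1.1 GET whose Host header names the hidden site."""
    lines = [
        "GET " + path + " HTTP/1.1",
        "Host: " + host_header,
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fronted_get(front_domain, hidden_host, path="/"):
    """Do DNS, TCP and TLS as front_domain, then ask for hidden_host inside TLS."""
    context = ssl.create_default_context()
    with socket.create_connection((front_domain, 443), timeout=10) as raw:
        # server_hostname sets the SNI and is the name the certificate is
        # checked against, so only the front domain is ever validated.
        with context.wrap_socket(raw, server_hostname=front_domain) as tls:
            tls.sendall(build_request(hidden_host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    # Raw response, headers included.
    return b"".join(chunks)
```

Note that `fronted_get` returns the raw response including headers, so a hash of it would not match the `md5sum` of curl's body-only output. The certificate check passing here says nothing about the hidden host, which is the authenticity problem the talk turns to next.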
Cool. So there are a few risks here, obviously. One of them is reliability. When I try going to protectyourprivacynow.com, I don't know that they're going to keep it on the same CDN as picsum.photos. They
could change that at any time and point it somewhere else. A potential solution to that is to have a backup list: if you were shipping an app that relied on domain fronting, maybe you'd have a list of other fronts that still work. But a bigger problem is that you can't validate that the server's authentic, because if you're connecting to protectyourprivacynow.com but really intending to talk to picsum.photos, the server you've validated is protectyourprivacynow.com, not the photo site you're trying to reach. And also, since you're connecting to a place that you didn't set up yourself, they could change that certificate or CA at any time, so you can't pin it either, which you'd normally be able to do if it was, you know, a self-signed cert or whatever.

Also, the traffic is visible to the infrastructure provider. Say you had traffic that you needed to hide from Amazon CloudFront, for example, or that was sensitive: because you established your connection with site A, that infrastructure provider is able to see what you did, just by default. It's kind of obvious if you're used to using a CDN, but it's not that obvious sometimes when you do these tricks; I guess I forget about these sorts of things. Sensitive data could be stolen and malicious payloads could be injected. So basically just treat it as an unencrypted connection: encrypt and sign all your messages, and maybe come up with your own way of validating that you're really talking to who you think you're talking to. Cool.

So I actually saw a DoS attack happen via domain fronting. It was a combination of some not quite correctly configured infrastructure and domain fronting. What happened was, there was a very normal web server setup. You know, you start with one web server, you refactor your application, then you make it two web servers, and then, you know, you
eventually end up with a cluster of them, and then you end up going, "OK, we need a CDN." So it was put behind a CDN, and they marked the CDN IPs as trusted proxies for the web server, as you do, because obviously the TCP connection hits the CDN endpoint and then the CDN makes another connection. Then attackers found a nice slow web page and decided to make thousands of requests to it, but they used domain fronting, which, in the case of the CDN that was being used, for some reason went between two different IPs. So it was going from one CDN endpoint to another and then to the web server, which it didn't expect.

Basically the connection diagram would look like this: the client, way up on the left, connects to the reverse proxy; that's one TCP connection. The reverse proxy then connects to the web server. For those that can't see the IP addresses there, they're a bit small: the web server can only see the connection from 2.2.2.2, so it sees the reverse proxy's IP address from the actual TCP connection. So the reverse proxy usually adds into the X-Forwarded-For header, "Hey, I'm forwarding this for 1.1.1.1," and the web server knows, because the actual TCP connection comes from 2.2.2.2, that it can trust that 1.1.1.1 is really who it's talking to.

In this case, what was happening was the user was hitting the front domain, the domain that was being hidden behind, which added X-Forwarded-For 1.1.1.1. It got forwarded to the other endpoint, which added another X-Forwarded-For header and then passed it up to the web server, which didn't really understand that there could be more than one. That basically opened up an IP spoofing vulnerability, because the web server was thinking that the user's true IP was in fact the IP of the CDN endpoint. What we expect to happen usually is that the header has one address, which is the true IP of the user, or at
least that's what the web server was expecting: the web server reads the header and knows who it's talking to. In actual fact there were two, we didn't understand that, and it assumed that the CDN was the real user. It got worse because we had fail2ban configured, and it ended up banning the whole CDN and taking the site out, because it thought this idiot was DoSing it. And yeah, the root cause was misconfiguration: we'd forgotten to look at the full list, and because the traffic went through domain fronting, which is a way it wasn't expected to work, it invalidated that assumption.

So in order to plug that: know your infrastructure. X-Forwarded-For is actually a de facto standard, and your infrastructure provider might do it differently from how you expect. Cloudflare, I think, has a CF-Connecting-IP header or something, which is completely different. And wherever there's a possibility of getting proxied multiple times, make sure that the whole chain is trusted. So if you've got a CDN endpoint that maybe hits another one, that hits a load balancer, that hits another load balancer, the server needs to be able to draw a line between all those endpoints and go, "OK, I trust this one, which trusts that one," and find its way to the true IP. In nginx's case, I don't know the other ones unfortunately, it's real_ip_recursive, if you want to google that, and Cloudflare has CF-Connecting-IP.

So, the future of domain fronting: I don't think it'll last much longer. Cloudflare has already said they're getting rid of it. It can't be relied upon; it's an implementation detail which could change. Like I mentioned before, Amazon might just suddenly start doing something differently one day. Netlify or any of the other CDN providers, whatever you use, might find a more efficient way of doing it or decide to get rid of it. Different regions of the world might use different infrastructure. I think I noticed this when I was in China. I went
to apple.com and I was like, "I wonder where that's going," and I had a look. It was completely different from the result I was getting from Perth. It was on Akamai in one place and on some other CDN elsewhere. So if you're writing something that needs to be used all over the world, the same domain might be pointing to different infrastructure.

It's also possible to actively prevent it from working, which Cloudflare is deliberately doing, by checking for a mismatch between the SNI and the Host header, or by using SNI for routing the requests. If you remember the several layers where it decides on its final destination, it could simply use the SNI header, which I believe was the intent behind HTTP/2. It seems Amazon and Google have received a fair amount of pressure against it. Google's already broken it deliberately, Amazon has said they will, and Cloudflare, for some reason, says it's a risk to their customers: "it would put our traditional customers at risk." I presume that means people behind corporate firewalls, because those corporate firewalls can't thoroughly ban malware traffic if it's using domain fronting. So domain fronting may not be that useful for keeping your own personal HTTPS traffic private.

So a couple more suggestions here. TLS 1.3 explored the possibility of encrypting the SNI component. It didn't make it into the spec because it required an extra round trip. There were actually two proposals: one was an extra round trip to negotiate a key first; the other, I think, was based on a static key. I didn't really pay much attention to it. However, I did read recently that there was work done on this by Cloudflare, I think, and that only came out in the last couple of days. Use a conventional tunnel, like a VPN, if you can set the tunnel up yourself. Tunnel traffic other ways: there are other protocols out there for achieving this. I've seen some pretty
obviously old-school malware command-and-control over IRC. A friend of mine tried to do IP over Facebook Messenger; that was pretty horrible. HTTP over XMPP. Just coming up with crazy, silly ideas can be pretty fun.

But I hope you all walk away from here understanding how SNI facilitates shared infrastructure that uses TLS, working around that problem of needing one IP per host; how third parties can see where your HTTPS traffic is going without doing a man-in-the-middle attack; how to find domains and actually do domain fronting yourself; and also how to protect yourself from misconfigurations like what I saw before, where basically the infrastructure just didn't do what was expected. Thanks, thanks for having me. Any questions?

[Applause]
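As a footnote to the X-Forwarded-For incident described in the talk, the "trust the whole chain" advice can be sketched as follows. This is a rough illustration of the idea behind recursive real-IP resolution, not nginx's actual implementation; the function name, parameters and the trusted range used in the example are assumptions.

```python
import ipaddress


def client_ip(peer_ip, x_forwarded_for, trusted_networks):
    """Walk the proxy chain right to left and return the first untrusted address."""
    trusted = [ipaddress.ip_network(n) for n in trusted_networks]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in trusted)

    # If the TCP peer itself is not a trusted proxy, ignore the header entirely.
    if not is_trusted(peer_ip):
        return peer_ip

    # Several X-Forwarded-For headers are treated here as one comma-separated list.
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Every hop was a trusted proxy: fall back to the leftmost entry, or the peer.
    return hops[0] if hops else peer_ip
```

For example, with the CDN range assumed to be `2.2.2.0/24`, `client_ip("2.2.2.2", "1.1.1.1, 2.2.2.3", ["2.2.2.0/24"])` skips the second CDN endpoint and resolves to `1.1.1.1`, instead of mistaking the CDN for the user. Malformed header entries raise `ValueError` in this sketch; a real deployment would need to decide how to handle them.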