Let's Dance in the Cache - Destabilizing Hash Table on Microsoft IIS
This is a modal window.
Das Video konnte nicht geladen werden, da entweder ein Server- oder Netzwerkfehler auftrat oder das Format nicht unterstützt wird.
Formale Metadaten
Titel |
| |
Serientitel | ||
Anzahl der Teile | 85 | |
Autor | ||
Mitwirkende | ||
Lizenz | CC-Namensnennung 3.0 Unported: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. | |
Identifikatoren | 10.5446/62238 (DOI) | |
Herausgeber | ||
Erscheinungsjahr | ||
Sprache |
Inhaltliche Metadaten
Fachgebiet | ||
Genre | ||
Abstract |
|
DEF CON 306 / 85
24
28
29
47
51
53
59
60
62
70
72
75
80
84
85
00:00
Prozess <Informatik>TabelleHash-AlgorithmusWurzel <Mathematik>Interaktives FernsehenDifferenzkernPhysikalischer EffektSoftwaretestWurm <Informatik>BimodulProgrammierumgebungAlgorithmusZeichenketteParserSchaltnetzAbfrageSchlussregelBildschirmfensterParametersystemGemeinsamer SpeicherZweiDifferenteMaßerweiterungKontextbezogenes SystemResultanteInelastischer StoßImplementierungAuthentifikationProxy ServerZahlenbereichFunktion <Mathematik>PasswortElektronische PublikationE-MailCachingExogene VariableMathematikBenutzerbeteiligungProdukt <Mathematik>ServerInformationSkriptspracheMinkowski-MetrikLoopBitrateEinfach zusammenhängender RaumATMZweiunddreißig BitGanze ZahlDatenstrukturWiderspruchsfreiheitSoftwareentwicklerDefaultSchlüsselverwaltungMultiplikationsoperatorDatensatzIdentitätsverwaltungFunktionalSoftwareNichtlinearer OperatorCodeMusterspracheRechenbuchDämon <Informatik>GarbentheoriePhysikalisches SystemProgrammfehlerDatenfeldRichtungSoftwareschwachstelleVerzeichnisdienstRegulärer GraphFaltung <Mathematik>ExploitEreignishorizontMathematische LogikSpieltheorieDisjunktion <Logik>Service providerGrenzschichtablösungURLArchitektur <Informatik>Dynamisches SystemSystem FHydrostatikFlächeninhaltToken-RingTopologieFront-End <Software>LoginPatch <Software>TypentheorieBefehlsprozessorStapeldateiMAPNichtlineares GleichungssystemGeradeMultipliziererBetriebsmittelverwaltungSchnittmengeMailing-ListeAggregatzustandDivergente ReiheRekursive FunktionComputersicherheitKonfigurationsraumEinfügungsdämpfungMultiplikationKartesische KoordinatenStrategisches SpielCASE <Informatik>HauptidealRandomisierungBenutzerfreundlichkeitt-TestKomplexitätstheorieKernel <Informatik>Dienst <Informatik>ResamplingPrimzahlNebenbedingungNeuroinformatikElement <Gruppentheorie>HalbleiterspeicherAutomatische IndexierungFormale Sprachesinc-FunktionProgrammierspracheObjekt <Kategorie>Güte der AnpassungMechanismus-Design-TheorieChatten <Kommunikation>Offene MengeSchwellwertverfahrenPrimidealMereologieDatenparallelitätQuellcodeLinearisierungVerschlingungDemo <Programm>SpeicherabzugFundamentalsatz der AlgebraKomplex <Algebra>InformationsspeicherungZusammenhängender GraphLastInjektivitätUmwandlungsenthalpieCoxeter-GruppeInverser LimesSISPSynchronisierungPseudozufallszahlenReelle ZahlDoS-AttackeVererbungshierarchieComputerspielDreiecksfreier GraphWiederherstellung <Informatik>Einfache GenauigkeitProgrammierungDatenbankAnwendungsdienstanbieterArithmetische FolgeMapping <Computergraphik>Open SourceExchange <Programm>SoftwarearchitekturExchange ServerInformatikIndexberechnungThreadOffice-PaketFehlermeldungÄquivalenzklasseTaskPolygonzugRoboterKorrelationsfunktionBeobachtungsstudie
Transkript: Englisch(automatisch erzeugt)
00:00
Welcome to the five o'clock track three. Unfortunately, OrangeSci cannot be here today. However, we are going to try to stream, I'm sorry, play his presentation for you guys. All right, enjoy. Hi, sorry I couldn't be on the stage because of the issue of Visa. I hope one day we can meet face to face in Vegas. About 10 years ago, I failed my algorithm course
00:30
at university, but I'm not here to talk about the hash table. This is a big adjustment for me. I'm Orange, and today my topic is Let's Dance in the Cache,
00:46
this tabulating hash table on Microsoft IIS. Before the talk, I would like to give you a short case. If there is a super-secret area, which is protected by a super-secure password, you cannot log in only if you know this long password.
01:09
However, under the design of IIS, all of these passwords are valid and can log into the system. Amazing, right?
01:22
I guess you might have several questions in your mind. This is one of the attacks I will present today, and you will learn the details later. Hi, I'm Orange, and now the principal security researcher at DevCorps. I am a zero-day researcher
01:42
and focusing on web and application security. My job is to discover the most severe bugs and attack service that can impact the world ahead of time. I'm Orange, and today I'm going to talk about the attack service that can impact the world ahead of the bad guys. My research has also received
02:08
several awards, such as the best server-side bug of the Pwning Awards and the champion of Pwn2All. If you are interested, welcome to follow my Twitter or chat with me later.
02:25
This is the outline today. First, we will introduce several essential concepts. Those concepts are important to let you into Microsoft IIS internals and our topic today.
02:42
After having a good understanding of IIS internal and the cache mechanism, we will elaborate on our research, followed by three different types of vulnerabilities. Also, we will demonstrate how we apply the attacks to Microsoft Exchange Server as a showcase. Of course, we will provide mitigations,
03:10
recommendations, and future works to those components. So first, what is the hash table? Hash table is the most fundamental data structure in computer science,
03:31
and its purpose is to store data. No matter what the data is, hash table can hold the data in memory
03:41
very well. Hash table is especially good at handling the key-value pair and can operate data with a high performance. Due to those benefits, hash table is used extensively
04:00
in computer infrastructures, such as the operating system, programming language, database, or web server. Here is an example of the hash table usage in programming language. Programming language uses the hash table widely. However, you don't know because they wrap the hash table into a more
04:27
high-level structure such as the array, dictionary, or object. So if you are a developer, you must have used hash table. As the internal, I assume that everyone here is a good student at university,
04:48
so I skip the hash table internal this time. As for what is the hash folding attack? Also, we are not going to talk about this attack only.
05:04
We still give it a slot because I believe that the hash folding attack is the best case to learn the mindset of how an attacker abuse the hash table. The idea of hash folding attack is to drop all the records into the same bucket.
05:26
Since the hash algorithm is public, an attacker can predict its hash and make all records fall into the same bucket. At the time, the attacker degenerates the hash table, which can store
05:43
numerous records into a single list. For example, the attacker has cracked several malicious records on the left side. The hash function in the middle will process and convert the record into an index.
06:06
The first record is converted to the index 4 and put into the corresponding bucket. The second is also put into the same bucket. The third, the fourth, the fifth, and the attacker can drop all records into the same bucket,
06:32
and the table now becomes a single linked list. So the hash folding attack can make the table always sit on the worst case.
06:45
That means for inserting n records, the time complexity comes to n squared. We should all agree that n squared is in a terrible performance.
07:03
Okay, with a good understanding of the hash table, let's discuss the relationship between the hash table and IIS. First, IIS loves hash table. As the well-known and the only web server in Windows, IIS uses the hash table extensively
07:24
in its architecture to store information such as HTTP header, server variables, configuration, and cache. Due to the massive use of this data structure, Microsoft has designed its own hash
07:43
table implementation from a very early stage. The following two are all the implementation from a very early stage. The following two are all the implementation that will be used in IIS.
08:01
One is called tree hash table, and the other is called allocator hash table. The tree hash table should be the most common, like the simple code in your textbooks. It uses the linked list to resolve the collision and rehash the table when it's in an unhealthy
08:25
state. The rehash is a series of rebuilding processes. It first enlarges the table, recalculates all records, and reposition them to their new index.
08:41
As for what hash function is used, we will have a future discussion later. The other table is the allocator hash, an implementation that aims to build a scalable and high concurrent hash table under the multi-threading and multi-core environment.
09:03
The name LKR is an acronym for the name of the creators. This implementation was also patterned by Microsoft in 1999. The LKR hash is a successor
09:21
of linear hashing, which is used another better algorithm to resolve the expensive rehash to enhance the performance. The creators put a lot of effort to make this implementation portable, flexible, and usable. The application can define its own table-related
09:46
functions to adapt to multiple products across Microsoft. It's interesting to note that part of the creators are also the IIS developers, which should be the reason why we find lots of
10:04
this implementation in IIS. Now we have a complete understanding of the hash table. It's time to go deep into our research. In this section, we will show the IIS internals
10:22
and define the scope we really cared about. We will first explain our mindset and idea to uncover bugs. We are mainly focusing on the hash table implementation and its usage. Since most of the
10:41
hash tables are used in IIS are cache-related, we also study and look into its mechanism. Because Microsoft has designed its own hash table without open source, the implementation should be an explored area to discover bugs. So in this context, we hunt not only for memory
11:09
corruption but also logic bugs. Here I would like to highlight the bug CVE-2006-3017.
11:21
It's a logic bug in the unset function of PHP. The hash table implementation didn't distinguish the key type of an array so that an attacker can delete arbitrary elements with an index number.
11:40
When I saw this bug, it amazed me and I believe this is just the bug I'm looking for. Of course, we also look for the algorithm complexity problems such as the hash folding attack. However, since it's already a 20-year-old attack, most of the lower hanging fruit
12:04
should be disappeared. So in this context, we put more effort into the implementation. As for the usage, we mentioned that the LKR hash is decided to be a portable,
12:22
flexible, and easy to use algorithm that can be applied to any situation. The convention requires the application to configure its own table-related function during initialization. This kind of extensibility gives us more opportunities
12:44
to uncover bugs. When I saw this, I was curious that is the particular function good? Is the key calculation good? Or how does the function select the correct record
13:01
when the collision happens? These types of questions all interested me. So in this context, we care more about the relationship between the record, the key, and the function.
13:23
When an HTTP request is coming, the kernel, HTTP.sys, will first handle the connection and dispatch the request. The kernel checks if there is an active IIS worker. If not, it knocks the IIS
13:44
service to spawn a new process. The WAS-Windows activation service first reads the configuration to note several specifications for the process initialization and spawn the worker process.
14:08
Several IIS components are loaded here. The worker then loads the modules, which are defined in the configuration.
14:21
IIS by default loads several modules for different purposes, such as the compression, redirection, authentication, and cache. Among all modules, the following four are the global cache provider to manage the cache between
14:41
modules. Each module is responsible for handling a different cache, such as the cache for static file, the cache for configuration, and the cache for Windows token. Once the modules are loaded, the worker enters the request lifecycle. The lifecycle is an event loop
15:05
that processes all the notifications from the request. There are several events in a lifecycle. The module can subscribe to the request, and the event loop can access the request from the request.
15:20
There are several events in a lifecycle. A module can subscribe to its interesting events to complete its logic. There are also global events that a module can subscribe to. For example, the cache provider subscribes to the event cache cleanup and the cache operation
15:47
to manage the cache between modules. So our research is not only focusing on the cache between the request lifecycle,
16:00
but also the global cache providers, especially those four, are our golden targets. The following section is about our research and the bugs. In this section, we will first elaborate on our idea and the direction we are trying to find bugs. We will attach
16:22
three different types of vulnerabilities to prove our idea works. Our bugs include a hash folding DOS, a cache poisoning, and authentication bypass. We will also talk about the exploitation
16:40
of how we maximize the severity to make this bugs impactful. Of course, a real-world authentication bypass on Microsoft Exchange Server will be detailed and demonstrated. Okay, the first case we would like to share is the hash-folding-DOS.
17:06
First, we would like to give you a spoiler that all hash tables implemented by Microsoft are affected by hash-folding attack. The tree hash table is vulnerable by default,
17:21
and the LKR hash is vulnerable only if a pulled hash function is configured. However, vulnerable is not equal to exploitable. Besides finding the
17:42
entry point, we still have several obstacles to overcome. We find that URI cache module seems to be a good target. The purpose of this module is to cache configuration for URLs.
18:03
Because grid configuration is costly, so cache is the best strategy. In this module, every HTTP access triggers the cache operation, and the most important thing is that the module uses tree hash table as its cache storage.
18:28
Well, an accessible by default tree hash table sounds good to us. This is the time chart of every thousand new records in tree hash table. Because it's not
18:45
open-sourced, we must reverse the structure and code the table dynamically to get the data set. The line in blue is the type of inserting random records, and the line in orange is the records
19:01
with collisions. You should be able to figure out there's a huge difference between the two lines. However, the jitters along 35,000 and 75,000 are weird. What's the jitter?
19:23
So radically, the line screws in orange should be in linear time. So what actually makes the time increase unsmoothly? The answer is rehash. This is the part of the insert operation.
19:45
The implementation first travels the link list to ensure no duplicates and do its internal job. However, after the internal operation, there is a code to the function rehash table if needed.
20:10
The function checks the number of records and rehash the table if the number is greater than two times of the threshold. The threshold is selected from a list of prime numbers,
20:26
and once rehashed, the function enlarges the table by the next prime number and remaps all records to their corresponding bucket. This is a huge and expensive operation.
20:43
And that's the reason why the chart increased unsmoothly. To exploit that, there are still several constraints to be solved. For example,
21:01
how much of the cache key we can control or how to construct the collision panel. There are several elements in a cache key, and the only element we can control is the URL pass, which reflected at the end of the cache key. The implementation will first convert all the
21:26
text to our cache and send the key to the hash function. The hash function simply multiplies each byte of the cache key by 101 and sums them up.
21:47
The function also uses an LCG to scramble the result. However, is this hash function good? This is the answer from Eleg and Zary in their awesome talk at 28C3.
22:11
This multiply-then-add method is a variant of the DZD hash. And this algorithm has been proven that it is collateable by equivalent substrings.
22:29
When two keys have the same hash, such as the PS and Q2, they must be equal no matter what you add before or after them.
22:45
For example, after you add the letter A, the hash of PSA and the Q2A are still equal. By knowing this feature, it's not difficult to understand that two equivalent substrings
23:07
can be combined to form different keys, such as the PS PS, PS Q2, Q2 PS, and Q2 Q2. They all share the same hash.
23:22
Since you can repeat the string many times, constructing the payload is easy. You just need to find a few strings with the same hash and arrange them in combinations.
23:42
So all of these share the same hash value. This is handy for our exploration. This is handy for our exploitation. However, there is a fatal flaw in this attack.
24:01
That is too weak. To get a noticeable slowdown, you must have sent about 35,000 requests at least. For our one request at one record exploit, it's too slow.
24:20
And the hospital ridiculous is that there is a catch scavenger to recycle records regularly. The scavenger is a threat to delete unused records every 30 seconds. Those two obstacles make our attack not so practical.
24:43
To overcome this, we have to dive into the implementation more. During our investigation, we find an interesting behavior that can rescue our attack. The implementation will scan the key
25:03
recursively. It trades all subdirectories as new records and adds all of them to the table. For example, in our imagination, the URL will only trigger one search and one insert operation.
25:27
However, each subdirectory will also be traded as a new record so that the URL will trigger multiple insert operations. What's more interesting is that all new keys will be packed with a newer
25:46
record. The scavenger is nice to know more records and only deletes them one by one instead of killing all of them in a batch. So the only question left is how to construct a collision
26:08
in this new context. Because all subdirectories are traded as independent keys, we have to construct a payload that satisfies all the following equations.
26:26
It may be a little hard to imagine how to make this happen through our previous equivalent feature. But if we make all the results to be zero, then things count much easier. Zero multiplies anything
26:49
zero, and zero plus zero is also zero. So we can satisfy all the equations if we make the hash of each subdirectory to zero.
27:06
So we can prepare a set of zero hashes and amplify the attack 10 times at least by a slight modification. The result is that we can make a server unresponsive with about 13 connections per second.
27:28
This is not an unreasonable number because IIS can handle sources of connections easily and concurrently. Because this bug affects the windows by default,
27:44
we also award a good amount of bounty. Let's check out the demo. We first double check and ensure the server has 8 cores and 32 gigabytes run.
28:13
The window in the lower left is the script that monitors the status of the IIS.
28:21
It's a while loop and checks the server every second. Okay, we first run the random test. We send random test to the IIS.
28:45
You can see the loading is less than 10 percent. Okay, we now run the collision mode.
29:03
As you can see, the CPU is not under high loading. Okay, 100 percent CPU load.
29:35
Okay, as you can see, the monitor enters the timeout loop.
30:05
Here, we speed a little bit up. Okay, as you can see, the server is unavailable and can't handle any requests anymore.
30:33
Okay, the second test is a cache poisoning attack.
30:44
There are two types of response cache in IIS. One is the static cache supported by the kernel, such as the picture, the CSS, and the JavaScript resources. The other is dynamic cache, which is usually used to cache responses that rarely change.
31:07
For example, the product information of an online shop or the news announcement of NCNS. The benefit of the dynamic cache is that you can reduce the number of the database access.
31:24
Dynamic cache is handled by the HTTP cache module, and you can configure the cache by the output caching component. To use the cache, you must first set out your own rule.
31:42
The rule can best on several connections, such as the file extension, the query string, or the HTTP header. Here, we set out a rule for the extension of SPX and cache the result best on the ID. So the root cause of the cache poisoning is that the module used a bad parser
32:11
to handle the query string. The same that you have already set out a rule to cache a specific parameter. An inconsistency between the module and the backend may lead IIS to cache the wrong result.
32:29
In the real world, the most common backend is sp.net, and a simple HTTP parameter
32:40
pollution can rule them all. Though the key do be cached in the query string, the module only uses the first occurrence as the cache key. However, the sp.net concatenate all together.
33:03
This inconsistent parser behaviors cause the IIS to cache the wrong response. For example, if a patch just prints out your name best on the ID, the attacker can
33:21
poison the result by repeating the ID on the URL. The IIS only recognize the orange as the cache key. However, the sp.net will concatenate both values and show them all.
33:43
So the result is that the next time the user exits the patch, he got hacked. Okay, our last cache is the authentication bypass.
34:03
Back to our opening, a super secret area which is protected by a super strong password under the design of the IIS. All of these passwords are valid. I guess you might have several questions and be thinking,
34:26
what's the root cause? Or how do I get those passwords? Or suspect this must be an edge case. What kind of the scenario is vulnerable?
34:40
First, logon is an expensive operation. To not reduce the performance, IIS catches all tokens for password-based authentication by default. The implementation uses the LKR hash instead of the tree hash table and configures a scavenger
35:06
to delete unused records every 15 minutes. As we mentioned, LKR hash is highly customized. During initialization, the module defines
35:23
several functions such as the hash function, the logic of how to extract the key from the record, and how to decide which one is the correct record when the collision happened.
35:45
And this is the hash function defined by the token cache module. It simply uses the DJB to hash the username and the password and then makes them together by XOR.
36:02
And this is the function used to decide which one is the correct record when a collision happens. The function first checks whether the logon method of both records are equal or not,
36:21
and then compare their username and compare the username again. You may be wondering, why did this function compare the username twice? I guess the original intent of IIS was to compare the password. However, the developer
36:45
copied and passed the code but forgot to replace the name to the password. It's a big fail. The failure creates an inconsistency between the hash calculation and the cache comparison.
37:07
The hash calculation involves both the username and the password. However, when a collision occurs, the table only compares the username to get the correct entry.
37:24
Since we can adjust the field of the password to change the hash, if the hash hits a record that is already in the table, the module will return that to us the directory. So the result is that you can reuse another user's login token with random passwords.
37:49
However, there are still a few prerequisites. First, each password attempt only has the success rate of 4.2 billion because the hash is a 32-bit integer.
38:06
The other prerequisite is that there must be a successful login before our attack. But once the login is done and the token is done, the user will be able to access the
38:23
But once the login is done and the token is cached in the memory, you have unlimited attempts during the 15-minute time window. Both of the above prerequisites make this bug unusable.
38:42
It's just like putting the lottery in the space of the 32-bit integer. So to make this bug a more severe vulnerability, we have developed several ways to win the lottery.
39:03
The first enhancement is to increase the probability of the collision. A vulnerability that requires user interaction is too low. So for the second, we must find a way to exploit without user interaction.
39:26
The last is to defeat the restriction of the 15-minute time window. We figure out a way to cache the token forever. The first is to increase the probability.
39:44
As we mentioned that the 4 billion taskbar records are a ridiculous number. The LKR hash even uses an LCG to make the result more random. But just because of this LCG, we can lower the key space because the LCG
40:05
is not one-to-one mapping under the key space of 32-bit integer. So there must be results that will never appear. So we can pre-compute a dictionary that excludes the password whose hash is not in the results.
40:28
This development can reduce a number of the key space. Although there are still billion taskbar records, we have increased our success rate by 13%
40:42
at least. The second enhancement is to regain the initiative. There is a feature called Connect Earth that is usually used in the virtual hosting because the vendor has to separate the IIS process for their customers.
41:04
With this feature, each IIS process can be under a different user context. Under the feature, IIS will autolog on the user you specify while spawning the new process.
41:22
This token will also be cached. That means we can reuse the customer's identity and no longer to wait for user interaction. In this case, we have regained the initiative.
41:42
To prove it works, we did a test in our lab environment. Windows Server can handle about 1,800 login attacks per second. Because every attack costs nothing, you can run this all day and the success rate is about 4.2%.
42:08
This is already higher than the SSR in your Gacha games. And you can also run it for longer, 5 days for 20%, 12 days for 50% and the success rate
42:25
of running for 24 days can get 100%. We have reproduced in our lab environment and can get the password in about 5 days.
42:45
The last enhancement is to defeat the time window. Our idea is simple. In modern software architecture, it's common to see the pattern of a background daemon monitors the system's health.
43:00
Our server cron jobs access the internal APIs regularly. So we can assume a situation that a credential is attached with the cron job and the gap between each access is less than 15 minutes.
43:22
In these assumptions, the token will be cached forever. Okay, I know it's very ideal. Is there any real case? Sure, let's talk about the Exchange Server again.
43:44
There is a service called Active Monitoring, which is enabled by default and responsible responsible for monitoring all services. It tracks the OWA and active sync service every 10 minutes with a test credential.
44:06
Thanks for the health check, the token of the credential will be cached forever. So you can try as many as you like until you get a successful login.
44:22
And the password is also usable to log into the OWA. The account for the health check has its own mailbox too. This is useful for further exploitations such as phishing or chaining another post-auth RCE together. That's our last case.
44:54
Let's talk about the mitigation and future works.
45:00
For the design of the hash table, it's recommended to use the pseudorandom function such as the SIP hash or highway hash that can reduce the collision and make the attacker more custom. For the design of the cache, inconsistency is still the key.
45:23
Just like our cache today, the cache poisoning due to the different password behaviors and the authentication bypass due to the unpaid password. Since the hash floating attack is the by-decide attack, there are several workarounds to mitigate
45:44
the problem such as the input size limitation or a state to randomize the hash. Each solution has its limitation so the last recommendation is to learn the attacker and defense from the history.
46:06
There are still several cool future works in my mind such as using Tyler's timing attack to reduce the key space. However, since I am lazy, welcome to pick them up if you are interested.
46:25
This is the end of my presentation. If you have any questions, here is my contact information. Thank you again for being here. Thanks.