Analysing QUIC and HTTP/3 traffic with qlog and qvis
Formal metadata
Number of parts | 637
License | CC Attribution 2.0 Belgium: You may use, modify, and reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/52829 (DOI)
Transcript: English (automatically generated)
00:06
Welcome to this talk about web protocol performance. As you all know, the new QUIC and HTTP3 protocols are coming, and they are expected to be deployed at a massive scale later this year. With this kind of immense undertaking, we of course want the ability to debug and
00:23
analyze the protocol behavior in case some problems might arise. The typical way you would do this, for example for TCP, would be to take what is called a packet capture somewhere in the network, containing all the packets that are actually sent. We can then analyze these using specialized tools like Wireshark.
00:44
This is still possible for protocols like QUIC, but it's suboptimal for three different reasons. First of all, QUIC, unlike TCP, is almost entirely encrypted. For TCP, you can still watch the packet numbers and deduce things like packet loss,
01:01
even if we are using HTTPS. This is no longer true for QUIC, which also encrypts its own metadata. This means that if we want to analyze QUIC packet captures, we always also need to store the full TLS decryption keys, which can of course be a very big privacy issue when used in actual production systems.
01:23
The second problem is that a lot of the core performance information for these protocols is never actually sent on the wire. It's only kept at the endpoint implementations themselves. So with the traditional method, we would miss this crucial info. The final thing is that QUIC and HTTP3 are actually quite a bit more complex
01:43
than previous protocols, and we would actually need more advanced tools than Wireshark to really drill into their details. Two years ago, we found a solution for the first two problems, and the idea is to log things inside of the endpoints themselves, in the QUIC implementations.
02:03
This means that we can log only the QUIC metadata, so no longer the privacy-sensitive user data, as well as log the internal endpoint state, giving us more debuggability. Of course, we want every implementation to log this in the exact same format, which is now called the qlog format.
02:23
qlog is relatively simple. It's basically just a schema for JSON, describing which QUIC events you can log and how exactly they should look, so it's all nice and machine-readable. To interpret these qlog files, we then built some custom tools ourselves
02:40
in what is now called the QVis tool suite, offering a list of interactive visualizations to help make sense of the protocol behavior. What I'm going to do in the rest of this presentation is I'm going to look at three different QUIC performance features and explain how we can analyze them using QLog and QVis.
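As a rough illustration of what such a qlog file might look like, here is a minimal Python sketch that builds a qlog-style trace with a single packet_sent event and writes it out as JSON. The exact field names differ between qlog draft versions and implementations, so treat the structure below as illustrative rather than normative.

```python
import json

# A minimal, illustrative qlog-style trace: one client trace with a single
# "packet_sent" event. Field names roughly follow the qlog drafts, but the
# exact schema depends on the draft version, so this is only a sketch.
qlog = {
    "qlog_version": "draft",
    "traces": [
        {
            "title": "example client trace",
            "vantage_point": {"type": "client"},
            "events": [
                {
                    "time": 0.0,                      # milliseconds since trace start
                    "name": "transport:packet_sent",  # category:event
                    "data": {
                        "header": {"packet_type": "initial", "packet_number": 0},
                        "raw": {"length": 1252},
                        "frames": [{"frame_type": "crypto", "length": 245}],
                    },
                }
            ],
        }
    ],
}

with open("example.qlog", "w") as f:
    json.dump(qlog, f, indent=2)  # machine-readable, yet still easy to inspect by hand
```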
03:02
Let's first look at QUIC's handshake. As you might have heard, it's up to one round-trip time faster than TCP because it can combine the transport and cryptographic handshakes into a single operation. It even has an even more optimized mode called zero RTT, where we can also send an HTTP3 request in this first round trip as well.
03:25
This is possible because of the TLS 1.3 feature called session resumption, where in the first connection we already exchange encryption keys to use from the second connection onwards. This is done in a TLS message called the New Session Ticket, as you can see.
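To make the round-trip savings concrete, here is a small back-of-the-envelope sketch, in Python, of the time until the first response byte arrives, counted in network round trips only. It ignores server processing time and variants such as TCP Fast Open or TLS early data over TCP, so the numbers are purely illustrative.

```python
def time_to_first_response_byte(rtt_ms: float, protocol: str) -> float:
    """Rough handshake cost in network round trips, ignoring processing time."""
    round_trips = {
        "tcp+tls1.3": 3,  # 1 RTT TCP handshake + 1 RTT TLS 1.3 + 1 RTT request/response
        "quic":       2,  # combined transport+crypto handshake + 1 RTT request/response
        "quic-0rtt":  1,  # request rides along in the first flight (resumed connections only)
    }
    return round_trips[protocol] * rtt_ms

for proto in ("tcp+tls1.3", "quic", "quic-0rtt"):
    print(proto, time_to_first_response_byte(50, proto), "ms")  # e.g. on a 50 ms RTT path
```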
03:42
And a nice tip for people looking to optimize their existing stacks is that this feature can also be used for TCP and HTTP2. So let's explore this in a bit more detail in QVis's sequence diagram, which shows the packets going over the wire from the client to the server and vice versa,
04:03
as well as what exactly is contained within them. We can see that the normal QUIC handshake starts with the client sending an initial packet, which carries some of TLS's cryptographic data. After a round-trip, the server will then reply to this with its own initial,
04:21
as well as what are called handshake packets, which are already encrypted a little bit more and carry things like, for example, the TLS server certificate. After this, from the server's perspective, the handshake is already done, and it moves to what is called the 1-RTT encryption level,
04:42
which basically means QUIC is then fully encrypted. These 1-RTT packets are what QUIC is then going to use for the rest of the connection to send the actual data over the wire. We can see that after we receive this, the handshake is also done on the client side, and it can start sending the HTTP3 requests.
05:04
Before we get these, however, I want to go back because I think some of you might be confused by, for example, these packets and also these two, because it seems like these acknowledgements are being sent by the client and are received immediately by the server
05:20
before it, for example, sends the second packet. This is, of course, impossible because there is some delay on the network. The reason for this is that this is a client-side trace: for the client, we don't really know when exactly these packets were received at the server, so we kind of pretend like they were received instantly.
05:42
Now, one of the nice things about qlog and qvis is that we can actually also load the accompanying server-side trace. When I do that, we can accurately combine the two and show the actual round-trip times in this diagram as well. And here you can see indeed that we did have
06:01
these full four packets sent without waiting for these acknowledgements from the client, even though they were indeed sent after just receiving these two first packets from the server. We can see also that the client here starts sending the HTTP3-level requests after receiving this packet from the other side.
06:22
Now, these requests, we can see that there are three of them, simple GET requests. And interestingly, if I click on them, you will see the power of Qlog because it is simply, as we said, JSON. So it's very human readable, meaning that we can easily inspect the details of, for example, the HTTP headers
06:41
associated with this request. As soon as these requests arrive here at the server, we will see that the server starts replying to them with, indeed, what we have said are the 1-RTT packets, carrying so-called stream frames for the individual requests and resources.
07:00
And I counted these earlier today, and the server sends back about 16 of these packets before it has to wait again for acknowledgements from the client to come in before it can send more data. These 16 packets are what we're later going to refer to as the connection's congestion window.
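Since qlog is plain JSON, this kind of inspection also works outside the browser. The sketch below loads a hypothetical trace file and counts the packets that were sent before the first acknowledgement came back, which is roughly the initial congestion window we just counted by hand. The event and field names again follow the qlog drafts and may differ per implementation and draft version.

```python
import json

# "server.qlog" is a hypothetical file name; any qlog trace with
# transport-level packet_sent / packet_received events will do.
with open("server.qlog") as f:
    trace = json.load(f)["traces"][0]

sent_before_first_ack = 0
for event in trace["events"]:
    name = event.get("name", "")
    data = event.get("data", {})
    if name.endswith("packet_sent"):
        sent_before_first_ack += 1
    elif name.endswith("packet_received"):
        # Stop counting once the first acknowledgement-carrying packet arrives.
        if any(fr.get("frame_type") == "ack" for fr in data.get("frames", [])):
            break

print(f"{sent_before_first_ack} packets sent before the first ACK came back")
```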
07:21
So this is a normal QUIC handshake. Let's look at what a zero RTT handshake looks like in contrast. For this, I'm gonna switch back to the other view so you don't have to keep your head turned sideways. We can already see quite a big difference because next to the initial packet, we also see a new zero RTT packet here,
07:41
which indeed contains an HTTP3 request. Now, as we said, this is only possible because of TLS 1.3 session resumption, meaning that in the previous connection, we have already negotiated new encryption parameters for this new connection, which were communicated in this session ticket.
08:01
This means that this first zero RTT request is fully encrypted and secure, making it possible for the server in its first batch of replies to already start sending some of the response data back. However, you see that we're only sending three packets, where in the previous connection, we were sending 16.
08:20
This is not because we're requesting a very small file here because later on, we're actually sending more. This is because zero RTT is actually limited by very serious security considerations. This means that not all types of requests are eligible for zero RTT. And also what we see here is that zero RTT data
08:40
is limited to three times the amount that we have gotten from the client. So here from the client, we've gotten two packets. And as you can see, the server is here going to send back six packets in total, to prevent what is called a UDP amplification attack. So we can see that zero RTT is not always available.
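A minimal sketch of this anti-amplification rule, under the assumption (as in the QUIC transport specification) that a server may send at most three times the number of bytes it has received from a client address it has not yet validated:

```python
AMPLIFICATION_FACTOR = 3  # per the QUIC transport specification

def server_send_budget(bytes_received: int, bytes_already_sent: int,
                       address_validated: bool) -> float:
    """How many more bytes the server may send to this address right now."""
    if address_validated:
        return float("inf")  # only congestion and flow control apply once validated
    return max(0, AMPLIFICATION_FACTOR * bytes_received - bytes_already_sent)

# Roughly the situation from the trace: two ~1200-byte client packets received,
# so the server may send about three times that (~six full-size packets) at first.
print(server_send_budget(bytes_received=2 * 1200, bytes_already_sent=0,
                         address_validated=False))  # -> 7200
```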
09:02
It's only if the connection is resumed. And it's also not always that powerful because these three packets only contain about five kilobytes of data. Luckily, there are other mechanisms in place that allow you to send more data. For example, using an address validation token, like what is communicated here
09:21
for use in the next connection. But the core takeaway here is that zero RTT is going to be a fluctuating feature that's not always going to be there and not always going to behave in the exact same way. So it won't always give you the same performance benefits as you might like. The second part is about flow and congestion control.
09:41
And as you might have heard, QUIC is built on top of the UDP protocol, which is often accepted as being faster than TCP. However, this does not automatically make QUIC much faster than TCP as well. In fact, a lot of the features making TCP slower than UDP are also in QUIC.
10:01
Things like reliability and retransmission and also flow and congestion control. These last two concepts are in some ways similar, but also quite different. They're similar in that they both have the goal of preventing memory buffers from overflowing. These buffers typically hold network packets
10:22
and if they become too full, this means that new packets cannot enter the buffer and instead they are dropped, leading to packet loss, which is something we really want to prevent. The main difference between the two is where the buffers they try to control are located. For example, flow control controls the buffers
10:42
at the end points themselves. For example, inside of the operating system, you might have a receive buffer storing packets coming in. This buffer is then read by the browser, but this can take a while because the browser is not always the only application running on the computer.
11:01
And we can see that the empty room in this buffer will grow and shrink depending on how fast packets are coming in and how fast they are being read by the application. And the empty space left in this buffer is what is called the TCP receive window.
11:21
And in TCP, this value is communicated back to the other side with every TCP packet. So if you're a sender in TCP and you get a value of zero for the receive window, you know that the buffer at the receiver is actually completely full and you have to stop sending for a while to allow them to catch up.
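A toy sketch of this receive-window accounting, assuming a fixed-size buffer and a sender that simply stops when the advertised window reaches zero:

```python
class ReceiveBuffer:
    """Toy model of a receiver buffer that advertises its free space as the window."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0

    def receive(self, nbytes: int) -> None:
        self.used = min(self.capacity, self.used + nbytes)

    def application_reads(self, nbytes: int) -> None:
        self.used = max(0, self.used - nbytes)

    @property
    def advertised_window(self) -> int:
        # This is the value echoed back to the sender on every packet.
        return self.capacity - self.used


buf = ReceiveBuffer(capacity=64 * 1024)
buf.receive(60 * 1024)          # packets arrive faster than the app reads them
print(buf.advertised_window)    # small window: the sender has to slow down
buf.application_reads(48 * 1024)
print(buf.advertised_window)    # buffer drained: the sender may speed up again
```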
11:42
This becomes even more complex with QUIC because QUIC has three different limits: a connection-level limit like TCP's, but also a limit for each individual request and response that is active at the same time, which QUIC calls streams, as well as a limit on the amount of streams
12:00
that can be active at the same time. So this is flow control, which is communicated over the network. The other thing, congestion control, deals with buffers not at the end points, but with buffers inside of the network itself. For example, here we have two senders sending at a very high rate
12:20
because their local links can take it. However, both of them have to pass through what is called a bottleneck router. As you can see, this one cannot keep up with this very high amount of traffic. These routers typically do have a little bit of buffer to catch some burstiness in network traffic, but this of course cannot cope
12:40
with this huge amount of traffic. What we much prefer is that these senders actually send at a much lower rate so that we can actually manage this bottleneck in a better way. The problem is that we never know how much memory is available on these routers, nor how fast these intermediate links actually are.
13:02
This is never explicitly communicated on the internet. We have no choice but to try and figure out how much bandwidth we have available to use. This is what is done using a congestion control algorithm. It's relatively simple in concept in that we start to send very slowly.
13:21
We only send, for example, 10 packets at the start, or 16 as we've seen in the previous example. This limit on the amount of packets is what is called the congestion window. If all these packets become acknowledged by the other side, meaning that there was no packet loss, that means we can grow our congestion window
13:42
and send more packets in the next flight. We keep on doubling this to, for example, in this case 160 packets, until we see our first packet loss, which is a very clear indication that we are overloading the network and we have to back off, lowering the congestion window and sending fewer packets.
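A heavily simplified sketch of that behaviour: the congestion window doubles every round trip while everything is acknowledged, and is cut back on the first loss. Real algorithms (NewReno, CUBIC, BBR, ...) are considerably more involved; this only mirrors the doubling-and-backing-off idea described above, with a made-up bottleneck value.

```python
def simulate_slow_start(initial_cwnd: int = 10, bottleneck: int = 200, rounds: int = 10) -> int:
    """Toy congestion controller; cwnd counted in packets per round trip."""
    cwnd = initial_cwnd
    for rtt in range(rounds):
        lost = cwnd > bottleneck  # pretend loss happens once we exceed the bottleneck capacity
        print(f"RTT {rtt}: sending {cwnd} packets, loss={lost}")
        if lost:
            cwnd = max(initial_cwnd, cwnd // 2)  # back off after loss
        else:
            cwnd *= 2                            # slow start: double per fully-acked flight
    return cwnd


simulate_slow_start()
```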
14:03
Crucially, unlike flow control, this congestion window is never actually communicated to the other side. This is a purely local process that is only being done at the sender. This means that if you use only packet captures to debug QUIC, we would miss this crucial information,
14:22
while with things like qlog, we can actually log this as well, and as we will see, analyze it. So flow and congestion control are two very crucial mechanisms for both TCP and QUIC that can influence performance. Now let's see what that looks like in qvis's congestion diagram.
14:41
The first thing you will see here is these blue blocks, where each of them represents one of the packets that we sent on the network. And we also see these corresponding green blocks, which are the acknowledgements for those packets, coming in about one round trip time after they were sent. There's a third color here in the red, and red indicates that the corresponding packet
15:02
on the left side was actually declared lost because it was not acknowledged, for example, at this time. So this is already interesting to see how things evolve. It becomes better if we overlay this with what we've just discussed, which are the flow control and congestion control information. So here on top, we will see in the dark red
15:22
and the pink, these are the two most important flow control limits in QUIC. And we can see here that as long as these stay on top of or a long way above the blue, this means that we actually have enough space, enough room within the flow control to actually keep on sending data.
15:43
Here on the bottom, what we see is two things: the purple is the congestion window, and sometimes, here in the yellow, the bytes in flight. And as we've said, the congestion window is kind of like an upper limit on the bytes in flight. And so ideally, these two are always
16:02
almost exactly at the same place, which we can see almost always happens here. Except of course when we have had a packet loss, here, where the congestion window drops as expected; it takes a little while for the bytes in flight to come down as well, because we are receiving more and more acknowledgements. This causes us to have a gap in the sending,
16:23
which is again, intentional. This gives the network a little bit of time to recover for the buffers to drain. And so after that, we can start slowly increasing our send rate again. Another thing we can see here at the start is this slow start, which I mentioned is that we're gonna start sending
16:40
very, very fast at the beginning, until we encounter here a big block of packet loss, after which of course, the congestion window is going to go down and we can move towards kind of an equilibrium. The final thing this graph shows you if you scroll down is actually the round trip times that were measured by the sender.
17:01
And strangely, we can see that the round-trip times seem not to be constant; they actually fluctuate along with our congestion window. And this is logical if you consider that the faster you're sending, the faster you're also filling up these buffers in the network. And the more filled these buffers are, the longer the packets are going to stay in there
17:21
before they are put on the next link. And so it seems like the round trip time is going up as we're sending more and more. This is one of the reasons why more advanced congestion control algorithms don't just look at packet loss, but also incorporate these RTT fluctuations
17:40
to try and estimate how congested the network actually is. So this is kind of a normal trace for what is called the NewReno congestion control algorithm. This is kind of what you expect or would like to see in a normal situation. Of course, we also have some problematic stuff, like what you can see here.
18:00
On the right side, this is again normal, but here we can see that something clearly has been going wrong, because our congestion window here, the purple, is more than high enough. For some reason, our bytes in flight dropped to zero for about 200 milliseconds. And the culprit isn't far to be found. This is of course here, the pink, which is the stream flow control,
18:22
which stayed constant for a long, long time, until here we finally started getting updates from the client and we can again start sending our data to them. This turned out to be due to a bug at the client, where it waited way too long to start sending flow control updates.
18:40
You can see this can indeed really impact your performance, slowing down how fast the server is sending you things. And this was luckily only at the start. We've also found some traces, like for example here in a very early implementation of the Firefox browser's QUIC stack, where we have an almost constant situation
19:00
where the bytes in flight never really reach our allowed congestion window. And this is again because of flow control: here, it is updating, and it is updating at a good rate, but it isn't giving enough additional allowance in the buffer. And so we always get the same, just a little bit of extra data from each update,
19:22
not enough to actually catch up with what we can do from the network. This is how we can see that flow control can indeed severely impact performance. And now we can use this tool to estimate if there was actually a problem with flow or congestion control in a certain trace. Now this is not something that I expect
19:41
most of you will ever really need or will use in normal situations. I think this is mostly gonna be interesting for the outliers. If you have a 99 percentile situation where suddenly the website is loading very, very slowly, that can definitely be due to a weird network
20:00
or a strange congestion control edge case. And you might verify that that is indeed what is happening with this kind of trace and this kind of tool. So this is a bit for experts, but definitely something you might fall back to in case of a very strange behavior that you're seeing.
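If you want to do this kind of check programmatically rather than visually, the congestion data shown in this diagram is itself just a series of qlog events. Here is a sketch, assuming the trace contains recovery-level metrics_updated events roughly as in the qlog drafts; the exact event and field names may differ per implementation and draft version.

```python
import json

with open("server.qlog") as f:          # hypothetical server-side trace
    trace = json.load(f)["traces"][0]

samples = []
for event in trace["events"]:
    if event.get("name", "").endswith("metrics_updated"):
        data = event.get("data", {})
        samples.append((
            event.get("time"),
            data.get("congestion_window"),
            data.get("bytes_in_flight"),
            data.get("smoothed_rtt"),
        ))

# Flag moments where we were allowed to send much more than we actually had in flight,
# which is the kind of flow-control limitation discussed above:
for time, cwnd, in_flight, srtt in samples:
    if cwnd and in_flight is not None and in_flight < 0.5 * cwnd:
        print(f"t={time} ms: only {in_flight}/{cwnd} bytes in flight (srtt={srtt} ms)")
```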
20:22
This last part is about bandwidth sharing between different resources. This is needed because HTTP3 only uses a single underlying QUIC connection. This means if we still want to download multiple files at the same time, we are somehow gonna have to decide how we're gonna order them on the network.
20:41
One simple way of doing this is sequentially where you just send each file one by one back to back. You can also imagine cutting them up into smaller chunks and using a round robin scheduling algorithm to interleave chunks on the wire. There are many different ways of doing this and on the web this is driven
21:00
by what is called a prioritization system. And we don't really have time to go into too much detail here, but if you want to know more, I did a full talk on this at last year's FOSDEM, which you can find at the bottom of the slide. Another related concept to this is head of line blocking. This occurs if you're sending a file and just one of the packets gets lost.
21:22
In this case, we have to wait for the retransmission of packet number two to come in before we can process three, four and five, because otherwise there would of course be a gap in this file's data. Now conceptually, if we use a round-robin scheduler, we can lessen this head-of-line blocking, because here packet two will only block packet number five
21:43
because that's the only one of the same stream. This is what is often said to be a core difference between TCP and QUIC, because TCP will always block on all lost packets no matter which stream they belong to, while QUIC is much smarter as we can see on the bottom.
22:00
However, this is only theoretically true because it depends on the multiplexer used. As we can see on top, if QUIC were using a sequential scheduler, sending a single file at a time, then of course we would still also have this form of head-of-line blocking. And as we'll soon see, even when using the round-robin scheduler, we can still have head-of-line blocking in QUIC.
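To make the difference between the two multiplexers concrete, here is a toy sketch that chops three hypothetical responses into chunks and emits them either sequentially or round-robin. Which one is better depends on the loss pattern and on how incrementally the resources can be used, as discussed above; this is not how any particular server implements its scheduler.

```python
from itertools import chain, zip_longest

def chunks(name: str, size: int, chunk_size: int = 3):
    return [f"{name}#{i}" for i in range(0, size, chunk_size)]

streams = {"css": chunks("css", 6), "js": chunks("js", 9), "img": chunks("img", 9)}

def sequential(streams):
    # One file at a time, back to back: a lost packet only blocks its own stream,
    # but later resources have to wait for the earlier ones entirely.
    return list(chain.from_iterable(streams.values()))

def round_robin(streams):
    # Interleave one chunk of each stream per turn: better for incremental resources,
    # but a bursty loss now touches (and head-of-line blocks) every stream at once.
    interleaved = zip_longest(*streams.values())
    return [c for turn in interleaved for c in turn if c is not None]

print(sequential(streams))
print(round_robin(streams))
```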
22:25
Now let's see what this looks like in qvis's multiplexing diagram. Remember that our goal is to get an idea of how the server actually sends back multiple responses at the same time. So in this test, we request 10 different files from the client, indicated by the 10 different colors
22:42
in the visualization here. And to deduce how the multiplexing is being done, we have two separate visualizations. The first one is a waterfall showing when the request was sent. And also when we have the first and the last bits of data for each of the files. And from this, we can actually deduce
23:01
that most of the resources are actually being sent sequentially by the server, while some like here and here at the end seem to be done in a more round robin fashion. But to deduce which of these is actually happening, we need the second visualization at the bottom, which at first glance might seem like we're just plotting big color rectangles.
23:21
But if we zoom in, you will see that what we're actually doing is having small rectangles, one for each of the QUIC packets that was sent. And we're just coloring each of these packets depending on which file's data it carried. Doing this, we get a very fast high-level overview
23:40
of what's actually happening. So here, for example, we see that purple and orange weren't actually being multiplexed. It was just that a little bit of data for each was postponed for some reason and only sent during the red stream. This is very different from what we see here at the end, where if we zoom in, we do see that the server there
24:01
switches the active stream for each packet in turn. So this is a very fine-grained round robin approach, which for example, might be interesting if you're sending progressive JPEGs or other incremental resources. Now this is kind of a normal situation, what you would expect for let's say a relatively simple webpage.
24:23
Let's look at a bit more complex situation, which we found for example, at one of the Facebook servers. What you can see here is that here for all of the streams, it actually chooses to use the round robin multiplexer indicated by the nice rainbow-like glow
24:40
that we have due to the zoomed out view. However, we also see a couple of areas like here and here at the end, where it seems to do sequential instead. And this is explained by looking at the black bars here beneath, which indicate when data was actually retransmitted after packet loss. So we see that the Facebook server here
25:01
uses round robin for normal data, but switches to a sequential scheduler for retransmitted data for all kinds of boring technical reasons. We can see that using this visualization then indeed helps us deduce how this multiplexing and prioritization is done on the wire. And then we can compare this
25:20
with what we would have expected from what we would want to happen on our webpage. The final thing that this visualization allows, which we haven't shown yet, is shown here at the bottom, where we have a third visualization. And this helps us deduce the head-of-line blocking that I just explained. What we have here on the y-axis
25:41
is the bytes for this individual file, going from zero to the one megabyte that the file is in size. And here we can see that what we want is for the earliest data, so the lower byte counts, to arrive first, early on in the timeline, with the later data, of course, arriving at the end.
26:00
This is what you see here happening at the start. This is what you want, a nice slanted line with everything being delivered in order. Of course, here we see that something else was going on where we have a few of these early packets around 300 kilobytes that suddenly get delivered way late in the timeline.
26:21
We can see here, of course, that this is because these packets were lost. They were actually sent here, lost, and then retransmitted here, as we can see in the black. This means that while this data was indeed received at the client, we can't actually use it because we need to wait for the retransmission of these few tiny packets.
26:42
So only when these arrive can we start processing this entire buffer in one big go. That is what is signified by this big vertical black line. So you can see that the more vertical black lines you have, the more head of line blocking you have even in QUIC. This is kind of expected because individual files, of course,
27:02
need to be delivered in order. But if we start clicking on these other streams, we see that they all show very similar patterns as well. And this is because here, if we zoom in where the loss actually happened, indicated by the purple on the bottom here, we see that loss is often very bursty. It doesn't impact just one or two packets,
27:22
it impacts a lot of packets at the same time. If you're doing this kind of multiplexing, then you are starting to introduce packet loss on all of these streams at the same time, causing head-of-line blocking on all of them as well. This means that QUIC might not be as performant due to this feature as you might sometimes have heard,
27:43
especially not for web page loading use cases. And with that, we've come to the conclusion. And I hope I've convinced you that QUIC performance is indeed quite a bit nuanced, meaning that if you want to test the performance of your own web page on HTTP3 versus HTTP2,
28:01
it's probably not going to be enough to use existing tools or look at high level web performance metrics. These might tell you that it's faster or slower, but not why, which can be due to a variety of different reasons as we've discussed. This is especially going to be the case for the next couple of months
28:21
because the prioritization feature, which drives how resources are multiplexed, is actually a relatively new addition to the HTTP3 specification. And very few implementations have proper support for this at this time. This means that HTTP3 is probably going to be slower than it can be for at least a few more months to come.
28:44
So if you do look into HTTP3 performance, please dig deeper, use qlog and qvis to verify what is actually going on. You might wonder where you can get these fancy qlog files. Well, most QUIC implementations actually have native qlog support.
29:02
Usually all you have to do is set the QLOGDIR environment variable to get them as output. And even in cases where this is not true, qvis has some converters on board, for example for Chromium's netlog format, as well as decrypted packet capture files.
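For example, a rough sketch of what that typically looks like: point QLOGDIR at a writable directory and run your QUIC client or server, and it will drop one qlog file per connection there. The exact variable name and behaviour depend on the implementation, and "my-quic-client" below is only a placeholder, so check the documentation of the stack you are using.

```python
import os
import subprocess

os.makedirs("/tmp/qlogs", exist_ok=True)
env = dict(os.environ, QLOGDIR="/tmp/qlogs")   # many QUIC implementations honour QLOGDIR

# "my-quic-client" is a placeholder for whatever QUIC implementation you are using.
subprocess.run(["my-quic-client", "https://example.org/"], env=env, check=True)

print(os.listdir("/tmp/qlogs"))                # the generated .qlog files, ready to load into qvis
```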
29:20
Now to end with a call to action, Qlog is really only just getting started in its standardization trajectory at the IETF. We're also looking to expand it to other protocols besides QUIC and HTTP3. Similarly for Qvis, we're also always looking for people who have ideas about making new protocol debugging tools.
29:43
So if you're interested, please let us know at the below URL. And with that, all that is left for me to say is good afternoon, good evening and good night.