Introducing Vegvisir: An automation framework for testing QUIC application logic
Formal Metadata

Number of Parts: 542
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/61985 (DOI)
Transcript: English (auto-generated)
00:05
All right, so good morning everybody. My name is Joris and I'm a PhD student at Hasselt University here in Belgium, and I'm doing a PhD on multimedia streaming and network transport layer protocols, or, even better, the intersection of those two.
00:21
Today I'm here to talk a little bit about a project we did called Vegvisir, which is an automated testing framework for orchestrating client and server setups using the QUIC transport layer protocol. Before we jump into that, let's talk a little bit about what QUIC actually is, because I assume not everybody has heard of it.
00:41
QUIC is a general purpose transport layer protocol that was standardized by the IETF in May 2021. If you have any updated applications on your phone or have been using the latest releases of browsers such as Firefox or Chrome, whatever you're using, you've probably already been using QUIC, as it has been deployed to many different applications and websites already.
01:03
For example, Facebook and Instagram are using it. If you're streaming videos over YouTube, you have probably already been using QUIC. QUIC is a name, it's not an acronym. It used to stand for Quick UDP Internet Connections, but it has not actually been called that for quite some time now. Right, some of its features:
01:23
the biggest feature is encryption, so the protocol actually encrypts everything by default. Which is great, because that's the main driver against ossification; it's also the reason QUIC was created, since TCP is an ossified protocol when you compare the two. Encryption is also great for preventing third parties from interfering with the data you are transmitting over the network.
01:43
It's less great for research and development, as you have to account for it in your test setups, which we are going to talk about a little further ahead. Currently, most implementations are implemented on top of UDP in user space. Some implementations are looking at moving it closer to the kernel,
02:00
but those steps have not been taken by many of the implementations yet. At present, at least as far as I know, 25 implementations exist, most of them open source, written in multiple programming languages. They also provide libraries which you can use directly to bring QUIC into your applications.
02:22
Another benefit of QUIC, or another new thing that comes with QUIC, is HTTP/3, which you might have heard of. The reason for the introduction of H3 is that H1 and H2 only run over TCP, so a new version of HTTP was needed that runs over QUIC instead. There's not that big of a difference between H2 and H3 in practice, but for the sake of naming it, it's HTTP/3.
02:46
Right, so now that you know what QUIC is, that's at least a requirement for understanding this talk, let's talk about how we can actually use this in experiments and stuff. Maybe let's try doing something very simple.
03:02
I just told you that most browsers implement QUIC. Maybe let's try connecting to a website that only deploys an HTTP/3 server. It should be simple, right? As you can see on the screenshot, in practice it is not. The reason for that is really simple: browsers decided early on that HTTP/3 servers should be discoverable through the Alt-Svc header provided by an H1 or H2 deployment,
03:28
which really sucks if you want to do some automated testing, because it means you also have to spin up an H1 or H2 server and account for this. Luckily, we have some options, for example within Chrome and Firefox, to enable or force QUIC on certain domains,
03:44
which we can automate by means of, for example, parameters supplied on the command line or by configuration in the browser itself.
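As a rough illustration, forcing QUIC in Chromium from an automation script could look like the sketch below; the binary name, origin, and URL are placeholders, while the two flags are real Chromium switches.

```python
import subprocess

# Minimal sketch: launch Chromium with QUIC forced on one origin,
# skipping the usual Alt-Svc discovery dance over H1/H2.
subprocess.run([
    "chromium",
    "--enable-quic",
    "--origin-to-force-quic-on=localhost:4433",
    "https://localhost:4433/",
])
```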
04:02
Right, so we can connect to a web server at this point. How do we actually know what's happening under the hood? That should also be simple, right? Well, remember, everything is encrypted, so actually seeing what's happening is not that trivial. Luckily, most implementations nowadays use standard off-the-shelf TLS libraries, and these TLS backends support an environment variable called SSLKEYLOGFILE.
04:21
The idea behind SSLKEYLOGFILE is that you can point it towards a file, which then gets used by these TLS backends to output all the secrets used for encryption during whatever the application is doing. If you load such a keylog file into a program like Wireshark, you can actually decrypt the traffic, which is nice if you want to see what happened.
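A minimal sketch of wiring this up around an arbitrary client; the client binary and paths are placeholders.

```python
import os
import subprocess

# Run a (placeholder) QUIC client with SSLKEYLOGFILE set, so the TLS
# backend appends the per-connection secrets to /tmp/keys.log. Point
# Wireshark at that file (TLS preferences, "(Pre)-Master-Secret log
# filename") to decrypt the captured QUIC traffic.
env = dict(os.environ, SSLKEYLOGFILE="/tmp/keys.log")
subprocess.run(["./quic-client", "https://localhost:4433/index.html"], env=env)
```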
04:42
Unfortunately, tools like Wireshark, at least as far as I know, don't actually have any visualizations for what's happening at the congestion or flow control layer. You have that for TCP, but for QUIC those things don't exist yet. Luckily, we have other tools for that. qlog is one of them.
05:00
There has been a nice talk about this by its inventor a couple of years ago at FOSDEM; I really invite you to look at it. But basically, in a nutshell, qlog is a structured, unified way of logging that can be implemented by any endpoint implementation using QUIC. It is basically a file, for example a JSON file, that just logs everything that's happening.
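To give a feel for the format, here is a heavily trimmed sketch of a JSON qlog trace; the exact field names depend on which qlog draft version an implementation follows.

```json
{
  "qlog_version": "0.3",
  "traces": [
    {
      "vantage_point": { "type": "client" },
      "events": [
        {
          "time": 0.0,
          "name": "transport:packet_sent",
          "data": { "header": { "packet_type": "initial", "packet_number": 0 } }
        }
      ]
    }
  ]
}
```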
05:26
If you have some scripts or tools that can parse this, you can actually do a lot of fun stuff with it. For example, the Qvist visualization tool, which is also by the same creator, is a tool that allows you to load these files and actually visualize, similar to what Wireshark can do for TCP,
05:42
but then for QUIC, what's happening at the congestion layer. For example, on the left you see a flow control and congestion graph, and on the right you see a plot of the round-trip time that was experienced by the application. Right, so we can look at what's happening under the hood. Maybe let's try something more advanced: setting up your own QUIC client and QUIC server to do some local experiments,
06:05
maybe change something in the implementations; it doesn't really matter what you want to do. Even that is not that trivial, simply because there are many implementations written in many languages, meaning that they have their own requirements and their own installation procedures.
06:20
Another distinction is that different implementations actually have different performance characteristics, meaning that some are more tuned towards certain scenarios, some only support a certain feature set, so you also have to account for that. An additional requirement is that you also have to set up self-signed certificates, and for some reason some implementations accept all kinds of certificates,
06:44
and then for some reason others fail. We have never really figured out why; we just use a common way that works for them all now. Another quirk you can encounter is within the codebases themselves. A fun one I always show to my students is this one, from the quic-go codebase,
07:02
which uses the Cubic congestion control algorithm, if you know something about congestion control. And it makes sense: the file is called NewCubicSender, the function is called NewCubicSender, it even specifies in the documentation that it makes a new Cubic sender. Unless you actually set the reno bool to true, that is; then it behaves like a totally different beast,
07:23
actually New Reno in that case. So there are some weird quirks that you have to account for too. The point I'm trying to make is that there are a lot of different implementations, testing them all takes time, it's not that easy to set up, and you run into a lot of these small issues; it's cumbersome to test multiple implementations.
07:43
Which is the reason why I am presenting Vegvisir today. The idea behind Vegvisir is to aid with exactly this kind of work: if you're doing research or even development within QUIC, Vegvisir can automatically set up these kinds of interactions between clients and servers, but also simulate the network between them, such that you can have actual repeatable and shareable experiments.
08:05
The way you do that is by defining experiments with configuration files, and a single experiment can consist of multiple test cases. The idea is that a single test case looks something like this. So you have the two entities, the server and the client,
08:20
which just assume their prototypical roles as known from the client-server model, and in between them sits a network component that we call the shaper. The idea of the shaper is that it applies some kind of scenario to the traffic passing between the server and client. For example, it can introduce some latency or limit the throughput; it doesn't really matter what you want to do,
08:41
the idea is that you can do it in a repeatable way. You also see the Docker container stuff on top. The idea of deploying these test cases within Docker containers is that we can easily share them with other people, which is a really nice benefit within the academic community, but it also means we can freeze certain implementations,
09:02
like we can actually save a Docker container and reuse it at a later point, so say for example something changes and we want to try an older version, that's totally possible with this setup. Additionally, within the QUIC community there have been some other efforts, if you are part of the QUIC community you might actually recognize this setup.
09:21
It's pretty much the same as the one used by an interoperability project called the QUIC Interop Runner. They also provide containers for their setup, which are more tuned towards testing the actual interoperability between QUIC implementations, but the benefit of using the same architecture is that we are completely compatible with their setups,
09:42
so that means that even though Vegvisir is relatively new, at this point in time we are already compatible with 15 out of the 25 QUIC implementations right out of the box. You also see on the right side that we have a client that can be defined as a CLI command. That's because early on we realized that if we want to test applications,
10:03
not every kind of test is suitable to be placed in a Docker container, which is why we also allow defining tests that just spin up local programs, as you are used to from a terminal. A good example of this is a browser. If you are doing some kind of media streaming experiment,
10:21
you actually want hardware acceleration and such to be enabled, which I guess you can do in Docker containers, but it's really cumbersome to do in a good way. Right, okay, so how are these experiments actually defined? Well, we decided not to use one single configuration file, simply because that would mean we had to be very verbose.
10:42
We actually split it up into two types of configurations. On the left you can see the implementation configuration, which defines what is available within an experiment. The idea is that an implementation configuration is similar to the list of installed software on your computer: you simply have a list of entities that you can pick from.
11:02
We also introduced a parametric system to make it really dynamically steerable from within an experiment configuration, and we will see some examples of that in a second. On the right you see the experiment configuration. That's the actual definition of what needs to happen within one experiment, so it is what defines the test cases. The idea is that you define how the entities
11:22
from the implementation configuration should behave and which values their parameters should receive through arguments. It also configures sensors; I'm going to talk about those in a second. But the biggest benefit of splitting these two up is that the experiment configuration
11:42
automatically produces a permutation, or rather the total set of combinations of all these entities. So say, for example, you define two servers, two clients, and two shapers. The total amount of tests within this experiment will actually be eight, because the framework compiles the complete set of combinations of these configurations.
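Conceptually, that expansion is just a cross product over the defined entities; a small sketch with illustrative names:

```python
from itertools import product

# Illustrative entity names; Vegvisir turns every combination into
# its own test case.
clients = ["chrome", "aioquic"]
shapers = ["netem-100ms", "tbf-5mbit"]
servers = ["quic-go", "ngtcp2"]

tests = list(product(clients, shapers, servers))
print(len(tests))  # 2 * 2 * 2 = 8 test cases
```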
12:02
Another benefit is loose coupling. So you might wonder, yeah, I still don't see the reason why you split these two up. Well, a big thing with academic research is that we actually want to test different versions. So if we have an implementation configuration that, for example, defines a client called Chrome, which then refers to a Chrome browser,
12:21
we can actually have one implementation configuration that refers to, for example, version 99, and another implementation configuration that refers to, for example, version 100. The benefit of that is that if we simply swap these implementation configurations, we don't need to change the experiment configurations, meaning that we can, without having to verbosely rewrite everything,
12:41
really easily test multiple setups. Some examples. This is an example of an implementation configuration. I do invite you to go to the GitHub repository, where everything is really nicely explained and we provide some more examples; I'm unfortunately limited by the screen size here. You see that in the implementation configuration we always have to define three types of entities,
13:04
like we talked about earlier: the clients, the servers, and the shapers. These three are examples using a Docker setup. In the top two, you see that we refer to Docker Hub images. These are actual examples that come from the QUIC Interop Runner project, which we are compatible with. The bottom one is a locally built Docker image.
13:22
The reason I highlight this difference is because the framework automatically pulls the latest Docker Hub images if these are available. But if you are using some kind of local implementation that you built as a Docker image, you actually have to build it locally and then refer to it locally. Another thing you can see here is the parametric system.
13:43
For example, the top client defines a REQUESTS parameter that is then used within an experiment. The idea is that an experiment configuration contains a value for it, and that you can access this value within a Docker image simply by using REQUESTS, in this case as an environment variable.
14:03
All the parameters are passed as environment variables if you are using Docker images. Or in the case that you are using CLI commands or even in a more specific case of shapers, because shapers are a little bit more complicated, you can also use them directly in the commands you specify within the implementation and experiment configurations.
14:21
These are directly substituted, and you can actually reference all the parameters within arguments, which is nice.
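To make this concrete, below is a simplified sketch of what an implementation configuration could look like. The field names and image references are illustrative rather than the authoritative Vegvisir schema; the GitHub repository documents the real format.

```json
{
  "clients": {
    "quic-go-client": {
      "image": "martenseemann/quic-go-interop:latest",
      "parameters": ["REQUESTS"]
    }
  },
  "servers": {
    "quic-go-server": { "image": "martenseemann/quic-go-interop:latest" }
  },
  "shapers": {
    "tc-netem": { "image": "my-local/tc-netem:latest" }
  }
}
```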
14:42
A simple example of a CLI client, so one that's not using a Docker image in other words: you can see that the command is rather long. That's because, compared to a Docker container, we have to specify everything that needs to happen in the CLI command itself. This example provides three, or rather four, system parameters, which I highlighted here. The reason I did this is because the framework automatically generates all these details for you,
15:01
such that the experiments can be even more dynamically steered. This is especially handy for future use cases where we, for example, want to expand towards multiple-client setups and things like that. On the bottom you see a construct key. We actually have two special mechanisms for CLI commands. The benefit of Docker images is that they can have scripts on board that prime the environment.
15:27
CLI commands lack that, unless you want to put everything on one single line, which is also rather cumbersome. Instead we provide two mechanisms, called construct and destruct, which are run before and after a command is executed.
15:43
These can be used to prime an environment and clean it up afterwards. This example manipulates the Google Chrome preferences to point the default download folder to one generated by the framework.
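A sketch of what such a CLI client entry with construct and destruct steps could look like; all key names, the substitution syntax, and the commands here are illustrative, not Vegvisir's exact format.

```json
{
  "clients": {
    "chrome": {
      "command": "chromium --enable-quic --origin-to-force-quic-on=$ORIGIN $URL",
      "construct": "python3 set_chrome_prefs.py --download-dir $DOWNLOAD_DIR",
      "destruct": "rm -rf /tmp/vegvisir-chrome-profile"
    }
  }
}
```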
16:01
Then we come to the experiment configuration examples. These are the actual configurations that define how a test should behave. You see here, once again, that we have clients, shapers, and servers, which we picked from the implementation configuration we just showed. We simply fill them in with the arguments required for the test to work.
16:21
A special thing to notice here is the shaper scenarios. Clients and servers are really simple: you just mention which one you want to use. For shapers, we have a more complicated setup. The idea behind a shaper is that it entails one kind of shaping technology; for example, you can have a tc netem shaper within one container. But this one container does not only do one kind of shaping.
16:42
The idea is that you can define multiple scenarios within this container, and by passing the scenario key, you can pick which one is used during a test. In this use case, we have one client, two shapers, and three servers, which means that we will have a total of six test cases that get generated and compiled by the framework and run sequentially, one after the other.
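A matching sketch of an experiment configuration, again with illustrative field names, that would expand into those six test cases:

```json
{
  "label": "h3-download-matrix",
  "clients": [
    { "name": "chrome", "arguments": { "REQUESTS": "https://server:443/1MB.bin" } }
  ],
  "shapers": [
    { "name": "tc-netem", "scenario": "delay-100ms" },
    { "name": "tc-netem", "scenario": "loss-1pct" }
  ],
  "servers": [
    { "name": "quic-go-server" },
    { "name": "ngtcp2" },
    { "name": "aioquic" }
  ]
}
```

One client times two shaper scenarios times three servers yields the six sequential test cases.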
17:06
I mentioned sensors earlier. They are also configured in the experiment configuration files. Normally, the framework just automates all these tests, and when a client exits, that signals the end of a test. However, in certain circumstances this is not possible.
17:23
For example, if you use a browser: browsers obviously do not have the ability to shut themselves down from within a web page, as that would pose some security risks. Which is why we built a sensor system that can govern what happens within a client. We provide two simple sensor setups,
17:42
a time mode, which simply checks if a certain amount of time has passed and then closes the client and signals that the test case has ended, and the browser file watchdog sensor, which enables us to check if certain files were downloaded in a browser context. That one enables us to pull metrics from the browser and also signify the end of a test.
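Configured inside an experiment, those two built-in sensors could look roughly like this; the key names are illustrative, not the exact Vegvisir schema.

```json
{
  "clients": [
    {
      "name": "chrome",
      "sensors": [
        { "type": "timeout", "seconds": 60 },
        { "type": "browser_file_watchdog", "expected_files": ["metrics.json"] }
      ]
    }
  ]
}
```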
18:05
If you provide these two configuration files to the framework, the framework will spin up a nice GUI. On the bottom, you can see which tests are happening, how much time has passed. You can see a little bit of packets passing between them, signaling that some kind of traffic is happening. You can actually increase the verbosity,
18:21
but this is not really necessary, as the framework automatically saves everything that happens as output in files within the test case folders that we will now discuss. The experiment output is always saved under the label that is provided within the experiment configuration, because we can have multiple runs of an experiment.
18:42
The first entries that you will find within such a folder are actually timestamped to signify multiple runs. If you enter that, you will actually find the different folders that contain the data of the multiple test cases that were compiled by the framework. If you take a dive into one of these folders, you can see what output we are collecting in these cases.
19:03
By default, the framework will always create a server, client, and shaper folder, which get automatically mounted as Docker volumes under the /logs directory. Anything the implementations want to save, they can just write as files to this directory, and the framework will capture this and save it with the log files.
19:24
Additionally, clients also have a downloads folder mounted, simply because we want to differentiate and not come into a situation where downloads accidentally overwrite output logs generated by a client. You can also see that we have, especially under the server and client entries,
19:42
you can see a keys.log file and a qlog folder. The framework is automatically primed to save these encryption details and what's happening at the QUIC and HTTP/3 layers by setting the SSLKEYLOGFILE variable, which we discussed in the beginning, but also by setting a qlog environment variable, which gets recognized by most QUIC implementations out there nowadays.
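In effect, the framework primes each endpoint with something equivalent to the sketch below; SSLKEYLOGFILE is the standard TLS keylog variable, QLOGDIR is the de-facto variable many QUIC stacks check for qlog output, and the binary and paths are placeholders.

```python
import os
import subprocess

# Prime the environment the way the framework does: secrets go to
# keys.log, qlog traces go to the qlog folder of this test case.
env = dict(
    os.environ,
    SSLKEYLOGFILE="/logs/keys.log",
    QLOGDIR="/logs/qlog",
)
subprocess.run(["./quic-server", "--listen", ":4433"], env=env)
```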
20:05
Finally, we come to extensibility. At this point in time, we have a framework that is great at aggregating a lot of data. We did some tests that ran for two or three days straight, containing more than 8,000 test cases,
20:21
which is great if you want to gather a lot of data. But what makes a testing framework a testing framework is the ability to actually infer something from the output generated by a test, which is why we provide these two programmable interfaces called sensors and hooks. I explained sensors a little bit already. We provide some basic sensors, but you also have the ability to program custom sensors.
20:43
This makes a lot of sense if you want to test for very specific behavior within your experiment. For example, if you are doing a video stream in the browser, you can send the decoding metrics of the video out of band to an HTTP endpoint, for example, that you set up in a custom sensor.
21:03
If the sensor then detects that some frames are being dropped or decoded in the wrong way, it can prematurely halt a test, signaling that something went wrong. If you have lots of test cases, like we do, with runs taking 48 hours like I just said, this is really beneficial because it halts the test in an early phase, saving us a lot of time.
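As a very rough sketch of that idea: a custom sensor could embed a small HTTP endpoint that receives decoding metrics from the page under test and halts the test when frames get dropped. The halt_test callback here is a hypothetical stand-in for whatever termination mechanism Vegvisir's real sensor interface provides.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class MetricsHandler(BaseHTTPRequestHandler):
    """Receives out-of-band decoding metrics POSTed by the page under test."""
    halt_test = None  # injected below; stands in for the real sensor API

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        metrics = json.loads(self.rfile.read(length) or b"{}")
        # End the test early as soon as the browser reports dropped frames.
        if metrics.get("dropped_frames", 0) > 0 and MetricsHandler.halt_test:
            MetricsHandler.halt_test()
        self.send_response(204)
        self.end_headers()

def start_metrics_sensor(halt_test, port=8088):
    MetricsHandler.halt_test = halt_test
    server = HTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```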
21:21
On the other hand, we have the hook system. The framework is currently very broadly applicable; the downside of that is that it doesn't really know what's happening inside a test. But you can program some custom behavior through the pre-run hook and post-run hook system. As the name suggests, the pre-run hook runs before an actual test is run,
21:43
so you can prime environments by, for example, generating some dynamic files that you will need during the experiment; it doesn't really matter what you want to do there. The post-run hook is really nice because you can use it to analyze whatever happened during a test. For example, if qlogs are being generated,
22:02
you can look at the qlogs and maybe even generate some nice graphs that you can immediately check after a test case has ended. Another thing I need to mention with the pre-run and post-run hooks: if you don't like programming in Python, it's not really a problem. Python has this really great standard module called subprocess.
22:22
If you have some existing scripts that are made to work with the output produced by your experiment, you can simply call them from this hook as well, meaning that you get exactly the same results without having to translate your existing code into these provided hooks.
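For instance, a post-run hook that shells out to an existing analysis script might look like this; the hook signature and the script path are illustrative, not Vegvisir's exact interface.

```python
import subprocess

def post_run_hook(test_case_dir: str) -> None:
    # Reuse an existing plotting script on the collected qlog output
    # instead of porting it to Python.
    subprocess.run(
        ["./plot_congestion.sh", f"{test_case_dir}/server/qlog"],
        check=True,
    )
```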
22:42
Right, that's in a nutshell what Vegvisir does. Thank you for your attention. I think we have a couple of minutes left for questions.
23:01
A test case can be anything you want. If you're developing something locally right now, the thing you need to do is wrap it within a Docker container, that's one way, or run it as a CLI command. You simply need to provide it to the framework, and the framework will just spin it up. The framework doesn't actually check what your test case is doing. If you want to spin up a simple, let's say, CLI command like echo,
23:25
and you want to print something to the terminal, you simply put it in the JSON, it will run.
23:52
What's the relationship here? Good question actually.
24:00
I'm not sure if there is a direct relationship with testing at the university. It's just that during my PhD, and also the PhDs of some of my colleagues here in front of me, we found that we had a need for such a framework, right? We had an actual need to spin up multiple test cases
24:21
and for something to help us set up these experiments, which is why we designed this. Early on, we just had a very minimal thing that just worked for us, and then as time progressed it became more and more mature, and we decided, well, this is actually a very good idea, so we created an open source project for it,
24:41
and we also submitted it to the open source and datasets track of the MMSys conference, which is happening in June in Vancouver.