TRex
Formal Metadata
Title: TRex
Number of Parts: 561
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/44565 (DOI)
Transcript: English (auto-generated)
00:05
Okay, I'm Hanoch, I'm from Cisco Systems, and I'm going to present to you a small project that we created to test our routers and the features on the router. It's called TRex, and actually it was my first journey into open source, because
00:26
my friend told me: try to open source it, don't put it in your drawer. And this is the result, okay? So I'm going to talk about traffic generation and how we are doing it at Cisco to test our routers,
00:46
and it's all based on software. Okay, today there is the DPDK library, and I will talk about that. Because I have very, very little time, I will only talk about the stateless and advanced stateful modes of the traffic generator.
01:01
Let me start with the result. After we open sourced the traffic generator, it turned out that many had the same problem: many needed a traffic generator to test their routers. So, for example, many open source projects started to use TRex, like OPNFV, DPDK itself if you want, FD.io, Cisco internally, and many things in Cisco and at Intel and Mellanox and Red Hat,
01:27
and smaller ones. This is from the analytics of our documentation. You can see that it's growing, with about 1,000 active users. And these are the modes of operation.
01:42
Because there are many types of devices that you need to test, the traffic generator needs several modes of operation. For example, you don't need realistic traffic to test a switch, right? Because it only switches packets with a simple lookup. But when you want to test Snort, which does inspection and normalization, or a DPI environment,
02:08
you need to create realistic traffic: something with layer 7 that simulates clients, servers, and applications, if you really want to evaluate the performance of the gear under test.
02:24
So there are, in general, two modes of operation, stateless and stateful, with a few variants; I will talk about stateless and about advanced stateful.
02:41
Okay. The problem that we try to solve, again, is to estimate the performance of stateful features on the router. A stateful feature on the router behaves in a weird way, you know: for every flow it opens a context, caches the client, caches the server, tries
03:01
to normalize the traffic, and so forth. By just pumping UDP packets and short packets, we won't get any reasonable number, right? It won't tell us anything. Because of that, we need to generate realistic traffic.
03:21
The problem is that realistic traffic generation is really expensive, like $500k for 50 gig or 100 gig, and it's not flexible, and this is the reason that we open sourced this. So what is TRex? TRex is software, an application, a Linux application; it sits on top
03:41
of DPDK, and it exposes the modes of operation that I've already talked about. It can come in a container, and it's scalable; everything is about scale and virtualization. This is how I perceive it.
04:02
It's really, really fast; everything, from the bottom up, is about scale. And this is the slide I'm using in Europe; there is another slide for the US. Okay, let's talk about stateless. Stateless is a way to generate traffic to test switches.
04:27
The building block it is composed of is a stream; we call it a stream. You can add more streams, remove streams, the API is JSON-RPC, and we created a Python layer that simplifies the way you can work with it.
04:46
So there is a server side: you install the TRex server, and then you interact with it from the client, with Python. You add streams, remove streams, get statistics, start it, and everything else. There is a nice GUI that works on top of this JSON-RPC API.
05:06
Let's see what a stream is. This is the traffic mix that we are using. You build a profile out of streams; in this example there are three streams.
05:21
The blue one, the green one, and the yellow one. The blue one is a packet that I can generate using Scapy. Anyone know Scapy? Okay. So you can build a packet, a template of a packet, and then you can build a small program that changes the packet over time. For example, I want to create a range of source IPs to destination IPs.
05:44
And then I can choose the mode. The blue one is continuous: it just sends the packet at a specific rate, right? The green one is a burst of packets, let's say I want only three packets, and the yellow one is the multi-burst, with an inter-packet gap and an inter-burst gap, and I can connect
06:03
them. I can create a program: let's say when the green one finishes, start the yellow one, and then point to another stream. It's like a program that you can build in Python, using Python; a rough sketch of such chained streams is shown below.
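A rough sketch of what such chained streams can look like with the TRex stateless Python API (module paths and exact class names vary between TRex versions, so treat them as assumptions):

```python
from trex.stl.api import (STLStream, STLPktBuilder, STLTXCont,
                          STLTXSingleBurst, STLTXMultiBurst)
from scapy.all import Ether, IP, UDP

# A Scapy template packet that the streams will transmit
base_pkt = Ether() / IP(src="16.0.0.1", dst="48.0.0.1") / UDP(sport=1025, dport=12)

# "blue": continuous stream, keeps sending the packet at a fixed rate
blue = STLStream(name="blue",
                 packet=STLPktBuilder(pkt=base_pkt),
                 mode=STLTXCont(pps=1000))

# "green": a single burst of three packets, then hand over to "yellow"
green = STLStream(name="green",
                  packet=STLPktBuilder(pkt=base_pkt),
                  mode=STLTXSingleBurst(total_pkts=3),
                  next="yellow")

# "yellow": multi-burst with an inter-burst gap, started by "green"
yellow = STLStream(name="yellow",
                   self_start=False,
                   packet=STLPktBuilder(pkt=base_pkt),
                   mode=STLTXMultiBurst(pkts_per_burst=3, count=5, ibg=1000.0))
```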
06:21
Let's see how simple it is, but before that, this is the high-level architecture: there is a server, and there is an RPC channel using JSON-RPC. There is a data path, which I will talk about later, that scales with the number of cores; as you add more cores, you get linear scaling of performance, which is really high. And then comes the Python layer that encapsulates all the JSON-RPC into a nice API.
06:44
And there is a Java API for those who want it; Ericsson is supporting that. And there is a GUI, there is a console, and there is the API, okay? It just wraps everything for the user.
07:03
So we separated the definition of the profile from what to do with the profile. This is the definition of the profile; this is the hello world of a stateless profile. We define a really simple continuous stream with this definition: Ethernet over IP over
07:25
UDP, with a payload of ten 'x' characters. Okay, this is the packet. There will be a different packet for each direction: from one direction it will be 16.0.0.1 to 48.0.0.1, and from the other direction it's the opposite. This way we can create bidirectional traffic; a rough sketch of such a profile is shown below.
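For illustration, a rough sketch of such a hello-world profile, assuming the TRex stateless Python API (the exact module path and the loader hook may differ between versions):

```python
from trex.stl.api import STLStream, STLPktBuilder, STLTXCont
from scapy.all import Ether, IP, UDP

class STLHelloWorld:
    def create_stream(self, direction):
        # Swap source and destination per port direction -> bidirectional traffic
        src, dst = ("16.0.0.1", "48.0.0.1") if direction == 0 else ("48.0.0.1", "16.0.0.1")
        pkt = Ether() / IP(src=src, dst=dst) / UDP(sport=1025, dport=12) / ("x" * 10)
        return STLStream(packet=STLPktBuilder(pkt=pkt), mode=STLTXCont())

    def get_streams(self, direction=0, **kwargs):
        return [self.create_stream(direction)]

# Hook through which TRex loads the profile
def register():
    return STLHelloWorld()
```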
07:41
So this is the profile. And then we can manipulate the profile, load it. There is a console from which you can load the profile, start it, get statistics about it, and so forth. And there is the API. This is the API in Python, where we wrap everything.
08:00
So in this way: we connect to the server, then reset everything, reset all the statistics, add the stream that I already showed, clear the statistics, start the traffic, and here I can use a multiplier and say that the traffic I want is five megapackets per second for a duration of 10 seconds.
08:21
And then I can wait. After I wait I can get statistics: how many packets, how many drops, how many packets were sent, and so forth. Okay? So it's a really, really simple API; a rough sketch of this flow is shown below.
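A rough sketch of that automation flow with the TRex stateless client API (class and method names as I recall them; treat exact signatures as assumptions):

```python
from trex.stl.api import STLClient, STLStream, STLPktBuilder, STLTXCont
from scapy.all import Ether, IP, UDP

pkt = Ether() / IP(src="16.0.0.1", dst="48.0.0.1") / UDP(sport=1025, dport=12) / ("x" * 10)
stream = STLStream(packet=STLPktBuilder(pkt=pkt), mode=STLTXCont())

c = STLClient(server="127.0.0.1")
try:
    c.connect()                                        # connect to the TRex server
    c.reset(ports=[0, 1])                              # acquire the ports, reset state
    c.add_streams(stream, ports=[0, 1])                # the stream shown above
    c.clear_stats()
    c.start(ports=[0, 1], mult="5mpps", duration=10)   # 5 Mpps total for 10 seconds
    c.wait_on_traffic(ports=[0, 1])                    # block until the traffic is done
    stats = c.get_stats()                              # packets sent, received, drops...
    print(stats[0]["opackets"], stats[1]["ipackets"])
finally:
    c.disconnect()
```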
08:44
And TRex does the hard work of splitting what you ask for across multiple cores; it happens by magic under the hood, okay? You don't need to split the profile yourself; this is something we do for you. This is the performance, on the Intel XL710, on one core.
09:02
You can see that we can reach 30 megapackets per second on one core, and it scales linearly. So it's all about the performance. Now let's talk about stateful. Stateful is more for features that inspect the traffic, like DPI, like Snort, like firewalls,
09:26
like NAT. For those you need to generate stateful traffic. In this mode TRex can act as a server or as a client and generate traffic on top of a TCP stack that we wrote.
09:40
The reason we wrote the TCP stack is that if you take the TCP stack from Linux, it won't scale. It will scale to about 1 million packets per second, and we needed much more: we needed 10 million, 40 million active flows generating 200 gig. And this is how we did it. We took a native BSD stack and changed it in a way that makes it multi-core.
10:06
Every thread instance has its own separate stack, and through the API, through the control plane, we split the application across the cores and manage that. From the perspective of the user, you see one box that does one thing.
10:25
Okay, so there is an application emulation layer on top of the TCP stack, on top of DPDK, and everything is event-driven. Every core has an event-driven loop: no threads, nothing, no locks, no interaction
10:40
between them, only messaging between the cores. And with that we can reach really high scale. This is an example of the emulation layer. Okay, the client, for example, sends a request, then waits for the response, then can do a random delay, then sends another request, waits for the response, and closes.
11:02
The server side does the opposite: it waits for the request, sends a response, and so forth. Okay? This is just the hello world of the low-level micro-program that we have in the emulation layer; a rough sketch is shown below.
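A rough sketch of such a request/response program with the advanced-stateful (ASTF) Python API; the payloads are made up and the exact method signatures should be treated as assumptions:

```python
from trex.astf.api import ASTFProgram

http_req = "GET /index.html HTTP/1.1\r\nHost: example\r\n\r\n"   # illustrative payload
http_res = "HTTP/1.1 200 OK\r\n\r\n" + "x" * 1000                # illustrative payload

# Client side: request, wait for the response, pause, request again, then close
prog_c = ASTFProgram()
prog_c.send(http_req)
prog_c.recv(len(http_res))
prog_c.delay(10000)            # delay in microseconds (the talk mentions a random delay)
prog_c.send(http_req)
prog_c.recv(len(http_res))

# Server side: the mirror image, wait for a request and answer it
prog_s = ASTFProgram(side="s")
prog_s.recv(len(http_req))
prog_s.send(http_res)
prog_s.recv(len(http_req))
prog_s.send(http_res)
```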
11:21
Let me show you a real profile. Remember the stateless profile: it was Python, and it talked about streams. Here we are talking about an application on top of a TCP stack. So in this example, we have a utility that can take a pcap file, convert it to the instructions that the emulation layer understands, and then replay it on top of the TCP stack
11:43
from the client to the server, and so forth. By that we reach millions of servers and millions of clients talking to each other, exercising the device under test with millions of flows; a rough sketch of such a pcap-based profile is shown below. I will show the performance later.
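A rough sketch of such a pcap-based profile with the ASTF API (the file name, IP ranges, and rate are illustrative; class names are as I recall them and may differ between versions):

```python
from trex.astf.api import ASTFProfile, ASTFCapInfo, ASTFIPGen, ASTFIPGenDist

class PcapReplayProfile:
    def get_profile(self, **kwargs):
        # Ranges of emulated client and server addresses
        ip_gen = ASTFIPGen(
            dist_client=ASTFIPGenDist(ip_range=["16.0.0.1", "16.0.255.255"]),
            dist_server=ASTFIPGenDist(ip_range=["48.0.0.1", "48.0.255.255"]))
        # Replay the L7 payload of the capture over the built-in TCP stack,
        # at a given rate of new connections per second
        return ASTFProfile(default_ip_gen=ip_gen,
                           cap_list=[ASTFCapInfo(file="my_capture.pcap", cps=1000)])

# Hook through which TRex loads the profile
def register():
    return PcapReplayProfile()
```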
12:06
Just to dive into what we are doing inside, from a simulation point of view: on the client side we simulate creating the socket and connecting. Once we get the SYN-ACK, we write a buffer to the TCP stack and then read. This is an example of a write and then a read, and then we close the flow, roughly the sequence sketched below.
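For illustration only, the same per-flow sequence written with ordinary BSD sockets; TRex does the equivalent inside its own user-space stack, not through the kernel socket API:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("48.0.0.1", 80))                       # completes once the SYN-ACK arrives
s.sendall(b"GET / HTTP/1.1\r\nHost: a\r\n\r\n")   # write a buffer to the TCP stack
data = s.recv(4096)                               # then read the response
s.close()                                         # and close the flow
```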
12:21
On the server side, we don't open all the servers ahead of time, because we cannot do that, right? Let's say we have a million servers; we won't do the socket and bind for each of them, because we might not need all of them. So we have a special API for this, because we wrote the TCP stack ourselves, based on BSD.
12:43
We do lazy allocation: once we get a packet for a server, we dynamically simulate everything as if it had created the socket, bound, listened, and then started the program. But you don't need to do any of that, right? This is internal. You just need to define what you want to do, to provide the pcap, and we will do
13:02
it for you, and you get all the statistics. This example has two templates. There are tons of statistics; statistics are god here, right? We cannot miss even one packet. So you can take a JSON of all the counters from the TCP stack, from the flow table, from UDP, and from the other layers of the traffic generator; a rough sketch of pulling them is shown below.
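A rough sketch of pulling those counters from an ASTF run, assuming the ASTF client exposes the same kind of get_stats() call as the stateless one; the profile path and exact keys are assumptions:

```python
import json
from trex.astf.api import ASTFClient

c = ASTFClient(server="127.0.0.1")
c.connect()
c.reset()
c.load_profile("my_astf_profile.py")   # a profile like the ones sketched above
c.start(mult=1000, duration=10)
c.wait_on_traffic()
stats = c.get_stats()                  # nested counters: TCP, flow table, UDP, ...
print(json.dumps(stats, indent=2, default=str))
c.disconnect()
```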
13:21
I want to touch on just one point of the complexity of what we did. This is a real engine, by the way, of the car that I showed you before. So let's see what the problem is with scaling TCP.
13:44
This is one TCP flow. On the transmit side, you have a sliding window, let's say 32K or 64K. This is how the mbuf looks; the mbuf is the structure inside the stack that manages the pointers to the packet data.
14:01
This is how a packet looks. Okay? It needs 32K. In the worst case, if we need 10 million flows, let's do the math: 10 million flows multiplied by 32K is roughly 320 gigabytes of memory. But we cannot spend that much memory on 10 million flows.
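The back-of-the-envelope math, using the numbers quoted in the talk:

```python
flows  = 10_000_000        # concurrent flows
window = 32 * 1024         # bytes buffered per flow for a 32K sliding window
total  = flows * window    # 327,680,000,000 bytes
print(total / 10**9)       # ~327.7 GB of transmit buffers, far too much to pre-allocate
```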
14:21
We support 40 million flows, okay? So what we did is change the API of the stack: instead of us pushing the data into the stack, the stack asks for the data from us, from the upper layer. By that we save a lot of memory; for 10 million flows we can use only about 0.1 gig of memory.
14:46
And you don't need to do anything; you just use one of the profiles that we already have, change it a bit, and it will do it for you. The same problem happens on the receive side when there is a drop.
15:01
There is a window that tries to accumulate everything, okay? And just to show you a last comparison, we compared NGINX, driven by a TRex client, against TRex as both client and server. The performance is a factor of 100 faster, and from a memory perspective, it's three orders
15:23
of magnitude. That's it.