So you want to build a deterministic networking system
Formal Metadata
Title: So you want to build a deterministic networking system
Subtitle: A gentle introduction to time-sensitive networking
Title of Series: FOSDEM 2023
Number of Parts: 542
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/61600 (DOI)
Language: English
FOSDEM 2023, 156 / 542
Transcript: English (auto-generated)
00:05
Hi. Welcome to my talk. So you want to build a deterministic networking system, a gentle introduction to time-sensitive networking. Just out of interest, how many of you have heard of TSN or time-sensitive networking so far?
00:22
That's quite a few for a networking session. That's great. How many of you have already worked with that? Not so many. Okay, you will after this talk. Who am I? I'm a former systems engineer; I worked a lot with time-sensitive networking and its predecessors.
00:42
I also took part in standardization, so I also did some of that. And since last summer, I have worked as a kernel developer at Pengutronix. That's a German Linux consulting and support company. We have roughly 7,600 patches in the kernel, and we also do consulting for real-time networking, amongst many other things.
01:05
And by the way, we're hiring, of course. Now to what we will look into today. We will look into applications. I will give you some examples of why you would probably want to do real-time
01:24
data transport over networking, what the implications of that are, and what the requirements of these applications are. We will look into the basic building blocks (sorry for the folks who already know about that), and we will talk a bit about which Linux user space and kernel components are used in building these applications.
01:46
And I will sum up the state of the union a bit, and then just as an announcement in advance, there are some bonus slides where I will give some more details and some references to open source projects already working with TSN. So if you're interested in that, just download the slides from the Penta and, well, check out the links.
02:07
I also added an example of how to basically glue together a stage box, that is, a transport system for audio data over the network. That won't make it into the talk because it has been shortened to half an hour.
02:23
So the example I will focus on today is audio-video bridging. If you want to transport real-time data over a network for an application just like this talk, you want to have as small a jitter buffer as possible to reduce latency in the system.
02:41
Because if you transfer data over a traditional network, packets could get dropped. So you have to resend them or you have to make sure that somehow magically interfering traffic doesn't do you any harm. And that usually involves quite large jitter buffers of up to several seconds.
03:01
And if I talk now and you hear me from the stage and then hear me from the PA four seconds later, that would be quite annoying. So you want to cut that down to the lowest possible transmission latency, the overall end-to-end latency. Of course, for TSN, which started as audio-video bridging or AVB as a standard, they
03:27
came across the fact that this technology could also be useful for quite some other applications. Most of the customers do machine control stuff with that. So if you have a large production line and you want to transmit data between your PLC and your servo drives or your robot arms and such, you
03:47
also want to make sure that your control data arrives in time at the actuator, or that your sensor data is read in at a certain point in time.
04:00
And that's quite important to keep that timing. Same holds, of course, for aerospace and automotive and railways and stuff. I won't go into these applications today because we're, as I said, short on time. The first requirement of said applications is that you need to establish a common time base in the network.
04:22
That's due to the fact that measuring time in computers basically means hooking up a hardware counter to a crystal oscillator. These crystal oscillators tend to have frequency drift over time, especially with temperature. And due to the different switch-on points in time, you also have quite large offsets.
04:42
So if you start one device, say, at 12 o'clock and the other at 1 p.m., they have a one-hour offset in there. So you want to make sure that all your network devices have a common sense of time passing and a common sense of what time it is.
05:07
Because lots of scheduling decisions for networking traffic may depend on timing. Also, for some applications, such as the audio example, you would like to regenerate your audio sampling clocks. So basically, in order not to introduce any additional degradation in audio quality, you want
05:26
to make sure that your sampling clocks of your ADC and DAC run basically in lockstep. And that is why you want to make sure that your time is distributed evenly. And the way this is done usually in networks is just as shown.
05:44
Basically, in this old-style picture, you elect a so-called master clock. So basically, that's the best clock reference in your network, or the most stable clock reference in your network. And then basically, you compare all other clocks to that clock reference, and they have to adjust their local time to that reference time.
06:06
It's basically just as those three gentlemen do in that picture. I like that comparison because you find a lot of analogies in the standards to just the way that works with pocket watches.
06:21
And if you look into that, you will find that basic idea quite useful to keep in mind. Now, the other thing we want to have guaranteed is, as I already said, bounded transmission latency. So, if we go across the transmission of a data stream in the network, so that's what the standard calls a talker at the left.
06:48
And those are what the standard calls bridges. Usually, as we're dealing with layer two, those are Ethernet switches. And on the right, that's what the standard calls a listener. You could also call them a source and a sink, but the standard talks about talkers and listeners.
07:03
And the packet goes from bridge to bridge, along its path across the network. And each switch, basically a bridge, has an ingress queue and a switch fabric and an egress queue. That's due to the fact that you can only transmit one packet out of a certain network port at a time.
07:27
If another packet at another port arrives for that destination port, you have to store it. And you have to wait until the last transmission is done, and then you can transmit the next packet. And this introduces what's called the residence time in each switch.
07:42
So, even if you have a perfect pass through a network without any additional interfering traffic, you add a little time at each step your payload packet travels through the network. So, if our audio starts here, it's a bit later when it arrives here, and a bit later when it arrives there, and so on and so forth.
08:03
So, that's fine, as long as you have no interfering traffic. But you might have additional interfering traffic, because we of course want to use our audio on converged networks. So, we want to use the same network for, say, our live PA system and for our internet connection.
08:24
And we want to download a large file, say a presentation recording from FOSDEM. And basically, that's where this traffic comes in, and it creates a large amount of traffic here.
08:42
This will cause the packet here to be delayed until it's sent out of the egress port. And basically, it won't arrive in time. And if we go for as small jitter buffers as possible, that's a problem, because we get a buffer underrun at the listener side. And basically, we have audio dropouts in the audio case, or we have stalled motors in the industrial control case.
09:07
That's something we have to avoid under any circumstances. So, basically something we want to have is quality of service. And so the picture, of course, you're professional networking engineers, so you don't need that picture.
09:21
But the picture I like to use for that is a bus lane in the street, because the bus also runs in a more or less isochronous way. So, you send those buses, or packets, down the lane, and the way not to be hindered by the interfering traffic there is basically to introduce a priority lane.
09:46
And that is what we also use in networks basically when we introduce quality of service measures. Another thing we need for at least some of these applications is link layer redundancy. So, imagine if there's a mixing desk right in the back, and we run a network link back there.
10:07
And someone just trips over that link and rips out the cable, or maybe it's a fiber link and someone stomps on the fiber. Bad things happen, and basically our show is over, and we don't want to have that. So, we want to introduce some kind of redundancy scheme there.
10:26
Basically, you can think of it as a real-time capable, self-healing, no-waiting-time, spanning-tree-ish thing. The standard spanning tree is quite unsuited for these kinds of applications, so we have to introduce other mechanisms there.
10:48
We have some other application requirements there that are not so important, so I leave them out for now. Now, what kernel and user space components do we have to implement that?
11:04
We will look into what the TSN components, or rather the TSN standards, are later, because that's basically just numbers and letters. So, for time synchronization, especially for TSN, we use gPTP. That's a flavor of the precision time protocol, the generalized precision time protocol,
11:25
which you can think of as standard PTP, IEEE 1588, boiled down to layer 2. So, of course, we're dealing with raw Ethernet frames, so we can't use UDP for transport. It also has some other quirks, but they're not too important right here.
11:44
The way we do that with the Linux kernel: we have the hardware timestamping units and the PTP hardware clocks. That's basically the interface to the hardware clocks in your Ethernet MAC or PHY. The user space component to run all the remaining stuff is ptp4l, PTP for Linux.
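A minimal sketch of that setup with linuxptp, assuming an interface called eth0 with hardware timestamping support (the gPTP config file is the one shipped with the linuxptp sources; its installed path differs per distribution):

    # check that the NIC exposes hardware timestamping and a PTP hardware clock
    ethtool -T eth0

    # run gPTP (IEEE 802.1AS) on the interface, using the gPTP profile that
    # ships with linuxptp; -m prints status messages to the console
    ptp4l -i eth0 -f /usr/share/doc/linuxptp/gPTP.cfg -m

    # discipline the system clock from the NIC's PTP hardware clock once
    # ptp4l reports the port as synchronized (-w waits for that)
    phc2sys -s eth0 -c CLOCK_REALTIME -w -m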
12:03
That's basically the way it works, and it works quite well. You can achieve precision down to several nanoseconds from point to point with that. For traffic shaping, the quality of service measure we want to employ, the kernel has the TC subsystem; usually you configure that with iproute2 if you do it manually, or via netlink if you want to do it programmatically.
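A hedged sketch of what that can look like for the time-aware shaper (802.1Qbv), which the kernel exposes as the taprio qdisc; the interface name, queue mapping, and schedule below are made-up illustrative values, not anything from the talk:

    # taprio: map priorities to traffic classes and hardware queues, and run a
    # cyclic gate schedule (here a 1 ms cycle split into three windows);
    # base-time is an absolute start timestamp and is only illustrative
    tc qdisc replace dev eth0 parent root handle 100 taprio \
        num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
        queues 1@0 1@1 2@2 \
        base-time 1000000000 \
        sched-entry S 01 300000 \
        sched-entry S 02 300000 \
        sched-entry S 04 400000 \
        clockid CLOCK_TAI

    # each sched-entry is "S <gate mask> <duration in ns>", i.e. which traffic
    # classes may transmit during that window; on NICs that support it you
    # would additionally request hardware offload via the flags parameter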
12:29
That's basically the way it works. We will look into a bit more detail later. For network management, so basically if you have to reserve a data flow from a talker to a listener,
12:44
that's where it gets a bit sketchy, because that's of course user space daemons, and there aren't many. There's also a problem because there are several ways of doing that. The traditional, AVB-style way, the initial implementation, used the so-called stream reservation protocol.
13:04
The modern way, especially for pre-calculated or pre-engineered networks, is using YANG/NETCONF extensions. There are some daemons for that, but support for the TSN extensions is not too great.
13:21
So, if you're into that, that's quite a nice thing to work on. For the real-time data packetization, that's mostly user space. Of course, you want to use features like the ETF qdisc and XDP to have as low overhead as possible
13:44
and to make sure that your transmission is sent out as synchronously as possible, and you want to use offloading for that. Then there are some very application-specific user space components.
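A rough sketch of the ETF part, with assumed names and numbers, and assuming a priority-to-queue mapping (mqprio or taprio) has already been installed as handle 100: the application sets SO_TXTIME on its socket and passes the intended launch time of each packet as an SCM_TXTIME control message, and the ETF qdisc releases the packet accordingly, ideally offloaded to the NIC:

    # earliest-txtime-first qdisc on the real-time queue; "delta" is how many
    # nanoseconds before the requested launch time a packet is handed to the
    # driver, and "offload" uses the NIC's launch-time support where available
    tc qdisc replace dev eth0 parent 100:1 etf \
        clockid CLOCK_TAI delta 300000 offload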
14:00
For audio-video stuff, you can use the GStreamer plugins. For industrial control, I'd recommend using the open62541 implementation. That's not quite finished yet, but it's a good starting point at least. For the link layer redundancy, that's what FRER, IEEE 802.1CB, is for.
14:24
Basically, the standards have been finished for one or two years now. There's not much hardware supporting them yet, and you really want to have hardware offloading for that. So, you're basically down to proprietary vendor stacks at the moment.
14:41
There are efforts to get stuff mainline, but they are not quite there yet. But stuff is coming, and that's the good thing about it. I think one slide is missing there, which is not too big a problem.
15:03
Yes, one slide is missing. Basically, how to put stuff together with TSN, I will summarize without a slide. For time synchronization, we have gPTP. That's IEEE 802.1AS, for the IEEE standards fetishists here in the room.
15:24
For traffic shaping, the basic standard stuff is the credit-based shaper, but there are more shapers, like time-aware shapers, available right now. They basically make more efficient use of your network, and the way
15:40
that works is basically reserving bandwidth along your data flow path in your network. For network management, again, that's a bit application-specific; the professional audio-video stuff is still using the stream reservation protocols.
16:06
For the payload, as I already said, that's really, really application-specific. For redundancy, we usually use FRER. There are some exceptions to that, especially for professional audio-video.
16:21
FRER was not yet standardized when those standards were written, so there are some proprietary, or not proprietary, but some other redundancy schemes where you basically send two different streams and try to separate your networks, via separate media or by
16:41
means of VLANs, usually, to force different data paths through the network. Basically, nowadays, you want to go with FRER whenever your hardware supports it. State of the union: the hard stuff is already done. There are already implementations in the kernel. There are user space daemons available.
17:07
That's, again, the stuff that's difficult to get right. If you want to implement those standards, first of all, you have to read tons of paper. I did that for an employer. It took me two years.
17:23
That's really hard to get right. The good thing is that it is already implemented. You just have to use it, and you have to turn the right knobs. For some stuff like gPTP and traffic shaping you really, really want to use hardware offloading; for gPTP you have to, and for traffic shaping you want to.
17:46
You have to bear in mind that your network gear has to explicitly support gPTP and traffic shaping, so the bandwidth reservation, and basically making sure that your traffic shaping is applied properly.
18:03
That's not true for all hardware, especially not for commodity hardware. And bear in mind that configuration, especially for traffic shaping, can sometimes be quite tricky. As I said, I have added bonus slides to the presentation; I will check that the right slides are in there later on.
18:27
Or just contact me. The point is, especially credit-based shapers can be really, really tricky to set up properly and to make sure that you reserve the bandwidth you want. Because you want to have the remaining bandwidth to be available for best-effort traffic.
18:45
So, the idea is that you can use, say, 70% of your link for your audio-video stuff and still have 30% of your gigabit link (which is what we're usually dealing with for audio-video) available for best-effort traffic, network management traffic, and so on.
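To make that concrete, a hedged sketch with made-up numbers, assuming a gigabit link and an mqprio queue layout like the one below: with the cbs qdisc, idleslope is the reserved rate in kbit/s and sendslope is idleslope minus the port rate, so reserving roughly 70% of the link for one queue could look like this (the credit limits depend on frame sizes and interfering traffic and are only placeholders here):

    # map priorities to traffic classes and hardware queues (illustrative values)
    tc qdisc replace dev eth0 parent root handle 100 mqprio \
        num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
        queues 1@0 1@1 2@2 hw 0

    # credit-based shaper (802.1Qav) on the first hardware queue:
    #   idleslope = 700000 kbit/s            (the reserved ~70%)
    #   sendslope = 700000 - 1000000 kbit/s  (idleslope minus the port rate)
    # hicredit/locredit are in bytes and have to be derived from your real
    # traffic; the values below are placeholders, not a worked-out calculation
    tc qdisc replace dev eth0 parent 100:1 cbs \
        idleslope 700000 sendslope -300000 hicredit 3000 locredit -3000 offload 1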
19:05
So, you really want to make sure your shapers are configured the Right Way™. And it's quite hard to tweak the right knobs in iproute2. So, there are good examples, and I'd strongly recommend reading the docs on that.
19:23
There's also a link to the TSN Read the Docs documentation for Linux. It's quite a good starting point for getting into the whole topic. And, yeah, basically I think that's it. Do you have any questions?
19:45
Any questions here? Thanks for this. What's the highest speed Ethernet implementation of this you've seen? Have you seen anything beyond 10 GigE, for example?
20:01
I have seen a 10 Gig implementation for that. As far as I recall, the standards have some limitations with respect to how you communicate your bandwidth requirements. And they're a bit capped. I'm sure and I know that they are working on that for future revisions of the standards.
20:26
Because, of course, faster links are now becoming available more and more. Most applications for TSN, like the control stuff or the AV stuff, are still running on 100 megabit links.
20:41
You want to go to gigabit links because you can achieve quite a bit lower end-to-end latencies on faster links. But I personally haven't seen anything faster than 10 Gig so far. But I'd be interested to.
21:01
Do you have happy stories, or real users that have put this in production? And can you tell us more about that? Yeah, so if you want to check that out, you can just Google for Milan and TSN, which is the professional audio-video stuff.
21:22
And shortly before COVID started, they ran the Rammstein concert in Munich over a TSN system. It's a really large system with several video walls and several hundreds of thousands of audio streams and pyrotechnics and light control and stuff, all converged in the same network.
21:45
So that's the largest installation for live audio I know of. And I think that's quite a good story to tell. I was curious if you had the chance to play around with synchronous Ethernet as well?
22:02
I haven't looked into that too deep yet. I can't tell you too much about that. You mentioned XDP. Are you aware of any applications of XDP in that area?
22:25
To be honest, I haven't seen them and I will start working on some of them for a customer project in just a few weeks probably. The idea is that basically because it's layer two, you don't have much network stack above the hardware layer.
22:44
So if you can cut out some of the Linux networking stack, because you don't use it anyway (you work on raw sockets anyway), you could try to achieve lower latencies in your Linux stack there.
23:05
Probably at the next FOSDEM I can give you a talk on that. This is probably a big question, but how do you go about debugging this sort of stuff? So, like setting it up, or if you think there's a problem, how do you go about finding problems?
23:21
That's actually a bit of a pain point, and you have to know at least roughly what sane values for things like the path delays for PTP are. One of the most useful debugging tools I have found so far is a good Ethernet switch, because it will give you output for your stream reservations.
23:45
It will give you output for your PTP or gPTP. You can also sniff traffic with wire taps, basically, and analyze it in Wireshark or Scapy or whatever your tool of choice is.
24:03
That works best, to be honest, for 100 megabit links, because you can use passive taps. It doesn't work that great for gigabit links, because it violates some of the standards a bit. You can also use mirror ports on switches to get the traffic out, but basically it's a more manual approach to debugging.
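A couple of hedged examples of that manual approach from the Linux side, with an assumed interface name (the EtherTypes are the standard ones for layer-2 PTP/gPTP and AVTP):

    # query the running ptp4l instance over its local management socket
    pmc -u -b 0 'GET CURRENT_DATA_SET'
    pmc -u -b 0 'GET PORT_DATA_SET'

    # capture only gPTP (EtherType 0x88F7) or AVTP (EtherType 0x22F0) frames
    # for later analysis in Wireshark
    tcpdump -i eth0 -w gptp.pcap ether proto 0x88F7
    tcpdump -i eth0 -w avtp.pcap ether proto 0x22F0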
24:26
I'd like to get in touch: if anyone is interested, just write me an email, to start a community-based project for automated analysis of TSN networks, basically.
24:42
I think it's something we really, really need, especially for people who aren't that deep into the standards. We need to make sure that we basically have a one-click check and setup, and a tool that can at least tell you whether what you're doing looks okay-ish or not.
25:04
I'm not aware of any projects so far, so I'd like to start one, but I'm not too experienced in how to start such a project. If you're experienced in that or interested in that, just write me an email, get in touch, and maybe we can set something up.
25:24
Any more questions, maybe one last one? You mentioned some protocols for link redundancy. Can they also be used for node redundancy?
25:44
I'm not entirely sure. I would have to look something up. I think basically it should because it's about the data path. If one node drops out, basically that would work as well, but it won't work for the endpoints.
26:06
For the talker or the listener, of course, it won't work, but for nodes in the middle of your graph, that would probably work. Okay, thank you very much again for the presentation.