Multipath TCP for FreeBSD
Formal Metadata
Title: Multipath TCP for FreeBSD
Number of Parts: 41
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/18658 (DOI)
Transcript: English (auto-generated)
00:00
And everyone is already here. So I'm Nigel. I've been working on a multipath TCP implementation for FreeBSD for a couple of years now. So today, basically, I'll be going through a little bit about what the current state is of the implementation.
00:20
But of course, multipath TCP, people don't generally know the protocol in and out. So I'll spend a little bit of time talking about the actual protocol itself, and then an overview of the implementation: how things work with standard TCP, and what's changed
00:42
to enable multipath. And hopefully, if I have enough time, which I should have, I've got basically a simple topology at the end. When the next patch comes out, there's some documentation, and there's a simple example topology for setting up some VMs and doing some multipath stuff.
01:02
And I'll just show a little bit about how that works. So just on me, it's not too much exciting there, really. I did an undergrad in telecommunications engineering and networks. When I graduated, I did a couple of years of network research, so in classification stuff,
01:21
in QoS stuff. And then I left for a while and did some totally random tangent career things before eventually coming back to the network research a couple years ago. And that's when I got back into this multipath TCP stuff. And at the moment, I'm completing a master's degree,
01:42
so post-grad. And I think it's called research enabling multipath TCP for FreeBSD or something along those lines. So on the implementation itself, given that we're a research lab, our first priority was how can we make something that we can use
02:00
to do more network research, particularly with multipath. It's quite new. There's a lot of different scenarios in which you can use it. There's not one simple solution for all of this. There's things to consider in terms of congestion control and how to schedule data segments and how to manage paths and all of that kind of stuff. So for us, it would be useful to have an implementation that
02:21
kind of makes it easy for us to push different buttons and pull levers and have different things happen. But it's not just about being a research tool. It should hopefully also be something that people can use. So if you have a particular use case in mind at some point in the future, and if the FreeBSD multipath stack helps with that,
02:44
then that's a good outcome as well. And the last thing is interoperability with the current reference implementation and whichever other implementations pop up. So there is a Linux implementation at the moment. And being able to interoperate with that helps with standardization and that sort of thing.
03:05
So kind of a bit of background. And I might as well say there's a couple of slides in here which are very similar to what I presented a few years ago when I was here, but a lot of it has changed since then. So hopefully it's not too much that's too familiar and boring for people who were there.
03:23
Hopefully there's enough new people that it will all be interesting. So I started working on the implementation in around 2012-ish with some funding from Cisco. And that was kind of the idea of, hey, let's get something that we can use for research. So I was working on that primarily.
03:40
I was getting some help from Lawrence at the time because essentially I'd just begun kernel development stuff. So for me, it was all very new and getting some help with that in terms of designing and things like that was crucial. And so Lawrence has helped a little bit with that. So a patch came out, a couple of patches.
04:02
In fact, in March of 2013, so a little while after we started, I think it might've been 11 or 12 months. And those were pretty rough prototypes. So they did some multipath stuff within a very restricted set of use cases. So if you did this, it would work.
04:22
If you did something else, you might get a kernel panic or you might get something crazy happening. After that, I had to switch onto another project. So I was working on that for a year and kind of doing the multipath stuff in my spare time. So there wasn't a huge deal of progress over that period
04:42
as I kind of picked away at it. And then around the middle of last year, I started doing my masters, and I was fortunate enough to have the FreeBSD Foundation provide some funding for that, for the first 18 months or so.
05:00
And Cisco's provided some more funding as well, just for these last few months while I write a thesis. And at the beginning of that was the last patch release, which was version 0.4, we're calling it. And there's not been any kind of news since then. And I'll go into a bit about why that was the case
05:20
just because basically I've gone through and redesigned a lot of the implementation. And so I was hoping by today to have a nice implementation that you could download as a new patch, with some documentation and some cool graphs and all of this kind of stuff from testing, but the testing has kind of gone on a little bit longer than hoped, as is often the case.
05:42
And so I'm still testing things, but that new patch should be out quite soon. Okay, so what is multipath TCP anyway? The easiest way to explain it, I guess, in a line is that if you have a host that has multiple interfaces or multiple addresses, it allows you to use those addresses
06:02
on a standard TCP connection. And there's currently a couple of implementations out there already. So there's the Linux implementation, which came out some years ago, but within the last couple of years has become pretty feature complete and it's quite stable.
06:20
And then there's some commercial implementations. So the other most known one is Apple's implementation, used for Siri. I don't think it's used beyond that scope. And I believe Citrix and a couple of other companies have load balancers or proxies that use multipath TCP. So why would we wanna use multipath TCP?
06:43
Well, there's a couple of advantages that you can potentially get from using multipath TCP. The first one is this idea of persistence and redundancy that you don't get with TCP generally. So you might consider TCP as being: you have two addresses, and if one of those interfaces disappears,
07:02
you need to break that connection, and that connection doesn't come back; you need to reestablish it. Okay, multipath has this idea called break-before-make, where you can lose all of your connections underneath, but you can keep that connection alive for a little while until a new interface pops up. So you can resume the connection later.
07:21
So from the application, you don't need to terminate your TCP session. It simply stays there, MPTCP keeps it alive. Okay, so the other two here, reduce congestion and increase efficiency. Now, it might not necessarily be the case all the time, but if used in the right scenarios or if you use the right congestion control or so forth,
07:44
then basically you can reduce congestion. Say you've got multiple paths and one's at a bottleneck: you can use congestion control to steer your traffic away from the congested path and not clog up that particular path.
08:02
Oh, and of course, efficiency. Basically, if you've got this extra capacity there, TCP doesn't usually use it, of course. So why not employ that extra path if we can? Okay, so why extend TCP and not make something new? Basically, a lot of applications already use TCP.
08:21
So we don't need to modify them in order to use MPTCP. So we can add this extra functionality without changing our applications at all. Okay, and one of the big considerations when designing the protocol was, how can we make it work within the internet as it is today? So how can it be made to work with NAT, middleboxes which may not like protocols
08:41
which are not UDP or TCP? How do we make it compatible so that we can continue to use it basically straight out of the box with a new kernel? Okay, so here's like the basic simple scenario. And it's one of many, of course, but the simplest one we know is that we all have phones with multiple interfaces on them,
09:03
say cellular and Wi-Fi. And let's say we've got a TCP session, a standard TCP session on a mobile phone. Okay, if we move out of range of a Wi-Fi access point, then our TCP session is essentially gonna end at that point and can't continue.
09:21
If we wanna continue transferring data, we need to set up a new connection. So in the multipath case, we can set up our connection. The multipath connection is aware that we've got multiple interfaces in this case. So we've got Wi-Fi and we've got a cellular interface. Let's say our Wi-Fi disappears, so we go out of range.
09:44
It's able to internally then just transfer the traffic to the cellular interface. Okay, so there's a little bit of terminology that I'm gonna use. One of them is subflows. I'll use that a lot.
10:00
So basically, if we look at this picture, we can see how multipath TCP works. So we have a process and we have a socket. The process says, give me a TCP socket. What we'll really get underneath is an MPTCP connection. And that MPTCP connection is gonna then manage a bunch of subflows underneath, kind of transparently,
10:21
in order to spread data over multiple paths and so forth. So these green arrows here, there may be one or two or three in this case. We've got subflows sitting on the network. From the network's perspective, those just look like three unrelated TCP connections, but not from the process's perspective here.
10:42
So the application doesn't know anything about this. It just thinks it's using TCP. Okay, so in order to set up some of this stuff and control these subflows and manage our connections, there needed to be some extra signaling. And so again, the least intrusive way
11:03
in terms of being compatible with today's internet was to use TCP options to send some extra MPTCP information. So in this case, we've got a new MPTCP option. Okay, and within that option, there are a bunch of subtypes. Don't know if you can see them, but
11:21
it's not super critical to see them now, but essentially we've got stuff that sets up new connections, adds new addresses to an existing connection, provides some extra accounting information. So I'll talk about how there's an extra sequence space sitting on top of standard TCP, which is used to aggregate data. And we've got a few other connection close things there.
11:42
So in terms of setting up a connection, quite simple: it simply piggybacks on top of TCP's handshake. So one host may send a SYN. Okay, it's gonna add an MP_CAPABLE. That's the option that says, I'm capable of using multipath TCP. If the other box is capable of using multipath TCP,
12:01
it responds in kind. And then on the ACK, on the last ACK, again, MP_CAPABLE. And at that point, the session would be considered a multipath session, even though we're only just using one address at this point. So adding in another subflow, well, there's a couple of ways to do that.
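To make that exchange concrete, here's a toy sketch (plain Python, not the kernel code) of how the MP_CAPABLE negotiation degrades gracefully: the session only becomes multipath if the option survives the whole SYN / SYN-ACK / ACK exchange, and a legacy peer that omits it leaves you with plain TCP.

```python
# Toy model of the MP_CAPABLE handshake described above. Names are
# illustrative, not the FreeBSD implementation's. Each side only treats
# the session as multipath if MP_CAPABLE survived every step of the
# SYN / SYN-ACK / ACK exchange.

def handshake(client_mptcp, server_mptcp):
    """Return True if the session ends up multipath-capable."""
    syn = {"MP_CAPABLE"} if client_mptcp else set()                # SYN
    synack = ({"MP_CAPABLE"}
              if server_mptcp and "MP_CAPABLE" in syn else set())  # SYN-ACK
    ack = {"MP_CAPABLE"} if "MP_CAPABLE" in synack else set()      # last ACK
    return "MP_CAPABLE" in ack

# A legacy peer simply never echoes the option, so the connection
# quietly falls back to plain TCP; that is the compatibility story.
```

The same fallback would apply if a middlebox stripped the option in transit: one side stops seeing MP_CAPABLE, and both ends carry on as regular TCP.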
12:21
One way is to advertise and say, hey, I've got this particular address available. If you want, you can connect to it. And so I've got at the top here, a host sending an ADD_ADDR option. So we've established that connection. And now in one of the packets, in our options space, we're saying, okay, I've got an extra address.
12:42
If you want, you can add this into the connection. You can try and connect to me. Okay, so we're doing that on our already established interface here. Okay, if the other box chooses to do so, it can send a SYN to that new address with an MP_JOIN. So an MP_JOIN strictly relates to adding more subflows
13:03
into a connection. And again, it goes through the handshake as TCP does. And at this point, then you can say, okay, now we've got two subflows between interface one and our host B over here, and interface two and our host B over here.
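As a sketch of how host B can tie an incoming join to an existing multipath connection: per RFC 6824 the token is the most significant 32 bits of the SHA-1 hash of the key exchanged during MP_CAPABLE. The connection table below is purely illustrative, not the FreeBSD data structure.

```python
import hashlib

# Illustrative token lookup for an incoming MP_JOIN SYN. The token
# derivation follows RFC 6824 (top 32 bits of SHA-1 over the key); the
# dict standing in for the kernel's connection lookup is hypothetical.

def mptcp_token(key: bytes) -> int:
    """Most significant 32 bits of SHA-1(key), per RFC 6824."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")

connections = {}                       # token -> connection state

def open_connection(local_key: bytes):
    connections[mptcp_token(local_key)] = {"subflows": 1}

def handle_join_syn(token: int) -> bool:
    """Host B: accept the join only if the token matches a connection."""
    conn = connections.get(token)
    if conn is None:
        return False                   # unknown token: refuse the subflow
    conn["subflows"] += 1
    return True
```

So a SYN carrying MP_JOIN never creates a fresh connection; it either lands on an existing multipath session via its token or gets rejected.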
13:23
Okay, and you don't necessarily have to advertise an address. You can simply join from an address that you have. You can join directly into a connection. And there's tokens that are used to identify an incoming SYN. And so if you get a SYN that has an MP_JOIN on it, it's got some information about
13:40
which multipath connection it belongs to. And at that point, host B in this case can say, yeah, I know which connection this belongs to, and I'll let it join. Okay, so one of the crucial things about multipath TCP is the accounting. So with TCP, we know that TCP is a byte stream
14:03
and then we divide that up into sequence numbers. And then we use that to track our segments and do re-transmits and all that kind of stuff. What we've got now is that we've got multiple segments, multiple TCP subflows, and we need to then aggregate that data again at the receiver, let's say.
14:21
How do we do that? One, I guess, kind of immediate thought is, well, you can just take the TCP sequence space and spread that out over multiple subflows. Can't do that necessarily, because you may have a middlebox that doesn't like big gaps in a sequence space. Let's say I've sent some data on one subflow, sent some on the next subflow,
14:40
and there's a big jump in the sequence space because of how it's been multiplexed. Okay, a middlebox may not like that. So the solution for this was to add an extra level of accounting, so a data level sequence space which sits above the subflows and maps our data as it comes out of the send buffer, say, into our individual subflows.
15:02
Subflows retain their own regular TCP sequence numbering, so they look like regular TCPs. And then later on, we take care of aggregating all of those segments together. Since we've got two levels of sequence numbering now, we need to acknowledge at both levels,
15:22
so the subflows will continue to send acknowledgements for their subflow level sequence numbers, and the data level will also need acknowledgments. And just to kind of visualize this to make it a bit easier, so let's say we've got some data to send, so there's 10 bytes here, say, and we've numbered them one to 10, so that's the data level sequence numbering.
15:42
Now we wanna map that data into two subflows. So in this case, we've got subflow one, subflow two, and we're gonna map three bytes into each of those subflows. So the subflows now have their own sequence space, so subflow one is 50, 51, 52.
16:01
Subflow two is in a different sequence space. But importantly, we've still got our data level sequence numbers preserved here. In this case, subflow two has, say, a shorter RTT than subflow one, so that data has arrived before subflow one's,
16:22
in which case we can ACK at the subflow level, because hey, that stuff's been delivered as far as the subflow's concerned. However, the data level is still out of order at this point, so we need to keep that in reassembly until we receive our bytes from subflow one, in which case everything's in order.
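The two-level ACKing in this example can be replayed in a few lines of Python. The cumulative data-level ACK logic is the point here; the byte numbers are just the ones from the slide.

```python
# Toy replay of the example above: data-level bytes 1..10, three bytes
# mapped to each of two subflows, with subflow two's data arriving
# first. Real MPTCP uses 64-bit data sequence numbers; small integers
# keep the illustration readable.

def data_ack(arrivals, expected=1):
    """Cumulative data-level ACK: next data-seq byte we're waiting for."""
    received = set()
    for dsn in arrivals:
        received.add(dsn)
        while expected in received:    # deliver whatever is now in order
            expected += 1
    return expected

# Subflow two (data bytes 4,5,6) arrives before subflow one (1,2,3):
# the subflow can ACK locally, but nothing is data-ACKed yet.
assert data_ack([4, 5, 6]) == 1
# Once subflow one's bytes land, everything through 6 is in order and
# the data-level ACK jumps to 7, matching the talk's example.
assert data_ack([4, 5, 6, 1, 2, 3]) == 7
```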
16:42
We can do a data-level ACK for seven, and we can do our subflow ACK as well for 53 on this subflow. This kinda just shows how it might look in a rough packet framing, so you've got your TCP sequence number, length, and so forth, and then you have an option
17:01
which specifies what the data level sequence is for this particular data segment, and it also includes length and stuff, which I haven't shown. Okay, so congestion control's kind of an interesting thing with multipath TCP in that now that we have multiple subflows, we can kind of look across all of these
17:20
and basically change our congestion windows based on metrics of different subflows. So say a particular subflow has a lower RTT or anything like that, then we can say, well, across all my subflows, this subflow's performing better. I'm gonna increase the congestion window by this much and hopefully send more data on that path.
17:43
And that's just by default, of course. You don't necessarily have to do it that way because, for example, we've got the default congestion controller, which says, well, at bottlenecks, I wanna be very fair to other TCPs, so if we have a standard TCP and two subflows here, we wanna make sure that they're not summing up
18:02
to a total greater than that one TCP, but if we don't have bottlenecks, we wanna be able to, say, steer traffic more towards the larger pipe in this case. So I've talked a bit about adding addresses
18:21
and data sequence numbers and congestion control and scheduling things. So how does that actually work logically or how does it look logically? Well, the session control block, it's a little bit like a TCP control block, in fact, very similar, and that's gonna take care of all the accounting, so things like what our next data sequence send is
18:44
or what we're expecting to receive next at the data level. But there's also these other kind of logical components that you wouldn't have in TCP necessarily, particularly this path manager. So the path manager's gonna be telling our session block, these are the paths that are available or these are the addresses that I have,
19:02
maybe you wanna join these, and it can signal and say, use this path now, or add this as a backup path, or add this into a striped round robin situation. We've got a packet scheduler, which basically takes care of: a write comes in, which subflow do we send on next?
19:21
We need to determine that at some point, and the packet scheduler does that. And so for the moment, the built-in packet scheduler is just round robining, but that can be other things. And the congestion controller: do we do coupled congestion control, or do we just leave all the subflows to do their own kind of uncoupled NewReno
19:41
congestion control? That can all be defined by the congestion controller. Of course, the intention with these things, because we wanna be able to make this flexible so we can do more experiments, is to have them all as modular components. So if you wanna experiment with different path management, so different ways of adding paths into a connection
20:02
or maybe not adding them into a connection, you can do that. Or if you wanna change how packets are scheduled, if you wanna, say, use paths with the lowest RTT first, then a modular packet scheduler can do that for you. And the same with congestion control.
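The modular scheduler idea might be sketched like this: a policy is just a function the session asks for the next subflow, so swapping round robin for lowest-RTT-first is a one-line change. The interfaces here are hypothetical, not the kernel's.

```python
from itertools import cycle

# Two interchangeable packet-scheduling policies, matching the ones
# mentioned in the talk: round robin (the built-in default) and
# lowest-RTT-first. A policy takes the subflow list and returns a
# picker; the session calls the picker once per write.

def round_robin(subflows):
    it = cycle(range(len(subflows)))
    def pick(_write):
        return next(it)
    return pick

def lowest_rtt(subflows):
    def pick(_write):
        return min(range(len(subflows)), key=lambda i: subflows[i]["rtt"])
    return pick

subflows = [{"rtt": 40}, {"rtt": 10}]
rr = round_robin(subflows)
assert [rr(b"data") for _ in range(3)] == [0, 1, 0]
assert lowest_rtt(subflows)(b"data") == 1
```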
20:23
So as I said, there's been a lot of changes between version 0.4 and 0.5, and it's been a long time between any news about any new patches. So why was that the case? Well, it was kind of a major design rethink. So after releasing 0.4, I basically went back and had to assess the implementation
20:42
as to how well it was working and how well could I, for example, maintain this into the future. So merging with head was becoming an issue, things like that. How much time do I need to spend merging things? How much parallel code do I need to maintain? There are certain advantages in how things were done in that initial patch.
21:04
But in terms of maintaining and kind of keeping MPTCP code separate from the TCP code, so previously it was very much entwined. There was a lot of overlap. Every time something changed in TCP in head, then I would need to make a whole bunch of changes
21:20
as a result of that. So the newer approach and perhaps not the most, well, in terms of performance, it may not be the best approach, but in terms of logically separating things a little bit and being easier to maintain, I think it was a little bit easier and better to change. And that required quite a lot of rewriting
21:42
of pretty much all of the code except for some option parsing. And another benefit of doing it in the way that it is now that it's a little bit easier to add support for things like modular congestion control and scheduling and things like that because I have more of an overall view of the TCP structures underneath.
22:02
So what does it kind of look like logically? Well, on the left here, you can see, this is what you might get if you brought up a standard TCP connection. You've got your socket, some protocol blocks, and you send your data that way. Okay, so multipath has basically been shimmed in
22:21
in between the socket and the TCP layer. And what happens is that the multipath TCP control block contains a list of subflows. Each of these subflows is basically a socket, an internet control block, and a TCP control block within.
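In rough pseudo-struct form, that layering looks like the sketch below. The real objects are C structs in the kernel; the names and fields here are illustrative stand-ins only.

```python
from dataclasses import dataclass, field

# Illustrative shape of the shim just described: the application-facing
# object is a multipath control block, which owns a list of subflows,
# each carrying its own socket, internet control block, and TCP
# control block. Field values are placeholder strings, not real state.

@dataclass
class Subflow:
    so: str = "socket"     # per-subflow socket
    inp: str = "inpcb"     # internet control block
    tp: str = "tcpcb"      # TCP control block

@dataclass
class MpControlBlock:
    subflows: list = field(default_factory=list)

    def add_subflow(self):
        self.subflows.append(Subflow())
```

The point of the shape is that the application still talks to one socket on top, while each subflow underneath is a complete little TCP endpoint in its own right.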
22:42
And so what does that look like in terms of how it changes how TCP behaves? So here is kind of a simple diagram of what TCP might look like. Let's say we get a data segment. Okay, we may need to reassemble some data, or maybe it's in order, in which case we can receive that
23:01
or send that up to the receive buffer. Do we need to ACK that? Yes, we've received the data segment. We may update some accounting, and then we can send our ACK out that way. Okay, so how does that change now that multipath TCP is involved? Well, let's say we've just got one subflow and a multipath control block.
23:23
Okay, so on our subflow, a data segment comes in. Okay, we can still ACK that at the subflow level, but we have some data that needs to be delivered to an application. So we pass that up to the data level. Okay, we check the data sequence numbers at that point,
23:40
reassemble if we need to. If it's in order, then we can deliver that data. Do we need to ACK what we've just received? Probably, in which case we choose a subflow and say, hey, you, can you send a data-level ACK for me now, and send that out.
24:01
So in terms of the structures themselves, how they look. So if you were to create a TCP socket, you'll get something that looks like this. So on the far right, you've got some socket buffers for sending and receiving data. Down the middle you've got your protocol blocks, so your inpcb and your TCP control block,
24:20
which is gonna be tracking all of your TCP statistics or your accounting and so forth. And you've got these protocol hooks down the side here, which say, my socket's gonna send some data, so let's call the appropriate TCP function to send that data. So how it's changed now is that
24:40
we basically try and retain as much as possible the structure of the TCP socket underneath, but what we're really giving you when you ask for a stream socket is this multipath structure. So a lot of it is replicated and based on what TCP was. So we've got our send and receive buffers now that we use at the multipath layer,
25:03
our multipath control block there, and functions for handling sends and so forth. But now if we, say, try and send some new data, we can check in our list of subflows here, say, run some packet scheduling, something like that, and say,
25:20
okay, I'm gonna use a particular subflow, and then we can call on that subflow a TCP function in order to complete that request. So I'll talk a little bit now about how the send and the receive kind of data structures have been changed a little bit. So TCP, let's say we've got a send buffer
25:42
and we've got a control block. UNA here is bytes that have been sent but not acknowledged, and send next is where we're gonna send next in our sequence space. So let's say we sent some data that hasn't been acknowledged yet. Okay, it eventually gets acknowledged. We can move UNA forward,
26:01
drop that bit off the end there, and so on and so forth as we work through our data stream. So as I showed before, we're basically retaining the socket structure of TCP but kind of replicating that underneath for multipath. So in this case, we've got a multipath send buffer,
26:20
and then below, we've got a couple of subflows here. Okay, each of those has their own send buffer and obviously their own TCP control blocks. And what we do is we can map data using the packet scheduler into different subflows. So in this case, we've created a map, and the map says, okay, you're gonna send this much data from the send buffer.
26:42
The data sequence starts at this point so that you can put that in your TCP option saying what the data sequence is. You've got this much data now, so you can go ahead and send that. Okay, so one subflow can start sending away. Let's say we get another write coming in, and now we wanna map all of that new data
27:02
to another subflow. So we've mapped that to the second subflow. We can see that this subflow here has sent some data in the meantime, which hasn't been acknowledged yet. And now, just to show, we can basically map non-contiguous data
27:21
onto a subflow, and the subflow kinda doesn't know the difference really. Let's put another map on that first subflow there. Again, the subflows kinda act independently, so as they send data, the ACKs come in. They drop it from their send buffers there independently. Eventually, we may get a data-level ACK,
27:42
at which point we can drop the data there. Both of these will disappear at that point, and that memory's free. Okay, and what I haven't kind of illustrated on this map here, on this diagram, sorry, is that this mapping that occurs from the main send buffer can actually be replicated across multiple subflows.
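A toy version of this mapping scheme (illustrative only, not the kernel structures): maps hand out a data-sequence start and payload range to a subflow, the same range can be handed to two subflows, and the multipath buffer only frees data on a cumulative data-level ACK.

```python
# Illustrative multipath send buffer. Each map carries the data
# sequence number of its first byte, so the subflow can stamp the DSS
# option on what it transmits; data is freed from the multipath buffer
# only when a cumulative data-level ACK covers it.

class MpSendBuffer:
    def __init__(self, data: bytes, dsn_base: int = 0):
        self.data = data
        self.dsn_base = dsn_base       # data seq of the first byte held

    def make_map(self, offset: int, length: int):
        """A map a subflow can transmit from independently."""
        return {"dsn": self.dsn_base + offset,
                "payload": self.data[offset:offset + length]}

    def data_ack(self, dsn: int):
        """Drop everything below the cumulative data-level ACK."""
        drop = max(0, dsn - self.dsn_base)
        self.data = self.data[drop:]
        self.dsn_base += drop

buf = MpSendBuffer(b"abcdefghij")
m1 = buf.make_map(0, 3)                # map bytes to subflow one
m2 = buf.make_map(0, 3)                # same bytes to subflow two
assert m1["payload"] == m2["payload"] == b"abc"
buf.data_ack(3)                        # a data-level ACK frees them once
assert buf.data == b"defghij"
```

Note the redundant maps: because freeing happens at the data level, whichever copy arrives first is enough, and the slower subflow just discards its copy once it's subflow-ACKed.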
28:02
So let's say we've mapped this section of data to a subflow here. We can take that same bit of data, map it to another subflow, and transmit them both at the same time. Whichever one is ACKed first at the data level, we can drop that. The other subflow can continue to try and send it and just drop it locally
28:22
when it's ACKed at the subflow level. Yep, yep. Uh-huh, sorry, could you just repeat that one?
29:00
So you sent the first part, and it got dropped off. Yep. Now, before the rest of the first subflow finishes, part of the second subflow completes, and it's eligible to be dropped. Right, yeah, so there's no kind of, basically, the ACKing at the data level is cumulative.
29:22
You can't drop any data until, so if, for example, and this is part of the head-of-line blocking, so what you're saying is that if this data is sent and received and acknowledged first, we can't actually drop it until all of this is done. Yeah, so that's part of the, one of the issues in scheduling, really,
29:40
in that you don't wanna create head-of-line blocking for yourself by, say, sending your first data on a slow subflow and sending the rest of it on something that's quicker; that then needs to be buffered at the receiver, and you can't actually clear it out because you don't get acknowledgements for it. Okay, so talking about receive structures, then,
30:03
this'll kinda cover similar territory. So this is what it might look like on TCP. So you've got a segment reassembly list for segments that have come in out of order, and let's say, in this case, data segment two is missing. One has been received; it's in order, because that's what we were expecting. So we can append that one to the socket buffer,
30:21
and that can get read out by the application. Okay, so how does that change now? So at the moment, we don't use a receive buffer on any of the subflows. That may or may not change in the future, but for the moment, let's say we're receiving segments, and again, we've got one, we're missing two,
30:41
and we've got four, okay? So there's a temporary structure here called a segment receive list. So at a point where you would generally append to your receive buffer, it's putting it in this separate list here. This little S here is basically saying, I've got some MPTCP signaling, okay? It's not relevant for the subflow to process that.
31:02
Let's say it's a DATA_ACK or something like that. What you wanna be doing is passing that up to the multipath layer to process and respond to it. So in that case, we've got a segment with some signaling on it, so we're gonna enqueue that one up at the same time. Okay, so at the multipath layer,
31:20
we are using the receive buffer here, but we also need to do segment reassembly, and this kinda relates to the head-of-line blocking stuff. So we've got our second data sequence number, so DSN 2. That's kind of arrived early. We don't have one yet, so we need to buffer that. We can't acknowledge anything at that point either, okay?
31:41
But now that we've received this, it's gonna be transferred from the segment receive list into a multipath control block, which essentially has a list of segments that are coming in from each of your subflows, and it's gonna process them a little bit like tcp_do_segment does.
32:00
It'll process incoming segments, if you're familiar with that. So we've got a bunch of subflows which will be appending segments into this list here, okay? The multipath thread will eventually run at some point, okay? And at that point, it will process your data-level segments,
32:22
any signaling that's arrived, so that doesn't have any data on it, but it's got something that we need to process. That'll be processed at that time, okay? And reassembly is done if necessary. In this case, we've got our next expected segment, so we can append those to the receive buffer.
32:41
Those can be delivered to our application. What's happened here, though, is that we still haven't received segment two on our subflow here. I'm not showing it here, but what can happen is if that takes too long, that subflow may go into retransmit, say.
33:02
We may get new segments again, or at the multipath layer, we may say that this subflow is performing too slowly. Let's try and transfer DSNs four and five on a different subflow.
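The data-level reassembly path just walked through might look roughly like the following. This is an illustrative sketch, not the FreeBSD code: subflows enqueue segments tagged with a data sequence number (DSN), and the multipath thread later drains the list, delivers any in-order run to the receive buffer, and holds early arrivals across the head-of-line gap.

```python
# Illustrative sketch of data-level reassembly (not the FreeBSD code):
# subflows enqueue segments tagged with a data sequence number (DSN);
# the multipath thread later drains the list, delivers any in-order
# run to the receive buffer, and holds early arrivals.

def mptcp_reassemble(segments, rcv_nxt):
    """Return (delivered payloads, new rcv_nxt, still-buffered segments)."""
    delivered, buffered = [], []
    for dsn, payload in sorted(segments):
        if dsn == rcv_nxt:
            delivered.append(payload)        # in order: to the receive buffer
            rcv_nxt += 1
        elif dsn > rcv_nxt:
            buffered.append((dsn, payload))  # early: wait out the HOL gap
    return delivered, rcv_nxt, buffered

# DSN 2 arrives early on one subflow, DSNs 1 and 4 on another
out, nxt, held = mptcp_reassemble([(2, "b"), (1, "a"), (4, "d")], rcv_nxt=1)
```

Here DSNs 1 and 2 are delivered in order, while DSN 4 stays buffered until the missing segment arrives, or gets reinjected on a faster subflow.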
33:21
So I'll kind of race through that. So, sample topology. With the patch, there's some documentation, and there's kind of like a baby's-first-multipath topology that's described in there. And I'll kind of go through what you might expect if you do grab the patch, eventually, a simple experiment
33:41
that you can do with some VMs, just to see how things work and find out for yourself. So in this case, I've got two hosts, two routers in the middle. The routers are running dummynet to rate limit some of these connections. So subnet one and subnet two are rate limited
34:00
to eight megabit per second. Basically, host one is gonna connect to host two and transfer 50 meg of data. There's no packet loss in this network, as the queues are quite deep, so we do get a lot of RTT. And depending on what I've configured in the path manager, we may get one subflow,
34:22
we may get two or three. So I'll go through that. So the first example is just a single subflow. So what does a single subflow look like? If we can connect it up, okay, we've got our rate limit. There's nothing too different from regular TCP happening here, okay?
34:42
And when we look at the throughput, it looks roughly close to eight megabit per second and nothing much exciting happens. So what if we make it a little bit more interesting? So now let's set it up so that host one connects to both of the interfaces on host two.
35:02
So we've got our initial connection, which is gonna be this blue one here. And then once that is established, this red line is gonna be joined into the connection. So the interesting thing here is that they're both kind of traversing a bottleneck. And again, I'm using round-robin scheduling,
35:22
so that's kind of the basic implementation that I've got at the moment. So essentially every time the process writes some new data into the send buffer, it's just gonna stripe between the different subflows available. Okay, so we get roughly four megabit per second, okay?
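The striping behaviour just described might be sketched like this. This is illustrative only; the names are invented and this is not the FreeBSD API. Each chunk written into the send buffer is handed to the available subflows in turn.

```python
# A minimal sketch of the round-robin scheduler described above
# (illustrative only; names are invented, not the FreeBSD API): each
# chunk written into the send buffer is striped across the available
# subflows in turn.
from itertools import cycle

def round_robin_schedule(chunks, subflows):
    assignment = {sf: [] for sf in subflows}
    rr = cycle(subflows)
    for chunk in chunks:
        assignment[next(rr)].append(chunk)  # stripe chunks across subflows
    return assignment

plan = round_robin_schedule(["c0", "c1", "c2", "c3"], ["sf1", "sf2"])
```

With two subflows sharing one bottleneck, each naturally ends up with about half the link, which is the four megabit figure above.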
35:41
Well, that's what we were kind of expecting. They're both sharing that same link. They're both using uncoupled congestion control as well, I should mention, so they have their own congestion windows and the multipath layer doesn't really interfere with that. Okay, so what if we then try sort of an additive connection? So we've got these two separate
36:02
eight megabit per second links. We'll again establish this blue connection first and then add in this yellow connection afterwards as an additional subflow. And so we don't quite get what we would expect. Sorry, that's the wrong graph even on it.
36:22
So that's the per subflow throughput. And so we're getting about six megabit per second. We'd probably think we'd get eight megabit per second. I'll talk a little bit about why that might not be the case here. But what we can see is that, okay, we're getting a little bit more than one of those links on their own.
36:41
So we're getting 12 megabit per second. Okay, so where things kind of get interesting and this diagram is a little bit colorful and there's a lot of lines, but basically the path manager by default, if you say an address is available to a connection, it's gonna try and join them all up together. So in this case, we've got two addresses on each of those hosts
37:02
and they're all gonna try and connect to each other. So we wind up with four subflows. So we've got across the top here, we've got down here, the red one as before, and then we've got this extra green one, which kind of connects the second address on H1 to the second address on H2.
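The default path manager behaviour just described amounts to a full mesh over the advertised addresses. As a sketch (illustrative only, invented names): every local/remote address pair becomes a candidate subflow.

```python
# The default path manager behaviour, sketched as a full mesh over the
# advertised addresses (illustrative only): every local/remote address
# pair becomes a candidate subflow.
from itertools import product

def full_mesh(local_addrs, remote_addrs):
    # one candidate subflow per (local, remote) address pair
    return list(product(local_addrs, remote_addrs))

# two addresses on each host -> four subflows, as in the diagram
subflows = full_mesh(["h1_a", "h1_b"], ["h2_a", "h2_b"])
```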
37:20
So what does that look like? Well, it looks quite slow actually. So all of the subflows wind up doing about two megabit per second. We'd really expect to get kind of a cumulative throughput of say around 16 or whatever our two rate-limited paths are.
37:43
So why might this be the case? Well, if we just look at this top subflow here, so if we can kind of compare what's happening with this subflow across, I've only got a single subflow, now I've got two subflows,
38:00
now I've got four subflows, what's happening just with this one subflow in terms of the send buffer? Well, it's spending a lot of time not sending much data, and that's why: because we've got such a long RTT, and we've got a 32K send buffer by default, which is now being divided up each time across four subflows.
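The mismatch can be made concrete with a rough bandwidth-delay product estimate. The 200 ms RTT here is an assumed figure for illustration; the 8 Mbit/s rate and 32K default send buffer come from the talk.

```python
# A rough bandwidth-delay product check of the buffer problem above.
# The 200 ms RTT is an assumed figure for illustration; the 8 Mbit/s
# rate and 32 KB default send buffer come from the talk.

def bdp_bytes(rate_bps, rtt_s):
    return rate_bps * rtt_s / 8   # bytes that must be in flight

needed_per_path = bdp_bytes(8_000_000, 0.2)  # ~200 KB per 8 Mbit/s path
per_subflow = 32 * 1024 / 4                  # 32 KB buffer over 4 subflows
```

Each subflow gets a small fraction of what a single path would need to stay busy, so the subflows sit idle waiting for data.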
38:22
This long RTT is kind of absorbing everything. We're not filling the bandwidth that we should be filling up. And so basically these subflows are spending a lot of time with nothing in their send buffer not sending anything. And this is kind of like an interesting point in terms of the kind of things that you see or have to start thinking about with multipath
38:41
in terms of, okay, how do I handle all the aggregate traffic here? In this case, a 32K send buffer is clearly not enough because we're not gonna service our subflows with enough data. So a little bit of status.
39:01
Basically, I'm just doing the documentation and some more testing for the next patch release. The previous patches have been a little bit buggy and not quite so easy to use straight up. The intention this time is to make sure it works quite well. It covers these kind of simple scenarios with round-robining, with adding new addresses in,
39:24
and basically not at the moment using coupled congestion control. But it should essentially work and you should be able to experiment with it and rely on it a little bit to keep working. So I should acknowledge a couple of institutions
39:42
at this point. So the FreeBSD Foundation, which has done something a little bit different in terms of funding my master's. I don't think it's something that's commonly done. So they've allowed me to continue on with this work this year. Again, Cisco, who has provided funding on a couple of occasions for the multipath stuff,
40:02
and of course, BSDCan for allowing me to come and talk a little bit about MPTCP again and hopefully pique some interest and maybe get some people interested in taking a look at the patch in the future and maybe providing comments, criticism or help in any way, which would be good.
40:23
And of course, there are some links here. That's my contact. That's the webpage where I host basically the patches and things so far. The idea is that after the next release, there'll be some kind of public repository where people can grab the source code. But I will update that as well
40:41
because it's a little bit out of date. But basically anything to do with the project is kind of on there, all the documentation, all the patches and all that kind of stuff. Were there any questions? I think we have questions. Enough time for that? Hi. You said you were using TCP options to make it work.
41:02
Yeah. How do you deal with the fact that many middleboxes will drop packets with TCP options on them? Yeah, yeah. So I didn't talk too much about the protocol, but this is one of the things that was considered. And so there's kind of fallback mechanisms that are built in. So let's say we do try and open up a new MPTCP connection
41:22
and that MP_CAPABLE option is stripped off because it's not recognized. From that point on, it will just continue on as a standard TCP connection. Let's say we establish a TCP connection, so the MP_CAPABLE works. But then later on, one of the data mapping segments is dropped.
41:40
So that will be able to be detected and then it will fall back to regular TCP. So that kind of stuff, yeah. There's a lot of that and a lot of people have done testing. I haven't done much of that stuff myself, but it's kind of documented and they've spent a couple of years looking at that sort of thing and basically coming up with contingencies for that.
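The MP_CAPABLE fallback described in that answer might be sketched like this. This is illustrative only, not the FreeBSD code: if the option is stripped by a middlebox, the connection silently continues as plain TCP.

```python
# Sketch of the fallback behaviour described in the answer
# (illustrative only, not the FreeBSD code): if the MP_CAPABLE option
# is stripped by a middlebox, the connection silently continues as
# plain TCP.

def negotiate(sent_mp_capable, synack_options):
    if sent_mp_capable and "MP_CAPABLE" in synack_options:
        return "mptcp"
    return "tcp"   # option stripped or peer doesn't support it

# middlebox stripped the option from the SYN/ACK: fall back to TCP
mode = negotiate(True, {"MSS", "SACK_PERMITTED"})
```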
42:13
Yeah, so SACK works. I haven't tested it extensively with everything, but...
42:21
Sorry? Yes, they should work, but I haven't tested them. Basically, the TCPs underneath can work as TCP works. The multipath stuff is just kind of the sequence numbering, to take care of reassembling stuff at either end, mostly, so stuff like SACK and all that.
42:41
So SACK still works. All the retransmit stuff is done in a standard TCP kind of way.
43:01
So you've got two- I've been doing a lot of display too. Right. Okay, so in terms of congestion control work, so both senders can use congestion control. Is that what you're asking? Yeah, like there's a congestion control. Well, I guess you have more than two directions going on, but you've got a lot of bandwidth this way on one,
43:24
like a lot of bandwidth this way on the other. Right. Oh, right, in terms of moving data to an appropriate- Yeah, yeah, yeah. So depending on the congestion control algorithm that you use, they kind of all serve slightly different purposes, but yeah, one of the things can be you can use some detection or you can use RTT, say,
43:43
and try and use that to grow the congestion window more on subflows which have more bandwidth available, and you're going to send less data on your lower capacity link. You can do that with congestion control or you can do that with path management as well,
44:00
scheduling as well. So let's say you have a little bit of information about a particular path that you may want to prefer; you can use the scheduler to map all data onto that one. Right, and then there's the situation where you have no info. And when you have no info, you kind of have to rely on the algorithm to work. So a bit more of an edge case: let's say a subflow can send,
44:23
but the acknowledgments cannot, in fact, take the same path. Can they still use another subflow, et cetera? So you want to use a path purely for acknowledgments?
44:41
Essentially. Well, if you're talking about at the data level, then yes. So if you're sending on one path, it still needs to receive its TCP-level acknowledgments some way. So if that comes back via another path, but it ends up at the right interface, then that will work.
45:00
You can say, send your data segment acknowledgments on a completely different path if you choose. You can say, always nominate a particular path to send those on, if that kind of answers it. But if you're talking about at the TCP level, yeah. Let's say they're bidirectional at the beginning.
45:21
Eventually, one direction for one of the subflows is unavailable for a given period. Then at that point, the acknowledgments for that subflow, because they'd follow the same direction, it's like you have all these mini control blocks for the individual subflows. So you have mini TCP connections going on.
45:42
So then these acknowledgments come back in on that subflow one, while your subflow one is sending. OK, yeah. So basically, in that case, if you're not getting your acknowledgments back at a TCP level,
46:02
your subflow will go into retransmit and timeout or whatever. If you're talking about at the data level, if you can't send anything back on that path anymore, then it'll use the other one. It'll use the other subflow, yeah. And so what would happen is that if you're sending data, you're not getting acknowledgments back. Internally, you can say, well, this subflow
46:20
has gone into retransmit. Let's take all that stuff that was outstanding on that particular subflow and just send it on another subflow. But at that point, you'd be using only the bandwidth available for this? For whatever's remaining, yeah.
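The reinjection behaviour just described might be sketched like this. This is illustrative only, with invented names: when a subflow stalls into retransmit, the data-level ranges still outstanding on it are handed to another subflow.

```python
# Sketch of the reinjection behaviour just described (illustrative
# only): when a subflow stalls into retransmit, the data-level ranges
# still outstanding on it are handed to another subflow.

def reinject(outstanding, stalled_sf, subflows):
    healthy = [sf for sf in subflows if sf != stalled_sf]
    if not healthy:
        return outstanding                    # nowhere else to send it
    ranges = outstanding.pop(stalled_sf, [])  # strip the stalled subflow
    outstanding.setdefault(healthy[0], []).extend(ranges)
    return outstanding

# DSN range 100-200 was outstanding on the stalled subflow sf1
state = reinject({"sf1": [(100, 200)], "sf2": []}, "sf1", ["sf1", "sf2"])
```

As the answer notes, after reinjection you're down to whatever bandwidth the remaining subflows have.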
46:49
It can be either, generally. So how I've done it is basically assume that, OK, let's get the client to connect in first. The issue, I guess, is
47:01
that you might consider a lot of clients being behind that sort of thing. So, whatever, you can't really have the server connecting back into a lot of clients. So in that case, a client, if it's multi-homed, will connect into your server. And if it's got another address, it'll just join that in as well.
47:21
Yeah, the server can advertise that it has other addresses available, too. Yeah. No more questions? No? OK.