Packet Pacing – Rate Limit per flow (TCP \ UDP)

Video in TIB AV-Portal: Packet Pacing – Rate Limit per flow (TCP \ UDP)

Formal Metadata

Title
Packet Pacing – Rate Limit per flow (TCP \ UDP)
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2016
Language
English

Content Metadata

Subject Area
Abstract
There is a growing need in the networking world for flow-based rate limiting. Different use cases (such as video streaming) implement software-based rate limiting in order to maintain many connections at the same time. The rate limiting capability prevents bursts and micro-bursts and ensures that each stream is served with the required bandwidth. Offloading the rate limiting capability to the hardware holds big potential in the form of improved performance, saving precious CPU cycles. During this talk I plan to go over the current solution that Mellanox has to offer. The different flows will be described in detail: the control flow, which starts from the socket and ends in the driver and NIC, and the transmit flow. The architecture includes changes in sensitive networking areas of the FreeBSD kernel, such as the IP stack fast path, so critical decisions and open issues will be reviewed as well. The Packet Pacing feature traverses different layers in the kernel and logically binds the upper areas of the network stack (i.e. socket, pcb) to a hardware queue. Additionally, we need to consider the potential of this feature when looking at basing its 'engine' in the TCP stack and allowing rate limiting of specific connections based on the window size and congestion calculations. As a result of these characteristics, the purpose of this talk is both to present the solution and to discuss its future.
So we start to talk about packet pacing, or rate limiting per TCP or UDP flow. I'm pretty new here, so I'll start by introducing myself. My name is Oded Shanoon, I'm from Israel, and I work with Mellanox Technologies; my title is software manager. I'm only three years into FreeBSD, so I consider myself a rookie. I am a rookie. I got my first degree at Tel Aviv University, in computer science. I was also an officer in the Israeli Air Force; like most Israelis, we have a long affair with the army. And I love soccer. The Euro championship just started; I think the first game is in one hour and I'm missing it, France against Romania. It's soccer, not football. No, no. So, from Germany, I guess.
Okay, so that's the agenda. A short intro, since I'm not sure everyone knows what packet pacing is about. Then I'll start the overview, with more details on the main flow, and we will look at the kernel implementation. I want to actually look at the code that we put on Phabricator. It's only a base for changes. An important discussion took place yesterday and I magically missed it, and now Navdeep is not here, so it's kind of funny. But the overall idea is that we need a general solution that can fit all. I have a slide about the design principles that led to the current implementation, although that should change, and at the end just a few pointers on the Mellanox driver highlights; again, it helps explain stuff. Then I'll open it for comments, and I think that might be the most important part of this talk, because my basic intention was that we start to move the ball forward. We put our implementation on Phabricator, I think more than a year ago, and we kind of paused it. I think it's time to go on, and the comments that I collected are feasible. I think it shouldn't be that hard to have, at last, an implementation of this packet pacing on FreeBSD.
So, a short introduction. We're talking about rate limiting TCP and UDP socket-based connections. Our implementation is mainly with regard to TCP, but it is general enough to cover UDP and even anything else. The characteristics are that we want to control the maximum bandwidth, we want to be able to set different rates for different flows, and we want a smooth and even distribution. The idea is that we can have minimal bursts into the network, and thus avoid congestion in the network, and thus prevent TCP window resizing. As for the general goal in mind, I'm coming from the vendor perspective of Mellanox, and Chelsio shares the same goal: we want to offload it and reduce the CPU overhead that we have with the software solution. Any questions up to here?
So, excuse my drawing skills, but I hope that once I start explaining it, it will become very clear. We have the user space app, we have the kernel, and we have the hardware, that's the card. We start from the application. The application creates a socket, and let's say that it has started transmitting traffic through that, let's say, TCP socket. Now the application wants to start rate limiting the traffic. So, using setsockopt with the value of the rate limit, which is a 32-bit unsigned integer, the application sets this value on the socket, and a field that we added to the socket now holds the rate. That's it, basically, for the socket.
Now, in ip_output, so this is actually the fast path. I'm describing how it all starts once the application uses setsockopt to set the rate. We changed the rate field in the socket, which was zero up till now, and now it holds the requested rate, let's say 3 megabits per second. And now we're moving on to the fast path, meaning that until we transmit any traffic, we don't even start this flow. Inside ip_output we have an if, for every packet, which detects when the rate changes. Once the rate changes, it ioctls the driver. I think it was discussed yesterday that maybe an ioctl won't be the way to go, but I think the basics will remain: the ioctl represents the API that we want for this. In this case we want an API for creating a new rate-limited queue. So the driver receives the ioctl and immediately returns a TX ring ID, synchronously, and this ring ID is now inserted into the mbuf and passed on to the driver. What the driver did when it got the ioctl is, like I said, return the TX ring ID immediately, and afterwards it created a thread so that we can asynchronously create the requested TX ring with the requested rate limit.
So the first mbuf, as we said, contains the ring ID that it got, let's say, I don't know, ring 17. It gets into the driver, and the driver sees that this mbuf is a rate-limited mbuf, that is, traffic coming from a rate-limited socket, so the driver wants to divert the packet to the queue with that ring ID, queue number 17. But for the first mbuf, and the next mbufs afterwards, until this operation is complete, we still don't have the rate-limited queue that is being created. So the driver diverts them to the standard TX rings that the driver has, and so the first mbufs will not be rate limited, which is basically okay, because it takes time to ramp up a connection and this operation usually doesn't take long. Once this operation ends and we have the queue ready, all mbufs coming with this ring ID are diverted to this queue, and the hardware does the TX rate limiting.
Yes? The next time, I have an if to see whether the rate changed. If it didn't change... there is an if that we added, you'll see the code in a second, but there is an if that we are adding to ip_output that needs to know if this is a rate-limited flow. If this is a rate-limited flow, it checks whether the rate changed or the interface pointer changed, and if not, it just takes the ring ID, which is already stored; in this implementation it is kept in the inpcb. In the inpcb. What do you mean? We'll see it in the code in a second, but yes, you're right, I put aside some details just to get through the first flow. The next slide will just be going to the code. I'll go through it pretty briefly, because a lot of it will change, but the basic concept, I think, will remain. So the price we're paying is adding these ifs, I think it's two or three ifs, in ip_output.
About the interface pointer: that's also a big problem with this implementation, or with packet pacing in general when we want to do offload. We want to connect hardware resources to the stack above, and once we have a routing change, and the hardware resources are directly connected to the interface, then in order to be able to deal with it we need an interface pointer. So, like I said, in the fast path we query every time whether the interface pointer has changed. If it changed, we detach the ring ID from the old interface and attach a new ring ID to the new interface, and that is done for every flow that sees that the interface pointer has changed. That's how we gain our flexibility regarding the interface. I will talk a little bit more about the interface in the slides after the code that we're going to see. Any more questions before I remove this Mona Lisa?
Okay, so, code. It is just a copy-paste from Phabricator. Is it okay now? Can
you read it? Okay. So the first is all the ifconfig stuff, because we added a rate limit capability to the interface. I just read the comment that Navdeep posted about the transport discussion you had yesterday: if we implement it as a function table in the interface, we might not need the capability. But again, we need a way to query whether we support this feature or not. That's the bottom line; I don't care how, but I think that was the basic way to do it.
Options for the kernel to be built with or without rate limit: we ifdef'ed everything in the kernel, because we thought that it might be smart not to have the extra ifs if you don't want the rate limit option. Robert Watson was against it in his comments; he said that he wants to remove the ifdefs.
Okay, so now the socket. We're in uipc_socket.c. What we see here is that we added a socket option, and when we get this socket option to set a rate, we get a 32-bit value. By the way, we did that with exactly the same name as the socket option that Linux already has, to have the same naming. Once we get it, we store it inside the socket, so we added a field called so_max_pacing_rate to the socket, and that's it. We have the same for getsockopt; we added getsockopt. I might go into why we need getsockopt, but you can guess for yourselves: we have a socket, one of the features of the socket would be the rate limit, and the user or admin might want to query it. It's also a way to know whether we actually did the setsockopt correctly, because after some time, I'll get to that later, if the ring is not created then we zero this field, and that's another way to know whether the hardware ring was actually created.
Interface: that's the interface capability, and here is the structure of the request for a TX rate limit. The basic thing that we need is the rate and the ring ID, in order to communicate regarding our requests. Of course this structure can change; as long as we have this, it's very minimal for our implementation, but it may change. In the interface code, what we see here is that we removed the capability to ioctl the driver from user space, so it will be done only from inside the kernel code.
And now to the tricky parts: in_pcb.h. Our implementation is per flow. I'll discuss later why we went for this approach rather than pre-allocation, but this implementation should support either approach, because I know that Chelsio has a different approach of pre-allocating fewer rings, as he suggested, and having multiple flows per queue, and it shouldn't matter. I'll get to the numbers later. You mean how many queues we support? You can open millions of queues on our cards. Regarding rates, on our new card we will support about 500 different rates, and you can attach to them, again, about a million queues. But we will limit it to a number which will be around a hundred thousand, maybe more; we think that with this number we can still achieve a line rate of 100 gigabits per second. But again, I have a slide for the basic numbers.
So this is the part of the inpcb which is a bit critical. We added three fields to the inpcb. There is the ifnet pointer, which of course will be discussed at length after my session, until we decide what to do with it, but it is something that we currently think we need. And of course the rate and the ring ID, in order to put it into the mbuf and to communicate to the driver the request to modify a rate or detach. And here in the inpcb we see the three functionalities which are basically the API for this feature: modify a rate that we already have, attach a new rate, and detach a rate. I saw, regarding the discussion yesterday, that it might become a function table in the interface, but again, it doesn't matter where it is; these are the basic functionalities that we need now. They might of course expand later.
And here we have a get-write-lock function that we saw that we need in order to deal with a socket from the inpcb. The reason we need it is that inside TCP or UDP we don't know whether we hold a lock or not. In the case of TCP we have a write lock; in the case of UDP we don't have a write lock, and we don't have any way to know that. That's why we added this function with an 'exclusive', which tells whether we are the only ones holding the write lock. I won't go into the logic; it's complicated enough that I would mix it up right now, but the logic is clear once you read it. I think Robert Watson wrote in one of the comments that he prefers adding another parameter to the ip_output and tcp_output functions, so he thinks that it's smarter to declare whether we have a lock or not all the way down and not to have this complicated logic. Of course that would simplify things, but again, it's another sensitive addition to the network stack. That's a problem with this feature, because it deals with the very hard core of our fast paths in the network stack. So I think maybe that's a phase 2 thing to discuss.
In in_pcb.c, this is, I think, the initialization of the inpcb, just that. And here I will go briefly over the three functions that we have, because it's pretty logical. So we have the modify. We have the new rate as a parameter, the rate that we want to modify to. We take the write lock, and we assert that the ifnet pointer is not NULL. Then this is the request that we saw before; like I mentioned, it holds only the rate and the ring ID, and that's the only structure that I need in order to discuss this feature with the driver. So I set the new rate that I want for the driver, and I set the ring ID, which I already know, because we are in a modify function. Then I ioctl the interface which is stored in the inpcb; I ioctl it with the ioctl that we added to modify a rate. If all is okay, then I change the rate that is stored in the inpcb to be the new rate, and all is good.
The attach function: that's when we have a new rate. So I have a pointer to the ifnet and the rate that I want for it, and of course the inpcb. Same here, I need the write lock, and here it's the opposite: I assert that the interface pointer is NULL, because it's a new rate and it isn't supposed to exist yet. Again we have the request structure; now we need only the rate, because the ring ID still doesn't exist. Then we ioctl the interface that we got with the request.
Sorry, first of all we take a reference on the interface. That's an important thing here: we hold a reference to the interface, and that's how we beat all the races we can have between the driver, which can be unloaded at any second, and the socket or the flow, which can be dropped. As long as we are holding the reference, we make sure that such a race will not end in a panic. That's why, if an error occurs, we release it; I think that's a bug that Drew found in review. And then, in the inpcb: this means that the ioctl succeeded, so we store the interface pointer, the new rate and the ring ID. Because it was attached, we now have these three fields, which before that were all zero.
The detach has, of course, a similar logic. Nothing important, just that once we're done we need to bring everything back to default inside the inpcb, and then we ioctl it. Here there is an important comment that we heard about the detach: what happens if the module was removed? Like I said, as long as we are holding a reference to the interface pointer, the flow in the kernel protects us. if_dead is called, so all the callbacks are replaced with stubs. It's something nice that we found. I think it was mentioned yesterday that sometimes it's nice to see code that is already there, and this was a very rare instance where that happened. We tried to create a race to see the panic, and we couldn't, and then we found out that all the functions are replaced with stubs. So there is no problem ioctling the driver, because the ioctl is replaced with a stub, and then we release, and the detach part is done.
The get-write-lock: like I said, I won't go into the logic, but if it's a read lock we need to be aware of that. The basic purpose is to make sure that we are holding the write lock on the socket before making changes.
Another critical part is ip_output. Up till now, is everything okay? Do you mind me breezing through the code? Is it fast? Is it boring?
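The attach, modify and detach steps described above can be sketched as a small toy model. This is written in Python purely for illustration and is not the kernel code from the review; every name in it (`Ifnet`, `Inpcb`, `rl_attach`, `rl_modify`, `rl_detach`, the `ioctl_*` methods) is hypothetical. It only tries to show the ordering the talk describes: take the interface reference first, release it if the request fails, and reset the three per-connection fields on detach.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ifnet:
    """Hypothetical stand-in for a network interface and its driver."""
    name: str
    refcount: int = 0
    rings: dict = field(default_factory=dict)  # ring_id -> rate
    next_ring_id: int = 17                     # echoes "ring 17" from the talk

    def ioctl_attach(self, rate: int) -> int:
        ring_id = self.next_ring_id
        self.next_ring_id += 1
        self.rings[ring_id] = rate
        return ring_id

    def ioctl_modify(self, ring_id: int, rate: int) -> None:
        if ring_id not in self.rings:
            raise OSError("unknown ring")
        self.rings[ring_id] = rate

    def ioctl_detach(self, ring_id: int) -> None:
        self.rings.pop(ring_id, None)


@dataclass
class Inpcb:
    """The three per-connection fields the talk adds (names assumed)."""
    rl_ifp: Optional[Ifnet] = None
    rl_rate: int = 0
    rl_ring_id: int = 0


def rl_attach(inp: Inpcb, ifp: Ifnet, rate: int) -> None:
    assert inp.rl_ifp is None          # a new rate: nothing attached yet
    ifp.refcount += 1                  # hold the interface before talking to it
    try:
        ring_id = ifp.ioctl_attach(rate)
    except OSError:
        ifp.refcount -= 1              # release the reference on error
        raise
    inp.rl_ifp, inp.rl_rate, inp.rl_ring_id = ifp, rate, ring_id


def rl_modify(inp: Inpcb, new_rate: int) -> None:
    assert inp.rl_ifp is not None      # modify needs an existing attachment
    inp.rl_ifp.ioctl_modify(inp.rl_ring_id, new_rate)
    inp.rl_rate = new_rate             # record the rate only after success


def rl_detach(inp: Inpcb) -> None:
    assert inp.rl_ifp is not None
    ifp = inp.rl_ifp
    ifp.ioctl_detach(inp.rl_ring_id)
    inp.rl_ifp, inp.rl_rate, inp.rl_ring_id = None, 0, 0  # back to defaults
    ifp.refcount -= 1                  # drop the reference last
```

Locking is left out of the sketch on purpose; in the talk each of these functions first makes sure the write lock is held.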
Okay, it's just important. I think it's not long; I think the end of it will be right after ip_output. But it's important to grasp all the tricky points of this implementation. I'll try to discuss it a little bit later, and I hope that in a very short discussion during the next few weeks we will resolve everything and get to one final solution.
So, ip_output. We take the rate; we take it from the socket. That's the place where we find out that it has changed from zero to anything else. Then we have an if. This if is of course checking for NULLs, but basically it checks whether this is a rate-limited flow or not. As long as this is a regular socket, not a rate-limited socket, we will not go into this if. If it is a rate-limited socket, we're going in. That's what it means: you see the capability, and if we have a ring ID or a rate, it means that it's a rate-limited socket. So, as you see, it's two ifs for traffic that is not rate limited that are now added to ip_output, if we want to have this feature not ifdef'ed in the kernel.
Now that we are inside this if, if you remember my drawing from before, we have two questions: we ask whether the ifnet pointer has changed, or whether the rate has changed. If the rate changed, and I'm starting from this one because it's easy, then we call the modify function in the inpcb. If there is an error, then it's our responsibility now to zero the field in the socket, because the socket changes its field immediately, and the socket of course is not dealing with any ioctl; that all happens once we start transmitting packets. Going back to the question of whether the interface changed: if it's the first time that we want to have a rate-limited socket, then the interface changed because it was NULL at first; this means that the one in the inpcb was NULL and this is the new interface. So we take a write lock, and here we're checking whether the interface that we had is not NULL, meaning that this is not a new rate-limited socket. In that case we need to detach the ring ID from the old interface, because the routing has now changed. In any case, we now do an attach for the new interface pointer with the rate. Of course, if an error occurred we zero the max pacing rate, and if all is okay then it's time to transmit the mbuf.
Another tricky part is the mbuf. Here, what we did is we abused the flow ID field, because we didn't want to add any field to the mbuf. There was some discussion before; Robert Watson and even Navdeep said that in some cases we might want to add a field to the mbuf, but I think that yesterday they might have found a field in the mbuf that we can abuse, one which is used only on the receive path, so we won't need to change that either. Basically, the idea here is that we want the ring ID, in our case, inserted into the mbuf. I think in Chelsio's case they need some kind of a cookie to tell where to steer the flow to. To your question: if it's multiple flows going into one queue, then it's a different kind of cookie, but it's a stamp that anyone, I think, can use for his own implementation. I don't think there is any contradiction here. So we added a hash type just to differentiate the flow ID field, which is basically supposed to be used for RSS, and here we've used it for something else, because we didn't see any use for it on TX. I think I've explained enough about it, because it will probably change where we put it. But the idea stays that we have the ring ID in the mbuf going into the driver, so the driver can immediately direct the packet to the right queue. Because we're doing it that way, we don't need to add so much logic and hold all kinds of hash tables inside the driver, and we allow the performance to be maximal, which is the whole idea of this packet pacing, because we want to offload the CPU.
ip6: same thing. And here, of course, the implementation should be made modular, because I think it's almost the same logic. The mbuf: that's the hash type that I told you we added. The socket option, the field inside the socket, and the three ioctls, which are the APIs; these might change, so that instead of ioctls they will be inside the function table of an interface, but that's the idea. Moving on. Any questions regarding the code?
Yes? We didn't measure it; it wasn't on the table, so no. And regarding latency, just logically, I mean, it's two ifs, so it shouldn't add latency. As I said, it will take some time before the packets start being rate limited, so maybe at the start of transmission [unclear]. Yes? Yeah, the big objection to that, it's the next slide, basically, or the one afterwards. But basically the logic here is that we have to connect the upper parts of the network stack, such as sockets, to a hardware resource. Our first implementation connected them directly, so the socket would know of a hardware ring ID, and we agreed that it's wrong, because it's too much knowledge for the socket about the little details. And again, if you start to think about stuff like a routing change, then we would need to put logic into the socket to handle and manage all the routing changes. That's where we started having problems, and that's why we thought that we need to know about the interface. I'll talk about VLAN and LAG in a second, just to mention it as a comment, because it's something that this might expand to, and why this design answers it as a principle.
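The per-packet decision described for ip_output (a cheap exit for ordinary sockets, then "did the interface change?" and "did the rate change?") can be sketched as follows. This is a Python toy model under stated assumptions, not the kernel code: `Flow`, `FakeDriver` and `fast_path_check` are hypothetical names, interfaces are represented by plain strings, and the branch that handles a rate reset to zero is an assumption of this sketch that the talk does not spell out. Error handling (zeroing the socket's rate when the driver request fails) is omitted.

```python
class FakeDriver:
    """Hypothetical driver stub that only records the requests it receives."""

    def __init__(self):
        self.log = []
        self.next_ring_id = 1

    def attach(self, ifp, rate):
        ring_id = self.next_ring_id
        self.next_ring_id += 1
        self.log.append(("attach", ifp, rate))
        return ring_id

    def modify(self, ifp, ring_id, rate):
        self.log.append(("modify", ifp, ring_id, rate))

    def detach(self, ifp, ring_id):
        self.log.append(("detach", ifp, ring_id))


class Flow:
    """Hypothetical per-connection state (the talk keeps it in the inpcb)."""

    def __init__(self):
        self.ifp = None     # interface the ring is attached to
        self.rate = 0       # rate currently programmed
        self.ring_id = 0    # 0 means "no ring"


def fast_path_check(flow, requested_rate, out_ifp, driver):
    """Return the ring ID to stamp into the packet, or None for normal traffic."""
    if requested_rate == 0 and flow.ring_id == 0:
        return None                               # ordinary socket: cheap exit
    if requested_rate == 0:
        # Assumption of this sketch: clearing the rate releases the ring.
        driver.detach(flow.ifp, flow.ring_id)
        flow.ifp, flow.rate, flow.ring_id = None, 0, 0
        return None
    if flow.ifp != out_ifp:                       # first use, or routing change
        if flow.ifp is not None:
            driver.detach(flow.ifp, flow.ring_id) # leave the old interface
        flow.ring_id = driver.attach(out_ifp, requested_rate)
        flow.ifp, flow.rate = out_ifp, requested_rate
    elif flow.rate != requested_rate:             # same interface, new rate
        driver.modify(flow.ifp, flow.ring_id, requested_rate)
        flow.rate = requested_rate
    return flow.ring_id
```

In this model the socket side never sees a ring ID; it only holds the requested rate, which matches the design argument above that hardware details should stay below the interface.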
Okay. It would... no, no, no. It depends on the implementation. Again, our implementation does one flow per queue, but even in Chelsio's case, where you have multiple connections per queue, it depends on how you do the queue. I believe what they have is a queue with a rate limit, no matter how many connections it has. Or the limitation might be that you have ten connections and it limits to 10 megabits, so that every connection is limited to 10 megabits. It's not like that; you need to calculate it somehow. And what do you want with these connections? Do you want each connection to be limited to 10 megabits, or are you saying, I don't care, I want all of them bundled together into 10 megabits, and inside that one can be 0 and one can be 10, I don't care? In our case we didn't go into such a scenario, so it's hard for me to answer more than that.
Yeah, a cookie for this type of... I mean, it's doable. I think, yeah, this design will change. I don't know, maybe we need to think about it right now so that we can have all of it together, but that will be the simple case. For this case, I think it shouldn't be hard to solve. I mean, I don't know grouping deeply, but I know this scenario. I think it's easily doable, like Drew commented, if we just maybe add a parameter or something like that to be able to do grouping: saying that this ring ID is now in a grouping state, and everyone attached to this group just uses the same tagging of ring IDs, and the driver automatically directs all of them to the same queue. So it should work.
Yeah. Again, it depends on what is underneath, but it's adding another logical layer of grouping to the flows. I believe it's logically easy, and I think it can be added. And in a sense, all this implementation is new. We did some kind of implementation with our previous card; we still haven't done it with our new card. That's why we want to revive all of this discussion and take these comments, and it might be a good point to influence it. So, what you're saying, I think it won't be so hard to do as a phase two. But that's the state that we're in; that's why I'm saying this will change and that will change. That's basically the implementation that we put on Phabricator, and I hope that now we can move forward again, because I don't think that the objections are so hard. This is just a summary of the sensitive parts that we changed. Do you need me to go over it pretty quickly? Because I think you kind of got the idea.
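As a recap of the control flow drawn at the start of the talk, the toy model below shows why the first packets of a flow are not rate limited: the driver hands back a ring ID at once, builds the ring in the background, and steers packets to a standard ring until the rate-limited one is ready. It is a Python sketch with hypothetical names (`Driver`, `request_ring`, `finish_creation`, `transmit`); the background thread is replaced by an explicit `finish_creation()` call so that the behaviour is deterministic.

```python
class Driver:
    """Hypothetical model of a driver that creates TX rings asynchronously."""

    def __init__(self):
        self.next_id = 17    # echoes "ring 17" from the talk
        self.pending = {}    # ring_id -> rate, creation requested but not done
        self.ready = {}      # ring_id -> rate, ring exists in hardware

    def request_ring(self, rate):
        """Return a ring ID immediately; the ring itself is created later."""
        ring_id = self.next_id
        self.next_id += 1
        self.pending[ring_id] = rate
        return ring_id

    def finish_creation(self):
        """Stands in for the background thread completing its work."""
        self.ready.update(self.pending)
        self.pending.clear()

    def transmit(self, ring_id):
        """Pick the queue for a packet stamped with ring_id (or None)."""
        if ring_id in self.ready:
            return f"rate-limited ring {ring_id}"
        return "standard ring"   # unstamped packet, or ring not ready yet
```

A short usage example: a packet stamped with the new ring ID goes to a standard ring first, and to ring 17 once creation has finished.

```python
d = Driver()
rid = d.request_ring(3_000_000)
print(d.transmit(rid))   # standard ring
d.finish_creation()
print(d.transmit(rid))   # rate-limited ring 17
```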
so the principle of this design is one which we talked about is the fact that we have we got to have a logical connection with in the socket the higher levels and hardware resources because we want to do offload and that's basically I think a new thing to freebsd that's why we're having such a long discussion for it the glory started with more than two years ago three please go I don't know so but I think now we're there because we think I think it's not a trend it's when we want to off road everything so I think because in Mellanox performance is our game and everything we we do I think with almost all of our customers is talking about off road we have big project now we have the open V switch if you're familiar with the virtual switch then everything everybody talks now how to off-road everything in unto the card and all the vendors are trying to do it so I think it all points out that it's probably not a weird feature that we need to do some somewhere somewhere else but that's the we need to find a decent solution that doesn't break all the freebsd principles but we do need to do some sensitive stuff in the first pass in yes but then yes that was the idea we did we wanted the general solution that what our architect did we have both but yes yes yes and I think we will change most of what we just saw in the NPC B because the API that will move probably into the interface function table and the structural fields again might move so we need to be on it yes yes yes so currently not yet but currently the responsibility with discipline is still on the application to sense all the congestion and deal with it if it's more flexible exactly I think or let the TCP do it I think if it evolves into having a strong TCP engine and let the TCP with its congestion window control the different rates and you have your mechanism built in the TCP I think that's yes yes that I know what this interface module RT and another principle here we want like I did talk about it before that I 
want to simplify the solution and not have a lot of logic in the network stack. That is why I need the ifnet pointer. We gave the simple example of a routing change: how can we handle that? And also, if we have VLAN and LAG, it connects pretty easily. I hope you can see that; it is supposed to be simple. If we have the socket, and we said that we have the interface pointer that we added, this interface pointer can also be a VLAN. So whether we use the ioctl or the interface function table, it doesn't matter: we are using it on the interface, and the interface doesn't have to be the driver interface directly. It can be a VLAN interface, and then all we need to do is add the same logic to the VLAN, which is simple logic: just go ahead and pass this ioctl forward to the actual driver interface. The interface can also be a LAG, and if it is a LAG, we already have some architecture showing that it is
possible to deal with. In the case of LAG we need some logic, because the LAG can hold two driver interfaces, so we must have some logic in the LAG. But again, it is transparent to the socket: the socket sends the ioctl to the interface ifnet, and the interface can be anything. The LAG then needs to handle it with some tables that tell whether the resources are here or there. Somebody needs to know it, but it is solvable. So that is why we think this is the way to go.

Another important principle in this implementation, as we saw, is that it is dynamic. Because our end in mind is to support more than 100,000 connections, we want to avoid pre-allocating all the connections, with the overhead of buffers and completion queues or whatever is associated with every connection. Let's see. I'm not sure if we keep dragging the DRB, we might lose it. We have a buffer. In the ConnectX-3 implementation, our previous card, our standard queue was 1,024 building blocks, each building block 64 bytes, so it is 64 kilobytes. For these queues we started with 128; by playing with it we found out that 500 is the better number, so I think that is what we went for in the end, and I am not sure if we will change it now. So it is something around 32 kilobytes of transmit buffer.

Yes, yes, so the memory footprint is huge. That is why we don't want to start the driver with 100,000 times 32 kilobytes. We talked about accuracy and flexibility, and during the discussion we saw that having this dynamic allocation gives you more flexibility for a lot of things. That is why we need specific flow information. I think you just need specific flow information rather than something that is pre-allocated, so basically there is no way to avoid it. The discussion is only about where to put this information and what its specifics are, but I think the design is starting to take shape.

So, our driver highlights. We talked about capabilities. For
our driver it is a TX ring per rate-limited TCP flow. As we said, we can support millions of queues, and it works; we tested it with a million queues. It is just a lot of overhead to handle so many resources, so that is why we got to a number around 100,000. We have an implementation on our older cards, and we just need to investigate further and decide which interface to use. But in the sysctls we already have ways to manage the active rate limit values. In our implementation, as we will see in a second, the administrator needs to define ahead of time all the different rates that he would like to use. So before doing the setsockopt you already need to have the definition for the rate limit, so that the hardware on the card will know to rate limit to this value, and then you can associate as many rings as you want with this rate limit value.

We also have in the sysctl a way to query all the capabilities and limitations, and of course statistics. You can see how many queues you have on each rate, and for each queue, and it is a huge printout once you have thousands, which is why we have all kinds of scripts to extract it, and the Excel document can become huge, you have how many packets were transmitted and things like that.

Again, we are on the driver highlights. Upon an ioctl, the idea is not to break things. Because we have to cross some layers to logically connect the stack and the driver, we keep the fact that the driver will always return synchronously and do all the asynchronous work, like the creation and configuration, in the background. And of course on the fast path we are matching the TX ring that we got from the mbuf and passing the packet to that queue. Those are the main changes to the driver to support it.

Some
numbers. We already have an implementation on our ConnectX-3 card, where we have up to 45,000 queues. We can make it configurable, say with a tunable, but we limited it to a number that we think, with the burst that we have, will support line rate. For ConnectX-4 it should be 100,000, and it might be more. On ConnectX-3 we support 120 different rate limit values per port, and on ConnectX-4 it will be around 500. On ConnectX-3 the rate limit values range from 250 kilobits per second up to 50 megabits; that will expand on ConnectX-4. The granularity is, I think, 250 kilobits for every rate, so you can set 250, 500, 750, 1 megabit, etc.

Burst size is also an important parameter. On ConnectX-3, which is the implementation that we already have, we found two magic numbers, for a low burst and a high burst. The low burst is three packets, which is basically 4 kilobytes, but it is translated into packets, so it will usually be three packets, sometimes two. The high burst will be five or six packets every time. So on ConnectX-3, to achieve line rate with so many connections we needed the high burst, but we can assure that it won't be any more than six packets.

More comments? More
questions?

The first thing that Lawrence did: he didn't even want to use the setsockopt interface. It was just the easiest way for us to show this feature. He immediately started writing code in the TCP stack, setting everything from there according to what it gets, just through the interface. But yes, the ioctls are there. And you are right, the end in mind, and I think we knew it from the beginning, is not a setsockopt from an application. That is nice, but it will be a much, much stronger feature if it can be used by TCP, so it can control its congestion window, and you won't see the window go up and then break in half, and in half again.

What do you mean? I am trying to remember from our testing, but we tested something like a thousand changes per second and it was not a problem. I am trying to think whether there is any limitation that might hurt us, because basically when you change the rate, at least through setsockopt, you just change the field in the socket, and then you hit the fast path and ip_output starts the ioctls, and then it just depends on the driver. So changing the rate is easy. If it is a lot of new rate limits, or a burst of them, it might take the driver a long time to create everything. I know that detaching at the end, closing 45,000 on our old card, took a lot of seconds when you destroy the driver. I don't know, I didn't test it. Again, it is hard to test: in our environment in the lab we need a lot of receivers to get line rate with so many connections.

OK, anything else? Thanks.
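
Editorial sketch: the figures quoted in the talk (64-byte building blocks, 1,024 and 500 blocks per queue, rate steps of 250 kilobits per second, 120 rate values per port and up to 45,000 queues on ConnectX-3) can be put together in a small Python model to see why pre-allocating a ring per connection is expensive and why rates are defined ahead of time. This is only an illustration under those stated numbers. `queue_bytes`, `rate_table_kbps` and `PacedRingPool` are hypothetical names invented here; they are not part of the Mellanox driver or of FreeBSD.

```python
"""Illustrative model of the numbers quoted in the talk (not driver code)."""

BLOCK_BYTES = 64              # one transmit building block, as quoted
STANDARD_QUEUE_BLOCKS = 1024  # standard queue size, as quoted
PACED_QUEUE_BLOCKS = 500      # size the speaker says they settled on


def queue_bytes(blocks, block_bytes=BLOCK_BYTES):
    """Transmit buffer size of one queue in bytes."""
    return blocks * block_bytes


def rate_table_kbps(step_kbps=250, count=120):
    """Rates an administrator could pre-define: step, 2*step, 3*step, ..."""
    return [step_kbps * i for i in range(1, count + 1)]


class PacedRingPool:
    """Hypothetical pool that creates a ring only when a flow asks for one."""

    def __init__(self, allowed_rates_kbps, max_rings):
        self.allowed = set(allowed_rates_kbps)
        self.max_rings = max_rings
        self.rings = {}  # flow id -> rate in kbit/s

    def attach(self, flow_id, rate_kbps):
        # The rate must have been defined before the flow requests it.
        if rate_kbps not in self.allowed:
            raise ValueError("rate %d kbit/s is not pre-defined" % rate_kbps)
        if flow_id not in self.rings and len(self.rings) >= self.max_rings:
            raise RuntimeError("no free rings")
        self.rings[flow_id] = rate_kbps  # also covers changing the rate

    def detach(self, flow_id):
        # Releasing the ring frees its transmit buffer again.
        self.rings.pop(flow_id, None)

    def memory_bytes(self):
        """Transmit buffer memory used by the rings that exist right now."""
        return len(self.rings) * queue_bytes(PACED_QUEUE_BLOCKS)


if __name__ == "__main__":
    print("standard queue:", queue_bytes(STANDARD_QUEUE_BLOCKS), "bytes")
    print("paced queue:", queue_bytes(PACED_QUEUE_BLOCKS), "bytes")
    print("100,000 pre-allocated paced queues:",
          100_000 * queue_bytes(PACED_QUEUE_BLOCKS), "bytes")

    pool = PacedRingPool(rate_table_kbps(), max_rings=45_000)
    pool.attach("flow-a", 750)
    print("one active flow:", pool.memory_bytes(), "bytes")
```

With dynamic allocation the memory cost follows the number of active rate-limited flows instead of the maximum the card can support, which is the trade-off the speaker describes.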