Measure Twice, Code Once

Video in TIB AV-Portal: Measure Twice, Code Once

Formal Metadata

Measure Twice, Code Once
Network Performance Analysis for FreeBSD
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

The networking subsystems of any operating system have grown in complexity as the set of supported protocols and features has grown since the birth of the Internet. Firewalls, Virtual Private Networking, and IPv6 are just a few of the features present in the FreeBSD kernel that were not even envisioned when the original BSD releases were developed over 30 years ago. Advances in networking hardware, with 10 Gbps NICs available for only a few hundred dollars, have far outstripped the speeds for which the kernel’s network software was originally written. As with the increasing speed of processors over the last 30 years, systems developers and integrators have always depended on the next generation of hardware to solve the current generation’s performance bottlenecks, often without resorting to any coherent form of measurement. Our paper shows developers and systems integrators at all proficiency levels how to benchmark networking systems, with specific examples drawn from our experiences with the FreeBSD kernel. Common pitfalls are called out and addressed, and a set of representative tests is given. A secondary outcome of this work is a simple system for network test coordination, Conductor, which is also described.
What I normally do is walk back and forth while I talk. So, welcome to Measure Twice, Code Once. If you've ever done carpentry, you know the rule: measure twice, cut once. I would walk around more, but the tables are arranged in a way that keeps me from going back and forth across the stage. This is work that I've been doing for a little under a year, with [unclear] Thompson, looking at the various things that affect network performance in FreeBSD.
Benchmarks are hard. It turns out that if you work in a marketing department they're really easy, because they're made of numbers: you put them on a slide. I do not work for marketing, so I mostly did not make up my numbers. If I did, you can call me on it, because I've also published my numbers in a repository on GitHub. So why are benchmarks hard? There are all these questions we have to ask, and we have to answer them correctly to get a good benchmark. What are we trying to measure? How are we going to measure it? How do we verify our measurements? People often get through the first two questions pretty well, but the third trips them up: "Why would I have to run that more than once?" Ever heard of statistical significance? Can the measurement be repeated?
Can it be repeated, right? That's really important. Many people have made measurements of, for instance, cold fusion, but no one else could repeat them. Can we replicate it somewhere else? Is the measurement relevant? This is a question many people do not ask themselves. They say "I've measured this thing, and here's the number," they present it, and that's great, but nobody cared about that. It has to be relevant to what you're doing. How do we generate workloads? There are all kinds of ways of generating workloads, synthetic and non-synthetic. For those of us who do a lot of networking, which is mostly what I do and what this talk is about, it's really hard to simulate the Internet. I mean, you can get a lot of cats in a room and photograph them, but other than that it's difficult to simulate the Internet. So you have to generate a workload that's going to be representative. Usually you're doing this because you want to put something out, as a product or as an open-source operating system, that will be used by someone other than you, so how you generate that workload also becomes important. By the way, there are people in the world who will sell you fabulously expensive objects with which to generate workloads. If you don't know what you're trying to generate, you'll spend a lot of money and not get a very good result, and even if you do know, you'll spend a lot of money. And this last one I really like. Most people know what a Heisenbug is, right? If I look for the bug, the cat is alive; if I don't look for the bug, the cat is dead. In testing, that's Heisen-testing. We set up a measurement, we're running our workload, and we've got some software detecting what we want to look at, but that detection software may itself disturb the thing we're trying to measure. If that's happening, you are going to pull out all of your hair. (That's a joke.) So here is a long list of reasons why benchmarks are hard.
Now let's start hitting some of those points. Network benchmarks are harder, but for a smaller number of reasons, happily. Asynchrony is the key reason. Suppose I'm trying to test something on a local system and all the hardware is working properly, which it mostly is. I can then run that test repeatedly without worrying too much about asynchronous events interrupting me. In networking a lot of stuff is asynchronous, so we have to worry about that. We also have to worry about loss, because most networking is best-effort delivery. Why do I love working on networking? Because, as Kirk says, he works on a different type of system, file systems, and if you lose someone's data they'll never trust you again. But it turns out that in networking I can drop people's packets, even 50 percent of them, and they'll keep giving me more. So best-effort delivery actually makes it difficult to come up with good numbers in our tests, because you can't guarantee that all the packets get there. That's another thing to account for. If you do silly things, like setting your request rate too high, you'll discover just how best-effort best-effort is. Then there's a lack of good open-source test tools. I do a lot of open source, God knows; I've been wearing open-source clothing all week. Yet there are not a lot of really good open-source tools. For some reason many people write a network tester, a client and a server, and I know I've done this too. It's really simple, everyone gets that far, and then they put it up somewhere and that's the end of it. So there are a dozen of these things, many of which no one has ever verified were correct. My favorite example was early versions of netperf, one of those client-and-server testers, where someone said, "You realize those results are always off by 30 percent?"
"Really?" "Well, yeah, here's the math." So there are interesting hassles, and we all know that open source is of uneven quality, so you have to pick the right tools. And there's the problem of distributed control, which I'll talk more about in some upcoming slides. If you happen to have a test lab and some minions (we call them remote hands), you can get them to move things around: "I want this today, move these things around, I want that tomorrow." Or you can buy a very expensive box that does the switching for you; I don't have one of those. Either way, you have to control a distributed system for the test. In a single-system test case you can sit in front of the computer, whatever that computer is, or log into a terminal, and run the thing. Here you have to tell this machine to talk to that one, this one listens, and that one watches. That kind of distributed control makes networking benchmarks more difficult to control. And here's a typical lab.
This happens to be the lab hosted at Sentex. Mike Tancsa is here somewhere, and I'm destroying his last name; I really wish he'd raise his hand. He's the co-owner of Sentex, which has hosted the FreeBSD project's network test lab here in Canada for many years. A lot of this is wired up by him and another guy, who are amazing remote hands. You'll discover how amazing remote hands are when you don't have them, especially two guys who actually know what they're doing. This is a typical test setup for packet-forwarding testing, which was our initial work. The machines have names, which you can find on the page documenting the lab on the FreeBSD wiki. The sources and sinks have 10G cards from Chelsio, whose developers are very helpful. We control the network using Intel 1-gig cards, just to talk to everything. That way we can send traffic through the device under test, or go around it over this network, and see what we get. So this is a typical small lab setup. People who work for large companies that do a lot of networking have really awesome labs, and they cost a lot of money. I don't have that. I have a couple of remote hands, which is excellent, and some really fine funding from the Foundation for twenty-some machines of high-performance networking gear in the data center, and all the cards have been donated by the vendors. So now, whenever a vendor comes to me and says "we've got a new NIC," I say "give me two." It turns out that in our testing you really want at least two, and historically I've burned out a couple of cards. Sometimes I ask for three, but usually I get two. So this is a typical setup just for one test; this is not the whole lab.
So here's another thing that's important in benchmarks, and again people often leave it out: what did you benchmark with? What was the hardware you benchmarked on? Be really specific. If you've been to any of the discussions over the last couple of days about NUMA, multi-core, or any of this stuff, you realize that it really matters from generation to generation; the model number actually matters. So here's all the hardware. In all the tests I've done up to the present day, we've had a source and a sink, dual-socket, ten-core, 2.2 GHz machines (I think these monsters actually heat the room pretty well). Then we put a fast machine in the middle, running Chelsio T5 520s, which are still working, and Intel [unclear] cards. This slide is not here just to show off really cool hardware. It's here so that if you want to replicate this correctly, you either get exactly this hardware or figure out analogous hardware that should give you analogous results. So let's talk a little bit about that.
Let's take a little side trip and talk a bit about modern hardware. One of the reasons we set up the lab at Sentex, got people to donate hardware, and got the Foundation to pay for servers, power, and cooling is that the hardware is still somewhat expensive. I like to say that it has not yet gotten cheap enough to be within the means of even the smallest nuclear power. So we're running 10 gig, and at 10 gig you get some numbers to deal with: 14.88 million 64-byte packets per second, which leaves about 200 cycles to handle each one. This is a very interesting problem. I don't know about you, but when was the last time you worried about whether your functions took less than 200 cycles, or about 67 nanoseconds? On these new machines, the cost of a cache miss is way higher than anything else that's going to go wrong in the system. Many of us were taught to program so that we optimized for CPU cycles. CPU cycles are not quite free, though I know Java programmers think that about memory. But cache misses will cost you: if you blow out your cache, your network performance will suffer, and I can show you that happening. Other things matter on modern hardware too, such as multi-core. It used to be that you had one processor with one core, and that was not so bad. Then you had one processor with two cores, then four. Now you can buy an 18-core processor from Intel, which has a complicated little ring network inside and is sort of terrifying. Multi-core matters. You need to think about what's going on in a multi-core machine, because where you put your workload is going to affect your benchmark results, and if you don't watch out for that you're going to confuse yourself. Then there's multiqueue. All of the network cards we get for 10 gig, and 40 gig these days, often use multiple queues, and sometimes they do what you would not
expect, because the silicon spreads the load on its own. So how you pin things, where the cache lines sit in memory, and which cores you choose all have a very profound effect on the results you see from a benchmark. These are things to keep in mind as we go forward.
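The 14.88 million packets per second and the roughly 200-cycle budget quoted above follow from standard Ethernet framing arithmetic. Here is a worked sketch. It assumes a 64-byte minimum frame plus the 8-byte preamble/start-of-frame delimiter and the 12-byte inter-frame gap that every frame also occupies on the wire. It also assumes a 3 GHz core clock for the cycle count; that clock speed is an assumption and was not stated in the talk.

```latex
\begin{aligned}
r_{\max} &= \frac{10 \times 10^{9}\ \text{bit/s}}{(64 + 8 + 12)\ \text{B} \times 8\ \text{bit/B}}
          = \frac{10^{10}}{672} \approx 14.88 \times 10^{6}\ \text{packets/s} \\
t_{\text{pkt}} &= \frac{1}{r_{\max}} = 67.2\ \text{ns} \\
c_{\text{pkt}} &= f_{\text{CPU}} \, t_{\text{pkt}} \approx 3 \times 10^{9}\ \text{Hz} \times 67.2\ \text{ns} \approx 202\ \text{cycles}
\end{aligned}
```

At the 2.2 GHz clock of the source and sink machines described above, the same budget shrinks to about 148 cycles. That is why a cache miss that has to go out to DRAM is so costly at these packet rates.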
So I mentioned the problem of distributed systems. What I love about open source is that all of us have an itch to scratch, and the itch I scratched in this work was this coordination problem. I don't want to have 15 terminals open, as I often did, each with a command line ready, and then go hit return in each one really fast. That's just ridiculous, but I've done it before. So I wrote a thing called Conductor. It's a Python library, pure Python, which some of you will think is great, and it's not that hard anyway. For those who know me: no, not a train conductor; this is the orchestra kind. There's a conductor and one or more players. You can have as many players as you like, they all talk to the conductor, and the conductor is the one that sets the tempo: you're going to do this, and this, and this. The test system has four phases: startup, run, collect, and reset. The reset is kind of important, and it's another thing people often forget when they're running multiple tests. You set up the test, and now you've got a bunch of state that accrued because you ran the test and collected the results, and that state influences the next test. Oh, and by the way, you should run the test more than once, in case you didn't catch that, so that you have statistically significant numbers instead of just one. So this is a system we built in Python. It's open source, and there will be links to it later. There were two nice things about doing this. One, I no longer need a bunch of windows. Two, the moment you publish something as open source, five people come out of the woodwork and say, "Oh, we had that." I got mail from someone who is not here, but whom I've seen at some of the conferences, saying, "Oh, we did this for TCP." "Really?" "Well, we're not ready to release it." That doesn't help me, but it does get people
to release these internal things that they assumed nobody would ever want.
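Here is a minimal, in-process Python sketch that makes the conductor/player split and the four phases concrete. It is not Conductor's actual API. The `Player` and `Conductor` classes, their methods, and the string "steps" are hypothetical stand-ins. A real harness would send each phase's commands to players on remote machines over the network, not call local functions. What the sketch does show is the lock-step ordering: every player finishes a phase before any player starts the next. The reset phase runs at the end of every trial, so state from one trial does not leak into the next, and the whole cycle repeats for several trials.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The four phases described in the talk, in the order they run.
PHASES = ("startup", "run", "collect", "reset")


@dataclass
class Player:
    """Hypothetical test participant, e.g. a traffic source, sink, or device under test."""
    name: str
    # Maps a phase name to the steps this player performs in that phase.
    # Each step stands in for a shell command; here it just returns a label.
    steps: Dict[str, List[Callable[[], str]]] = field(default_factory=dict)

    def do(self, phase: str) -> List[str]:
        """Run this player's steps for one phase, in order, and return their output."""
        return [step() for step in self.steps.get(phase, [])]


class Conductor:
    """Hypothetical coordinator that drives all players through each phase in lock-step."""

    def __init__(self, players: List[Player], trials: int) -> None:
        self.players = players
        self.trials = trials

    def run(self) -> List[Dict[str, List[str]]]:
        """Run every trial; return one {phase: [output lines]} log per trial."""
        results = []
        for _ in range(self.trials):
            log: Dict[str, List[str]] = {phase: [] for phase in PHASES}
            for phase in PHASES:
                # Every player completes this phase before the next phase begins.
                for player in self.players:
                    log[phase].extend(f"{player.name}: {out}" for out in player.do(phase))
            results.append(log)
        return results


if __name__ == "__main__":
    source = Player("source", {
        "startup": [lambda: "ping sink to warm the ARP cache"],
        "run": [lambda: "generate traffic"],
        "collect": [lambda: "save client report"],
        "reset": [lambda: "clear counters"],
    })
    sink = Player("sink", {
        "startup": [lambda: "start receiver"],
        "reset": [lambda: "stop receiver"],
    })
    for trial, log in enumerate(Conductor([source, sink], trials=3).run()):
        print(trial, log["startup"], log["reset"])
```

The design choice worth copying is the trial loop wrapped around all four phases. Making reset part of every trial, rather than something done once at the end, is what keeps repeated runs independent.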
So, a little digression through Conductor. Here's the conductor's configuration: this is where the conductor lives, where my clients are along with their config files, and this is the number of trials. This one happens to be testing the test harness itself.
Here's a client configuration with the commands in it, and you'll notice that it's a ridiculously horrible version of just a bunch of shell commands. I should probably adopt a real shell scripting language, but I got a little too far down that road. It is primitive, but nobody else had one until I published this. Similarly, the player configuration is pretty simple. We define where our conductor, the master, is, and then at startup we ping before we start the test. Why do we do that? Good one; what's another reason? Yes, the ARP cache: you don't want to be testing whether or not you've loaded your ARP cache, at least not in this particular test. Then we run the thing and collect a bunch of results. This configuration is on one of the devices under test. Once you've got something like this, you can do more than run the test from source to sink. While the machine is running, you can also start collecting performance analysis. This one samples instructions retired on the system as it's trying to forward packets, so that you can find hot spots in the kernel, and then we collect that. This is just one example; there's a whole bunch of these, and they drive all the performance work we're doing. Conductor is open source and on GitHub under gvnn3. All of the tests, results, and config files for the things I'm going to show you are in a little project called netperf. You can clone it and see what I've done wrong, because I really like it when people point that out. So: baselines.
When many people want to run a benchmark, the first thing they test is the thing they know their boss wants. The boss wants to know that some feature is 20 percent faster than something else, so they test for that specific thing, but they don't establish a baseline. Establishing a baseline turns out to be really important, because otherwise you have nothing to measure against; you just give someone a number. "It's fast." Faster than what? To establish the baseline measurements I'll talk about, we used iperf3. By the way, iperf3 is now maintained by a former FreeBSD developer, which makes my life easier, because I can get patches in. I parameterized it into a test that seems to give reasonably statistically significant results, as opposed to some of the older tools. The other tool is netmap, which you may have heard about. If you haven't, you will hear about it at every BSD conference for the next five years, and we also wrote about it in the book that I worked on. It gives you direct access to the very lowest levels of a network interface card's device driver. That lets you drive the card at pretty much line rate without interference from the network stack. If you're doing packet-level tests, such as forwarding, that's something you really want, because you can just pump packets out at line rate. We've tested this on 10 and 40 gig, and it's a really good way to abuse the living crap out of the device under test. TCP, by contrast, has so much else going on that what you'll find is that everything looks fine. All the machinery smooths out the rough edges, and it's the rough edges you want to find.
Here's a baseline TCP measurement, host-to-host through a switch, with no forwarding going on through a host. You'll notice that iperf reports every second for 10 seconds, and that's enough to get a really consistent 9.41 gigabits per second. So the talk is over, and we can all go home now. (No, it's not.) So this is the baseline, the host-to-host number with no forwarding that everything else is measured against. A lot of the tests I've been doing lately look at the forwarding path of FreeBSD. People have not looked at that much recently, because they assume you don't use FreeBSD as a router. But a lot of people do today, and the better it gets, the better off we all will be.
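Here is a quick sanity check on the 9.41 Gbit/s figure: it sits very close to the ceiling for TCP payload on 10 Gbit/s Ethernet with standard framing. The sketch assumes a 1500-byte MTU, IPv4 and TCP headers without IP options, and the 12 bytes of TCP timestamp option that are typically present. All of these are assumptions about the test setup, not details given in the talk. `tcp_goodput_ceiling` is a hypothetical helper name.

```python
# Per-frame Ethernet cost beyond the IP packet itself, in bytes:
# 14-byte header + 4-byte FCS + 8-byte preamble/SFD + 12-byte inter-frame gap.
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12


def tcp_goodput_ceiling(link_bps: float, mtu: int = 1500, ip_header: int = 20,
                        tcp_header: int = 20, tcp_options: int = 12) -> float:
    """Maximum TCP payload rate (bit/s) for full-size segments at line rate."""
    payload = mtu - ip_header - tcp_header - tcp_options  # bytes of data per segment
    wire = mtu + ETHERNET_OVERHEAD                        # bytes of wire time per segment
    return link_bps * payload / wire


if __name__ == "__main__":
    with_ts = tcp_goodput_ceiling(10e9)
    without_ts = tcp_goodput_ceiling(10e9, tcp_options=0)
    print(f"with timestamps: {with_ts / 1e9:.2f} Gbit/s, without: {without_ts / 1e9:.2f} Gbit/s")
```

With timestamps the ceiling works out to 1448/1538 of 10 Gbit/s, about 9.41 Gbit/s. A host-to-host result at that level therefore suggests the two hosts are effectively at line rate, which is what makes it a useful baseline.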
For forwarding, these numbers are in packets per second (I really should put commas in here), and you'll see that the source and sink numbers don't match. That's because this is raw packet performance: somewhere we're losing packets, and part of the exercise is to find out where.
Very quickly, here's what we see. First we send full-size packets, which are much easier for everyone to process, including the NIC and the operating system, because there are far fewer of them than 64-byte packets. Then we send minimum-size packets, and those abuse the device under test quite a bit. If we hadn't done the baseline, we'd have accepted the full-size result, thought we were done, and believed everything was fine, which it is not. One thing I find interesting about minimum-size packets is how much people advertise them. Many testing systems, especially expensive ones, and many people who build network cards and sell them to you, are really big on "we can do X minimum-size packets per second." That's great, but there's only one real use of minimum-size packets: ACKs. If you happen to be, I don't know, feeding 40 gigabits of TCP to people's televisions, you probably care about the ACKs coming back. But this is sort of the worst case, and it's not always the most interesting one.
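The gap between full-size and minimum-size packets is easy to quantify. At a fixed link rate, the frame rate scales inversely with a frame's size on the wire, so with small frames the device under test has to do its per-packet work far more often. Here is a small sketch, assuming standard Ethernet framing. Frame sizes include the 14-byte header and 4-byte FCS, and each frame also costs 20 bytes of preamble and inter-frame gap on the wire. `frames_per_second` is a hypothetical helper name.

```python
# Bytes each frame occupies on the wire in addition to the frame itself:
# 8-byte preamble/SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD = 8 + 12


def frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Line-rate frame rate; frame_bytes includes the Ethernet header and FCS."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD) * 8)


if __name__ == "__main__":
    # Common test sizes from the 64-byte minimum to the 1518-byte maximum (1500-byte MTU).
    for size in (64, 128, 256, 512, 1024, 1518):
        print(f"{size:5d} B  {frames_per_second(10e9, size) / 1e6:7.3f} Mpps")
```

At 10 Gbit/s this gives about 14.88 Mpps for 64-byte frames and about 0.81 Mpps for 1518-byte frames. That is roughly 18 times as many packets for the same number of bits, which is why minimum-size traffic is the worst case for a forwarding path.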
We'll take a look at that a little in future work, to find out how fast we can get it. So those were the baseline measurements we did, for TCP and for packet forwarding. Now I'll talk about some more recent work we've done since then. This second part is IPsec. We have IPsec in FreeBSD, we use it, and many people use it. But we know that IPsec and encryption are computationally expensive and are often offloaded to coprocessors. If you saw John-Mark's talk before lunch, you saw the work he's done to bring the AES-NI instructions into FreeBSD to accelerate the encryption part. Once John-Mark Gurney had that in, one of the things to do was see how much it helps. But my real question, having looked at IPsec (which is actually pretty good), was: what is the weight of the framework? When you introduce a framework like IPsec, you put some extra software around things to make it work. Before you figure out how much faster that screamingly fast new instruction from Intel makes things go, you have to figure out what happens when the framework is doing nothing at all. That does not mean turning IPsec off; it means using the NULL algorithms.
So here's our measurement method for this. It's two hosts with either transport-mode or tunnel-mode IPsec, depending on the test, using the same machines as before. We use iperf3 to do TCP testing over an IPsec transport or tunnel between those two hosts. Again, all the results and all the configs are in the netperf repository. It's set up with Conductor, obviously, because nobody else has released their software as open source. And then it's a very simple test: 10 rounds of 10 seconds each, to make sure I'm not relying on just one result.
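Running ten rounds only helps if the rounds are summarized and compared sensibly. Below is a small sketch in the spirit of FreeBSD's ministat(1), though the tool itself is not used here. It reports min, max, median, mean, and sample standard deviation for a set of per-round throughput numbers. It also computes Welch's t statistic and its approximate degrees of freedom, for comparing a configuration against a baseline. The function names are hypothetical. Deciding significance still means checking t against a Student's t table at the computed degrees of freedom. For ten rounds per side with similar spread, |t| above roughly 2.1 corresponds to the usual 95 percent two-sided level.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(rounds: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for one configuration's per-round results (needs >= 2 rounds)."""
    return {
        "n": len(rounds),
        "min": min(rounds),
        "max": max(rounds),
        "median": statistics.median(rounds),
        "mean": statistics.mean(rounds),
        "stddev": statistics.stdev(rounds),  # sample standard deviation
    }


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for the difference between the means of a and b.

    Raises ZeroDivisionError if both samples have zero variance.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)


def welch_df(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch-Satterthwaite approximation of the degrees of freedom for welch_t."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
```

To use it, pass the per-round numbers from the NULL baseline and from a candidate configuration. Reporting the median together with min and max, rather than a single run, is what makes the comparison repeatable by someone else.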
baseline and remember this is too tended next so using no encryption we actually had no encryption which someone broke going just 1 tests and uh so I put the all encryption back why did I put no encryption back because I don't want anything interfering 1 of the speed of the baseline framework I want to know what is the cost of just turning on and so no authentication encryption and 1 of things and we use it set is you lose a bunch of the things and make 10 carry cards go fast thinking God's go fast not because there's a little nonlinearity and because the water cooled which they will be eventually using the cards with like old-style graphic cards and learning from that like that LimeWire on them and bubbling water and so we turn off TCP segment offloading and uh you know harbor checksumming and uh large receive offload it turns out the cards are not so screamingly fast anymore so this is the result of running all the and the result of running not null but just and you know the tender gods with none of the features is only about uh I think it's like 40 per cent when this story so so this is the baseline we get with it said on a 2 . 2 2 . 4 gigabits per 2nd between 2 hosts that run into the you know all they all nite
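Repeating each run (here, 10 rounds of 10 seconds) only pays off if the spread is reported along with the average. A minimal sketch of that summary step, assuming each round has already produced one throughput number in consistent units; `summarize_rounds` and `RunSummary` are hypothetical helpers, not part of Conductor or the netperf scripts.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunSummary:
    rounds: int
    mean: float
    stdev: float  # sample standard deviation (n - 1 denominator)
    minimum: float
    maximum: float


def summarize_rounds(samples: Sequence[float]) -> RunSummary:
    """Summarize repeated benchmark rounds, e.g. one throughput value per round."""
    if len(samples) < 2:
        raise ValueError("need at least two rounds to estimate variation")
    return RunSummary(
        rounds=len(samples),
        mean=statistics.mean(samples),
        stdev=statistics.stdev(samples),
        minimum=min(samples),
        maximum=max(samples),
    )


if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements from the talk.
    print(summarize_rounds([10.0, 12.0, 14.0]))
```

Reporting the standard deviation and the min/max alongside the mean makes a noisy configuration visible instead of hiding it behind a single number.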
So with IPsec you get authentication and encryption. You can pick one or the other or both, and you should always pick both, just saying. What was the effect of turning on something like SHA-1 authentication, which is actually kind of expensive computationally? This is transport mode with no encryption, and it's hundreds of megabits a second on a 10-gig link. We are at less than 10 percent of the effective bandwidth between the two hosts. That is nowhere near the wire, and getting to the wire is what we want.
One of the new modes that John-Mark added, which takes advantage of the AES-NI instructions, is AES-GCM, Galois/Counter Mode (thank you, yes, Galois). This is running tunnel mode, where your complete packet is encapsulated: there's a whole other header on the front, and everything is hidden from everyone else. What happens when we go from no hardware support to hardware support? If you thought authentication was expensive, AES-GCM without hardware support is crippling: super, super expensive, but really secure. You get something like five times the number of bits through once you turn on the hardware support, which puts us at about half the speed of our original baseline: the baseline was 2.4 at max, and with AES-NI support we get 1.3 at max. But this is not the end, this is just the beginning. This is what we get now, and the next question is what else we can do.
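The relative numbers above are simple ratios, but it helps to compute them against both references, the null-IPsec baseline and the raw line rate. The constants below are the maximum figures quoted in the talk, in Gbit/s; the helper name `fraction_of` is illustrative.

```python
LINE_RATE_GBPS = 10.0            # 10-gig NICs
NULL_IPSEC_BASELINE_GBPS = 2.4   # IPsec with no auth/encryption, offloads disabled (max)
AES_GCM_AESNI_GBPS = 1.3         # tunnel mode, AES-GCM with AES-NI (max)


def fraction_of(measured: float, reference: float) -> float:
    """Express a throughput as a fraction of a reference throughput."""
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return measured / reference


if __name__ == "__main__":
    print(f"AES-GCM vs null baseline:   {fraction_of(AES_GCM_AESNI_GBPS, NULL_IPSEC_BASELINE_GBPS):.0%}")
    print(f"AES-GCM vs line rate:       {fraction_of(AES_GCM_AESNI_GBPS, LINE_RATE_GBPS):.0%}")
    print(f"Null baseline vs line rate: {fraction_of(NULL_IPSEC_BASELINE_GBPS, LINE_RATE_GBPS):.0%}")
```

That is about 54 percent of the null baseline and about 13 percent of line rate, with the null baseline itself at 24 percent of line rate, which is why the framework cost matters as much as the cipher.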
Before I leave the overall picture, a rant about premature optimization. I don't know if you've noticed, but hubris is an issue for software engineers (except me). We all think we're smarter than the compiler, the hardware, and everyone who has ever looked at anything. Often people look at a very narrow chunk of code and say, "I can make that faster." But does it matter? That goes back to the relevance question on the first real slide, and in order to know whether things are relevant, you have to have this kind of set of measurements first. Now we know the hardware-assisted operation is much faster, and we know it's still not even up to what null gets, so we know we have some work to do, and that's good. We also know that null is not at the speed it perhaps could be. So we want to find out why. The "why" part of the tests, which I haven't shown you, starts with hwpmc, the performance monitoring counter system, and I've done a bit of analysis with DTrace as well. That's the kind of tooling that can tell us why, now that we've got a framework in place: someone runs the tests, gets the results, and you can start digging down. And maybe the why is that when you look at the hardware to see what's going on, you find out that you've reached the limit of the hardware. I do not for a moment believe that this is the limit of the hardware; I think there are limitations in the software. To find that out, come to vBSDcon this fall in Washington, DC, where you'll see the results of the "why" tests. So much for this part.
Other work we've done: Jim and his team work a great deal with pf, working on pfSense, so we wanted to figure out the overall performance of pf on pfSense, on FreeBSD, and on OpenBSD, which is where the pf in FreeBSD really comes from, though there has been a great deal of work on it in FreeBSD, in particular to make it multi-core. And then there's Linux, this other operating system whose name I dare not speak. All the rules are given in the paper we presented at AsiaBSDCon 2015; I'm not going to make you read firewall rules on a slide.
So we went through this and generated a bunch of series. That's the kind of thing you do when designing a new benchmark: you pick one thing to vary and control all the other variables in a defined way. First: single core, no filtering. No filtering means there are no real rules; we just turn the thing on (pf, iptables, ipfw, or what have you), and it looks at every packet, so the framework is being touched every time, which is definitely overhead. You touch a packet, you pull it into the cache, and bad things happen. This is a packets-per-second measurement, a very common network measure: instead of bits per second, we count packets per second. In this case, single core without filtering, pfSense is marginally faster than FreeBSD, which is faster than OpenBSD, and all three are ahead of Linux. But then we turn filtering on. With a single core and filtering, FreeBSD-CURRENT doesn't do so well, but it comes in just behind Linux, which is how we like to position FreeBSD: just behind Linux. With filtering, pfSense is at the top. You can also see the standard deviations in packets per second, and some series have really wild ones. This is -CURRENT as of February; that should probably be on the slide.
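Whether the gap between two configurations is bigger than those run-to-run deviations is a statistics question. One common choice is Welch's two-sample t-test, which does not assume the two series have equal variance. A minimal sketch, assuming one packets-per-second value per round for each configuration; `welch_t` is a hypothetical helper, and turning t into a p-value is left to a t table or a library such as SciPy.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df): Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each series needs at least two rounds")
    # Squared standard error contributed by each series' mean.
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    if va + vb == 0:
        raise ValueError("both series have zero variance; t is undefined")
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

Compare |t| with the two-sided critical value of Student's t distribution at `df` degrees of freedom (for example `scipy.stats.t.ppf(0.975, df)` for a 5 percent level). Those critical values are always above 1.96, so a |t| well under 2 means the difference is consistent with noise.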
We talked about modern hardware very early on in the talk, and it turns out multi-core really matters. So we turned on multi-core, and things get very bad for the BSDs. pfSense is basically ahead of FreeBSD and is only behind Linux; FreeBSD is not just behind Linux, and OpenBSD is really not just behind. What you'll notice here, interestingly, is that OpenBSD doesn't gain much speed, and we know why: pf in OpenBSD is not multithreaded, at least not in the release we tested. But it shows that multi-core matters: they do get some thousands of extra packets per second. That's multi-core without filtering, where the framework touches the packet but no rules are being executed: we look up the rules, the packets flow through the machine, and a couple of counters go up. Then we turn on filtering. pfSense is not so close to Linux, FreeBSD is really not close to Linux or to pfSense, and OpenBSD is not close to anything. It's a bad moment for everyone. In fact, in one case here, turning on multiple cores seems to have very slightly hurt performance. I don't think that difference is statistically significant, but there might be something off there. Also notice that in the single-core case with filtering, Linux was not at the top, so clearly the iptables code has been optimized for the multi-core case. And that matters: apart from a very small number of people, and that certainly doesn't include people building something like a home router, everyone has multiple cores now. My phone has multiple cores; my toaster does not yet. So what is the lesson to take away from all this?
This gives us some answers, and more questions. We now know what the state of play is, what the field looks like, if we want to improve pf in any of the BSDs, including the version we now have in FreeBSD. We know where we stand in relation to other systems, and we know that we have work to do. Even if we had won, unless we were winning by a wide margin I would have considered it statistical error. So the answer is more questions, which is what always happens when you do benchmarks: you never see a benchmark that ends with no more questions. You might not want to do more work, but there are always more questions.
It turned out that multi-core matters, and getting multi-core right, i.e. fast multi-core primitives, matters even more. The top-end Intel system has 18 cores, with two rings between the cores, and if you think that's where we're going to stop, you're wrong. For anyone who saw the ARM talk yesterday, there are chips with 48 cores, or was it 44? I missed it. Multi-core really matters, doing it fast really matters, and right now the Linux tables are the fastest. So one of the things I really need to do next, besides basically begging for volunteers, is dig into why the Linux code is faster. Is it that they use RCU, which they can get away with because of IBM? Or is there something they're doing that's just better, and can we learn from them? And what is FreeBSD lacking? pfSense is based on FreeBSD, so for the people working on pf to improve their performance, that's actually pretty important.
So as not to lose the full picture, here is a less pretty chart that is more useful for relative ordering, with everyone lined up: pfSense, FreeBSD 11 as of February, and CentOS 7, which is our Linux. This is what it looks like when you're trying to do your firewalls on open-source operating systems, which is something we're not giving up. Someone put it to me the other day that FreeBSD still has three firewall systems. That's exactly the question, and I think we can do much better than we do now, whichever one we decide to use.
And this is what we call a longitudinal study. I have a thing for performance analysis (one might say an unhealthy interest), and we have the hardware and the software to run it. The idea is to make this into a continuous longitudinal study. We'll publish different things about different bits of the kernel, such as why the results you saw today look the way they do, starting with the vBSDcon presentation, which has been accepted, so that means I have to do the work. The idea is to report on this several times a year, at a couple of BSD conferences, as makes sense, and to cover more subsystems. One of the things I plan to do for the vBSDcon presentation is not just look at the whys but try some performance improvements. If you are aware that FreeBSD has packet forwarding and fast packet forwarding, you might ask yourself why you would ship a system that has fast forwarding turned off. I frequently ask myself that question. So between now and then my goal is basically to collapse the two: take the things that were made fast in the fast path, move them up to where they should be in the slow case, and take the fast case out. Yes, I mean take fastforwarding out. The other thing I will look at is ipfw, and maybe eventually ipfilter, if someone doesn't remove it before then, as people keep saying they will. ipfw is another comparison I like, because its architecture is completely different from the way pf works. If you've read both sets of code (I did, because there's a section on the two of them in one of the chapters), their approaches to packet filtering are completely different: how they decide which packets to keep, which to throw away, and how they organize things into rules. Those are design variations, and we'll see how the design tradeoffs play out.
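In a longitudinal study, each new run (for example, the head of the tree) gets compared automatically against a stored baseline (for example, the last release). A minimal sketch of such a check, assuming throughput-style samples where higher is better and an arbitrary 5 percent tolerance chosen only for illustration; `check_regression` and `Verdict` are hypothetical names, and a real setup might also apply a significance test before flagging anything.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Verdict:
    baseline_mean: float
    candidate_mean: float
    change: float      # relative change; negative means the candidate is slower
    regression: bool


def check_regression(baseline: Sequence[float], candidate: Sequence[float],
                     tolerance: float = 0.05) -> Verdict:
    """Flag a regression when the candidate's mean throughput falls more than
    `tolerance` (as a fraction) below the baseline's mean throughput."""
    if not baseline or not candidate:
        raise ValueError("both runs need at least one sample")
    base = statistics.mean(baseline)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    cand = statistics.mean(candidate)
    change = (cand - base) / base
    return Verdict(base, cand, change, change < -tolerance)


if __name__ == "__main__":
    # Placeholder numbers, not results from the talk.
    print(check_regression([100.0, 102.0, 98.0], [90.0, 91.0, 89.0]))
```

Keeping the baseline samples on disk alongside the configs is what lets a check like this run unattended before each release.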
I said I'd tell you where to get all this. The network work, meaning the scripts and the results, is all in a repository called netperf, which I admit was a really bad name (but it was late), because there are a million things on the Internet called netperf, and much of it is run by Conductor. And the thing I really want to say: if you decide you want to do something like this, don't. But if you really decide that you want to know, you really need to read this book. Thomas, who's also here, pointed out that this book came out in 1991 and a new version is due out in 2015. It's The Art of Computer Systems Performance Analysis, by Raj Jain, and it's a really good read. You can read it before bed and not end up with a headache; it gives a nice overview, it's well written, and it's easier to read than most statistics texts. Professor Jain is very much a networking person, so while some of his examples are database work, such as query optimization and query measurements, a lot of them are network-based. If you're interested in network performance analysis in particular, this is a great read. We have about 15 minutes for questions. Any questions?
The question was whether we considered another operating system. At the time I started doing this, it was a little new, and I did not want to broaden the study, but there's no reason not to. It's also a completely different system, and whether it's actually multi-core and multithreaded would be an interesting question; we might find it performs quite differently. Other questions?
The question was whether I've ever run any benchmarks on the fast path versus the non-fast path. Yes, I have; that was actually in a previous version of the slides, before the new IPsec material. When you turn on fast packet forwarding, you lose certain features; they're turned off when you turn it on. The fast forwarding path does two things: it reorders the way you decide what to do with the packet, which should make it faster, but it also turns off a bunch of checks that we do in the normal forwarding case. So the question is how much we get out of the rearrangement versus how much we get from dropping features you may not need. We have those numbers, and a bunch of analysis on them with DTrace scripts as well. So the next thing will basically be: can a combined path be as fast as fastforwarding?
The next question was about the people who asked for these numbers demanding zero packet loss. That is not an unreasonable thing to ask for, because at zero loss you're getting what is usually expressed as the effective throughput: it's how much the machine can actually do. So that is a good thing to test for. I have mostly just recorded what we're losing: you run the rate up, then you run it back, and try to get close. I think my tests are within about 5 percent, but yes, I think it's an important part of the set of variables to control for.
Then a question about pkt-gen and tools: do I use it a lot? Yes, I use it a lot. If you look in the netperf repository up here, you'll see a bunch of scripts that use pkt-gen, which does not get built by default when you build world, so you need to build it yourself. For those of you using DPDK, there's also a pkt-gen for DPDK, written by a colleague of mine, which does a similar job using DPDK and which we've used in-house. I've also done a few extensions to pkt-gen, in terms of randomizing the traffic and being able to do a few extra things so that the packets are not completely regular.
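The "run it up, then run it back" approach to finding the highest rate a system forwards without loss is usually automated as a search over the offered rate, in the spirit of RFC 2544 throughput testing. A minimal sketch, assuming a hypothetical `measure_loss(rate)` callback that drives a traffic generator such as pkt-gen at the given rate and returns the fraction of packets lost, and assuming loss grows monotonically with rate; none of these names come from an existing tool.

```python
from typing import Callable


def find_max_rate(measure_loss: Callable[[float], float],
                  low: float, high: float,
                  resolution: float,
                  max_loss: float = 0.0) -> float:
    """Binary-search the highest offered rate whose measured loss is at most max_loss.

    Assumes `low` meets the loss target, `high` does not, and loss is monotonic
    in rate. Returns an acceptable rate within `resolution` of the limit.
    """
    if not low < high or resolution <= 0:
        raise ValueError("need low < high and a positive resolution")
    while high - low > resolution:
        mid = (low + high) / 2
        if measure_loss(mid) <= max_loss:
            low = mid   # mid is acceptable: search higher
        else:
            high = mid  # mid loses too much: search lower
    return low


if __name__ == "__main__":
    # Stand-in for a real trial: a simulated device that saturates at 7.3 (arbitrary units).
    def simulated_loss(rate: float) -> float:
        capacity = 7.3
        return 0.0 if rate <= capacity else (rate - capacity) / rate

    print(f"{find_max_rate(simulated_loss, 0.0, 10.0, 0.01):.3f}")
```

Setting `max_loss` slightly above zero models an "acceptably low loss" target instead of strict zero loss; running each trial for the same duration keeps the comparison fair.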
Let me go back to that slide. The first baseline, which is not in this set of slides, is these two, right? Sorry, can you repeat the question? So the question is whether I controlled for the possibility that it's this one or that one that's broken. One of the things we may have found is that the head of the tree is a bit slower than the release, so that's something else I need to trace, and we found it by looking at what we saw here. We take this and make it the first baseline, because we know it's just going host to host. In terms of tracking it completely, the right way to do that is something Olivier (he's here somewhere; raise your hand) is doing: look at the BSD Router Project, where he tracks over time what happens at different points. The graphs there are really great, and that's the next step for this kind of work, so that we can say, "Here is the head of the tree on this date." Ultimately I would like to be able to push a button: "OK, it's four weeks to release, let's see what happens when we run the standardized networking tests." Then we'd be able to go back and see exactly where things got faster or slower.
[Audience question, partly inaudible.] That's true, yes. The reason I'm using open source tools is that the alternative is super expensive: you're paying a quarter of a million dollars to get that level of accuracy. If I were building a NIC, I would probably have an Ixia, but I'm working on open source operating systems, so unless you have a spare Ixia... There is a lab that has one, and they might let you use it, but then you have to keep doing that every year or two. Other questions? Yes?

