Measure Twice, Code Once
Formal Metadata

Title: Measure Twice, Code Once
License: CC Attribution - ShareAlike 3.0 Unported: You may use, modify, and reproduce the work or its content in modified or unmodified form for any legal and non-commercial purpose, distribute it, and make it publicly available, provided you credit the author/rights holder in the manner they specify and pass on the work or content, including modified versions, only under the terms of this license.
Identifier: 10.5446/18660 (DOI)
Transcript: English (automatically generated)
00:00
She just told me it's time to start the talk. Wow. Thank you for coming. This is quite the audience. So we're not videoing this, right? Just the slides. Awesome. Because then I can do what I normally do, which is walk back and forth while I talk.
00:23
So welcome to Measure Twice Code Once. For those of you who've ever done carpentry, measure twice, cut once. I would walk more, but Dan has put tables in the way to prevent me from getting back and forth across the stage. This is work that I've been doing over the last, actually, I guess we've been doing this a little under a year,
00:41
with Jim Thompson from Netgate trying to look at various things in network performance in FreeBSD. So benchmarks are hard, it turns out, unless you're a marketing department. Then they're really easy, because you just make up numbers. You put them on a slide. I do not work for marketing, so I mostly did not make up my numbers.
01:01
But if I did, you can call me on it, because I've also put my numbers into a GitHub repository. Why are they harder? Well, there's a whole bunch of questions we have to ask, and we have to answer them correctly to get a good benchmark. Like, what are we trying to measure? How are we going to measure it? How do we verify our measurements? People often get through the first two questions pretty well.
01:22
Sometimes they do, anyway. And then they get the third one wrong, because they're like, well, I've got a measurement. Why would I have to run that thing more than once? What do you mean by statistical significance? Can our measurement be repeated? Sorry, wrong button. Can our measurement be repeated? That's really important. Many people have made measurements of, for instance,
01:41
cold fusion. But if no one else can do that same measurement, then no one's going to believe you, or no one with any brains will believe you. Can we replicate it somewhere else? Is our measurement relevant? This is a question that many people do not ask themselves. I've measured this thing, and it does blah. And it's like, well, that's great.
02:01
And now I've added 5% to that. That's great, but we didn't care about that. So it has to be relevant to what you're doing. How do we generate workloads? So there's all kinds of ways of generating workloads, synthetic, non-synthetic. For those of us who do a lot of networking stuff, which is what I mostly do and what the talk is about,
02:22
it's really hard to simulate the internet. I mean, you can get a lot of cats in a room and a photographer, but other than that, it's difficult to simulate the internet. So figuring out how to generate a workload that's going to be representative when you put some, because you're usually doing this because you want to put something out as a product or as an open source
02:40
operating system that's going to be used by someone other than you, workload generation and how you generate that also becomes important. And there are people, by the way, in the world who will sell you fabulously expensive objects with which to generate workloads. And if you don't know what you're trying to generate, you'll spend a lot of money and not get a very good result. And even if you do know what you're doing,
03:01
sometimes you'll spend a lot of money and not get a very good result. And here's the last one that I really like. So most people know what a hyzen bug is, right? So when I look for the bug, the cat is alive. When I don't look for the bug, the cat is dead. In testing, we also have hyzen testing, right? So we set up a measurement, we're running, we set up a workload, we've got something
03:20
that's detecting what that workload looks like, but that detection software itself may actually disturb the thing that you're trying to measure. And if you're doing that, you are going to pull out all of your hair. That's a joke I make in every talk. So here's a long list of reasons why benchmarks are hard. I will start hitting the right button at some point.
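To make the statistical-significance point above concrete: even something as small as the following is better than quoting a single run. A minimal sketch, in the spirit of the Python tooling discussed later (the trial numbers here are invented):

```python
import statistics

# Summarize repeated benchmark runs; a single run tells you nothing
# about run-to-run variance.
def summarize(samples):
    return {
        "n": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }

# Ten made-up trial results, in gigabits per second.
trials = [9.41, 9.40, 9.42, 9.39, 9.41, 9.38, 9.41, 9.40, 9.42, 9.41]
s = summarize(trials)
print(f"{s['mean']:.2f} +/- {s['stdev']:.2f} Gbit/s over {s['n']} runs")
```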
03:42
Network benchmarks are harder for a smaller number of reasons, happily. Asynchrony is the key reason, right? So if I'm trying to test something on a local system and all of the hardware is working properly, which it mostly does, then I can run that test repeatedly
04:00
without worrying too much about asynchronous events interrupting me. But in networking, a lot of stuff is asynchronous, and so we have to worry about that asynchrony. We also have to worry about loss because most networking is best effort delivery, which is why I love working on networking because as Kirk says, if he's here, he's in another talk, and Kirk talks about file systems,
04:20
he'll say, if you ever curdle someone's data, they'll never trust you again on a file system. But it turns out in networking, I can drop your packets day and night, and you'll keep giving them to me. 50% of your packets went away. You're like, no, take more, take more, take more. So best effort delivery actually makes it difficult to come up with good network benchmarks and network testing because you are not guaranteed
04:43
they will all get there, and so there's another thing to count, another thing you have to worry about. And if you do silly things, like ramp your request rate up very high very soon, you'll discover just how best effort best effort is. There's a real lack of open source test tools. I do a lot of open source.
05:00
My God, I'm wearing open source clothing all week. So there's a lack of really good open source test tools. Many people, for some reason, and I know I've done this too, you're like, oh, I gotta test this network thing. I'm gonna build a client and a server that are really simple and they count packets, and everyone gets that far, and then they put it on, well, they used to put it somewhere else, now it's on GitHub, right?
05:21
So there are 12 of these things, many of which no one's actually ever verified work correctly. So my favorite one of these was the early versions of Netperf, which was one of these client-server testers. A mathematician came up to me, who also does crypto, actually. He's like, you realize that those results are always off by 30% one way or the other.
05:41
I'm like, well, no. He's like, yeah, here's some math. I'm like, wow, okay, I'll stop using that. So there's a lack of these open source test tools, and then we all know that open source is of uneven quality, right? So you have to pick the right test tools. And then there's this problem of distributed control. So I'm gonna talk a little bit more of that in some upcoming slides.
06:02
If you happen to have a test lab and you happen to have some minions, I think we call them remote hands, but you can get them to move things around and do all that kind of stuff, and you can say, well, I wanna test this today and I move these things around and I wanna test this today. Or you buy a very expensive box where it's like guiding light with switches, and I don't have one of those.
06:23
You've gotta control your distributed systems that you're testing. Again, in the single system test case, you can sit in front of the computer, whatever that computer is, or you could log into a terminal or whatever, and you can run the thing, and you don't have to tell this person to talk to this person while this person listens and this person watches and that kind of thing. So control of distributed systems
06:41
makes networking benchmarks more difficult to control and set up. Here's a typical lab. This happens to be a lab hosted at Sentex. If Mike Tancsa is here somewhere, I'm destroying his last name, I really wish he'd raise his hand, because Mike is the co-owner of Sentex, who have hosted the FreeBSD project's
07:02
network test lab, high-performance network test lab here in Canada for many years. And so a lot of this is wired up by him and another guy who works with him, Paul Holes, who are amazing remote hands, and you only discover how amazing remote hands are when you don't have them. And they're like, wow, you guys actually know what you're doing.
07:21
So this is a typical test setup for packet forwarding testing, which I'm gonna show a little bit of. You got a source and a sink, you can ignore the names, or you could find them on the Wiki page; by the way, all this is documented on the external Wiki for FreeBSD. I got a source to sink, with a bunch of 10 gig cards from this company, Chelsio, whose developer is here,
07:42
and he's very helpful. And then we've got a control network where we're using the Intel one gigs to just talk to the things. And then we've got this Arista 10 gig switch up at the top, so we can either go through the device under test, you'll see DUT a lot, or we can go over this network
08:00
and remove the device under test and see what we would get in this case. So this is a typical three system lab setup. People who work for large companies that do a lot of networking have really awesome labs, and like I said, a lot of remote hands to work on them. I don't. I have a couple of remote hands that are excellent. And some really, we probably got about,
08:21
the Foundation put something like 20-some-odd machines with high performance networking stuff into the Sentex lab. And a lot of the cards have been donated by the vendors. So now whenever a vendor comes to me and says, we've got a new NIC, I say, give me two, right? Because it turns out that in network testing, you really want two at least.
08:42
And considering I've burned out a couple of cards, I probably should start asking for three, but usually we get two, in this case, we've got more. Yes, exactly. Two, four, eight, 16, some power of two. So there's a typical setup just for one test. This is not the whole lab, but this is how a lot of the tests were run.
09:01
So here's another thing that's important in benchmarks, which again, people leave out. What did you benchmark with? What was the hardware you benchmarked on? And really specific, because it turns out if you've been to any of the discussions in the last couple of days about sort of NUMA and multicore, or if you see any of this stuff, you realize from generation to generation,
09:20
it really matters, this model number actually matters, because that's how all of the hardware is arranged. So in a lot of the tests I've been working on, and in what I'm gonna present today, we've got a source and a sink, or these dual socket 10 core 2.8 gigahertz, I think they actually heat the room pretty well, Xeon monsters, and then we put, you know,
09:43
a four core but fast machine in the middle, we're using Chelsio T520s, dual port 10 gig NICs, and an Arista 7124. And the reason you put this up here isn't just to show, wow, you've got really cool hardware. It's more, well, if you wanted to replicate this, you could replicate this exactly with this if you happen to pick up that hardware,
10:02
or you could at least figure out maybe analogous hardware so you could see if your analogous results were analogous because if they're not, that's a problem. So let's talk a little bit, take a little side trip into coffee land. You can only imagine what the recording of this is like.
10:22
And talk a bit about modern hardware. So one of the reasons we set up Sentex and we got people to donate hardware and we got the foundation to put in servers and power and cooling and yada yada yada is 10 gig hardware is still somewhat expensive. I like to say that it's now gotten cheap enough
10:40
to be within the means of even the smallest nuclear power. But when you're running 10 gig hardware, you've got some numbers to deal with, right? So at 10 gigabits, you get 14.8 million 64 byte packets per second. You get about 200 cycles, at 3 gigahertz, to deal with each packet. That's a very interesting problem because I don't know about you,
11:01
but when was the last time you found one of your functions took less than 200 nanoseconds? Or 200 cycles, sorry, not 200 nanoseconds, 67 nanoseconds. Another interesting problem that we come across as we get to these newer machines, the cost of a cache miss is way more expensive than anything else that's gonna go wrong on the system. Right, so we used to be taught to program
11:22
and many of us were taught to program such that you optimized for CPU cycles. CPU cycles are now quite close to free. I mean, I know Java programmers think they are, along with memory, but cache misses are what cost now. If you blow out of the cache, your network performance will suffer
11:41
and I can show you that happening. Other things that matter on modern hardware are multi-core. So, you know, it used to be you had one processor with one core and that was not so bad. And then you had one processor with two cores and four cores and now you can buy an 18-core processor
12:01
from Intel which has a complicated little ring network in it that's sort of terrifying. So multi-core matters. You need to think about what's going on in a multi-core machine because where you put your workload is going to affect your benchmark result. And if you don't look out for that, you're going to either, you're going to confuse yourself badly.
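To put numbers on the 10-gigabit budget mentioned a moment ago, here is the back-of-the-envelope arithmetic (the 3 GHz clock is just illustrative):

```python
# 10GbE with minimum-sized frames: each 64-byte frame also carries
# 8 bytes of preamble and a 12-byte inter-frame gap on the wire,
# so 84 bytes, or 672 bits, per frame.
line_rate = 10e9                # bits per second
frame_bits = (64 + 8 + 12) * 8  # 672 bits per minimum-sized frame
pps = line_rate / frame_bits    # ~14.88 million packets per second
ns_per_packet = 1e9 / pps       # ~67 ns per packet
cycles_per_packet = 3e9 / pps   # ~200 cycles at a 3 GHz clock
print(f"{pps / 1e6:.2f} Mpps, {ns_per_packet:.0f} ns/packet, "
      f"{cycles_per_packet:.0f} cycles/packet at 3 GHz")
```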
12:21
Multi-queue, so all of the network cards. The way you get a 10-gig network card to do 10-gig is you often will use multiple queues. I mean, sometimes you can do it with one queue, but you really can't. So they're like, look, we got all this silicon. We'll spread the load. So how you line things up, last line, between memory, your queues, and your cores
12:42
has a very profound effect on the results you're going to see from a benchmark. So these are the things to keep in mind when we're doing network performance benchmarks. So I mentioned the problem of distributed systems. And what I love about open source, which I'll point out in this,
13:00
is all of us have a scratch to itch, an itch to scratch. And mine, when I started doing some of this work, was this, which is this coordination problem. I don't want to have to have, even though I often do, 15 terminals open where I have a bunch of command lines ready and I have to go return, return, return, return, return, to run a test. That's just ridiculous, right?
13:20
But I've done that before. So I wrote this thing called Conductor, which is a set of Python libraries, it's in pure Python, which someone was like, wow, that's great. I'm like, it's not that hard. Anyway, so you have a Conductor, and for those who know me in the audience, not a trained Conductor, this is, you know, this is the Conductor conductor thing, right?
13:40
So there's a Conductor and one or more players. So you can have as many players as you like, and they all talk to each other, and the Conductor is the one that sets the tune, right? You are going to do this, and then this, and then this. And the test system has four phases, startup, run, collect, and reset, right? And the reset thing is kind of important because this is another thing people often forget when they're running multiple tests,
14:00
which is, well, I've set up the test, but now I've got a bunch of state that accrued because I ran the test, and then I collected the results, and now that state influences the next round. Oh, by the way, you should test more than once, just in case you didn't know. You kind of want to have statistically significant numbers instead of just a one-off. So this is a system we did up in Python open source.
14:23
It's on GitHub. I'll put all the links up later. There were two nice things about doing this. One, it saved me from typing return in a bunch of windows, and the moment you publish something in open source, five people come forward and go, you know, we have that. I got mails from there. I got mail from someone in this, nope, he's not here,
14:42
but I got mail from someone at the conference or someone they work with saying, oh, we did this for TCP. I'm like, really, where is it? They're like, well, we're not ready to release it yet. I'm like, well, then that doesn't help me. But it does get people to, I like doing this because it gets people to sort of release these internal things that are like, nobody would ever want this. No, no, we really do.
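For a feel of the startup, run, collect, reset cycle described above, here is a sketch of its shape in Python. This is not Conductor's actual API, just an illustration of why the reset phase exists:

```python
# Hypothetical shape of the Conductor test cycle, not its real API.
def run_trials(players, trials):
    results = []
    for trial in range(trials):
        for p in players:
            p.startup()        # warm things up: ping peers, load ARP entries
        for p in players:
            p.run()            # generate load, or watch it arrive
        results.append([p.collect() for p in players])
        for p in players:
            p.reset()          # tear down accrued state so trial N
                               # cannot influence trial N+1
    return results
```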
15:03
So I'll go through this little digression through Conductor, here's a config. This is what the conductor sees. It's like, where are my clients? Where are the config files? Notice trials is one, that's not a good set of trials, but this is testing, so, testing the testing. Here's a player config.
15:21
If Steve Bourne is in the room, you'll notice that this is a ridiculously horrible version of just a bunch of shell commands. So I should probably adopt a shell scripting language, but I got a little too far into it. I've got line numbers, see, they're called steps. Told you it was primitive, but nobody else had one until I published this and suddenly four people or five people were like,
15:42
wow, we got that. Anyway, pretty simple. Where do we find our conductor? Where do we find the master? What do we do when we start up? Why do we do a ping when we start a test? That's a good one. What's another reason? Yes, the ARP cache. You don't want to be testing whether or not
16:01
you've loaded the ARP cache, at least not in this particular test. Got a run thing, we're collecting a bunch. This is on one of the devices under test. So one of the things that you can do once you've got something like this is you can not only run the tests from source to sink, but while the machine is running, you can start collecting performance analysis stuff.
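For instance, sampling the device under test with pmcstat while the load runs might look roughly like this. The event name is CPU-specific (pmccontrol -L lists what your processor offers), so treat this as a sketch:

```python
import subprocess

# Sample instructions retired on the DUT while the forwarding test runs.
# "inst_retired.any" is an Intel event name; yours may differ.
sampler = subprocess.Popen(
    ["pmcstat", "-S", "inst_retired.any", "-O", "/tmp/forwarding.pmc"])
# ... run the forwarding workload here ...
sampler.terminate()
# Post-process into callgraphs with:
#   pmcstat -R /tmp/forwarding.pmc -G /tmp/forwarding.stacks
```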
16:20
So this is doing a bunch of looking for the instructions retired on the system as it's trying to forward packets. So then you can find the hot point in the kernel, which I can tell you where that is. There are a few. So, and then we collect the results. So this is just an example. There's a whole bunch of these. I'll give you another pointer. All of the, I'll give you another pointer at the end,
16:40
but all the work we're doing when we're doing the performance stuff. Conductor is open source and on GitHub, and there's a project under, they're all under gvnn3, which is my GitHub login because I couldn't get gnn. And then all of the tests and results and config files for the things I'm going to show you are in something called netperf.
17:01
There's a little Netperf project that you can clone and you can see what I've done wrong because I really like people to start looking at that. So baselines. Many people, when they want to run a benchmark, the first thing they test is the thing they know their boss wants, right? The boss wants to know that NFS performance is 20% faster
17:22
or that TCP does this or UDP does that or whatever it is. They're testing for a specific thing, but they don't establish a baseline, right? And establishing a baseline turns out to be really important, because otherwise you have nothing to measure against. You just give someone a number. It's fast, great, faster than. So in establishing some of the baseline measurements
17:43
that I'll talk about, we use, actually it's iperf3, I should put the three on, which by the way is now maintained by a FreeBSD developer. So that makes my life easy because I can get him to put in patches. iperf3 is a TCP-based test that seems to give reasonably statistically significant results
18:00
as opposed to Netperf. Your old Netperf, not the new Netperf. The new Netperf, so Luigi Rizzo added this thing called Netmap, which you may have heard about. And if you haven't, it'll be talked about at every BSD conference for the next five years. So you will hear about it. It's also in this book that I worked on. Netmap gives you direct access
18:21
to the very lowest levels of a network interface card's device driver. And what that allows you to do is to drive that card pretty much at line rate without interference from the network stack, right? So if you are doing packet-type tests, something like netmap is something you really want because you can just pump packets out at line rate
18:42
at 10 and 40 gig. And I've tested this on 10 and 40 gig already. And that's a really good way to abuse the living crap, I guess I'm recorded, but anyway, to abuse the living crap out of a device under test, whereas TCP has so many other things going on that what you'll really find is it's all good or everything is fine, right?
19:01
As we say on the FreeBSD project, everything is fine. Well, that's because all the machinery has smoothed out all the rough edges for you. If you want to put the rough edges back, you use netmap. So here's a baseline TCP measurement. This is just host-to-host through the switch. No forwarding going on through the host.
19:21
You notice that iperf3 is reporting this like every second for 10 seconds. It's saying, wow, getting really consistent 9.41 gigabits per second. So talk is over. We can go home now. So the talk is not over and we cannot go home. So this is the baseline when we just turn things on. So now we've got something to work against
19:42
in terms of what do we get host-to-host when we're not doing forwarding. A lot of the tests that I've been doing lately look at the forwarding path of FreeBSD, which is something that people have not looked at as much recently because people assume you don't use FreeBSD as sort of a router or directly in the packet path. But a lot of people do, it turns out.
20:01
And the better it gets, the better we will be. Besides, it's fun. So we saw 9.4 gigabits per second with TCP. I really should put commas in here. See the source, see the sink? They don't match. Why is that?
20:20
That's because this is the pkt-gen measurement. This is just raw packet performance. And so somewhere we are losing packets. And part of the exercise is to find out where. The reason we see this is that TCP very quickly ramps up to full-size packets, which are much easier for everyone to process,
20:43
including the NIC and the operating system, because you're not dealing with 64 bytes at a time. pkt-gen uses minimum-sized packets, and the device under test can't quite keep up, as you noticed. But if we hadn't done the baseline, we'd be like, well, if we'd done the baseline and accepted that we were done,
21:01
we would not know very much. And we'd think, everything is fine, which it's not. One of the interesting things I find about the minimum-sized packet trick, so many network testing systems, especially the expensive ones, and many people who build network cards that are selling them to you,
21:20
are really big on like, we can do X minimum-sized packets per second. And it's like, that's great. There's only one real use of minimum-sized packets, ACKs. Now, if you happen to be, I don't know, feeding 40 gigabits of TCP to people's televisions, you probably care about the ACK rate coming back.
21:41
But this is not, this is just sort of the worst case, but it's not always the most interesting case. We'll take a look at that in a little bit. Hmm? And VoIP. But, I don't know, how fast do you speak?
22:01
Yes. No, I understand. So, that was some of the baseline measurements we did just for TCP and for packet forwarding. I want to talk about some of the more recent work that we've done since, so I've, since we did the initial work on forwarding.
22:20
It's expensive. Clearly the spell-checker did not catch that. IPSec and its algorithms. So, we have IPSec in FreeBSD. We use it, and many people use it. You know, we know that IPSec and encryption are computationally expensive, often offloaded to coprocessors.
22:41
If you saw John-Mark's talk before lunch, then you saw the work that he's done to bring the AES-NI instructions into FreeBSD as a way of accelerating the encryption part. So, one of the things to do once John-Mark had gotten that into head was to then see, well, how much does that help? Or, my real question, having looked at IPSec,
23:03
which is actually pretty good, the thing I wanted to know is what is the weight of the framework, right? So, when you introduce some framework like IPSec or a TCP stack or whatever, you put some extra software around things to make it work, before you go figuring out how fast did that screamingly fast new instruction
23:21
from Intel make things go, you've got to sort of figure out, well, what happens when we're not doing anything at all? And by not anything at all, I do not mean no IPSec, I mean using the null stuff. So, here's our measurement method for this. This is a two-host test with either transport or tunnel mode for IPSec,
23:43
depending on what we were doing. And using the same machines you saw in the lab, we used iperf3 to do TCP testing over the IPSec transport or tunnel between those two hosts, between links1 and rabbit3. And again, all the results, all the configs, they're all up in the netperf GitHub.
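Each trial boils down to roughly the following; this is a hedged sketch rather than the actual scripts from the repository, and the server address is made up:

```python
import json
import statistics
import subprocess

# One iperf3 round against the far host; -J asks iperf3 for JSON output.
def iperf3_round(server="10.0.0.2", seconds=10):
    out = subprocess.run(["iperf3", "-c", server, "-t", str(seconds), "-J"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)["end"]["sum_received"]["bits_per_second"]

rates = [iperf3_round() for _ in range(10)]  # 10 rounds, 10 seconds each
print(f"{statistics.mean(rates) / 1e9:.2f} "
      f"+/- {statistics.stdev(rates) / 1e9:.2f} Gbit/s")
```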
24:02
So, two hosts set up, iperf3. Obviously, I used Conductor because everyone else hasn't released their awesome open-source software. And then, a very simple task, 10 rounds, 10 seconds each, to try and make sure I'm not completely lying just with one result. So, what do we get for a baseline?
24:21
Remember, this is two 10-gig NICs. So, using null encryption, we actually have null encryption, which someone broke, because no one tests. And so, I put the null encryption back. Why did I put null encryption back? Because I don't want anything interfering. I want to know the speed of the baseline framework.
24:41
I want to know what is the cost of just turning on IPSec. So, no authentication, no encryption. And, one of the things that happens when you use IPSec is you lose a bunch of the things that make 10-gig cards go fast. 10-gig cards go fast not because there's a little man running very quickly in them, or because they're water-cooled,
25:01
which they will be eventually. I keep seeing NIC cards that look like old-style graphics cards, with a line running to them and bubbling water. So, when you turn off TCP segmentation offloading, and hardware checksumming, and large receive offload,
25:22
it turns out the cards are not so screamingly fast anymore. So, this is the result of running null. The result of running not null, but just the 10-gig cards with none of the features is only about, I think it's like 40% more than this.
25:42
I'm still working on that result. So, this is the baseline we get with IPSec on: 2.4 gigabits per second between two hosts, running TCP all day, all night.
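For concreteness, a null-ESP transport-mode setup is only a handful of setkey(8) lines. This is a hedged sketch: the addresses, SPIs, and exact null-cipher syntax here are invented stand-ins, and the real configurations live in the netperf repository:

```python
import subprocess

# Hypothetical null-ESP transport-mode policy for one of the two hosts;
# the peer gets the mirror image. Check setkey(8) for the exact syntax.
SETKEY_CONF = """
flush;
spdflush;
add 10.0.0.1 10.0.0.2 esp 0x1000 -E null;
add 10.0.0.2 10.0.0.1 esp 0x1001 -E null;
spdadd 10.0.0.1 10.0.0.2 any -P out ipsec esp/transport//require;
spdadd 10.0.0.2 10.0.0.1 any -P in ipsec esp/transport//require;
"""
subprocess.run(["setkey", "-c"], input=SETKEY_CONF, text=True, check=True)
```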
26:02
So, with IPSec, we've got authentication and encryption. You can pick one or the other or both. You should always pick both, just saying. But, you know, so what's the effect of turning on something like HMAC SHA-1, which is actually kind of expensive computationally? So, this is transport mode. There's no encryption.
26:20
This is hundreds of megabits a second on a 10-gig link. So, we are less than 10% of the effective bandwidth between the two hosts. That is not good. But we're not gonna get to why today. Actually, to get to the why, you're gonna have to come to another talk. Teasers. And then, one of the new modes that John-Mark has added
26:44
is this AES-NI stuff, and one of the algorithms that takes advantage of it is AES-GCM, which is Galois/Counter Mode, I think. Okay, excellent, I got it right. I even, probably, did I say Galois correctly? Thank you. So, this is running in tunnel mode
27:00
where your complete packet is encapsulated. There's a whole nother header on the front. Everything is hidden from everyone, but maybe the NSA, that's recorded too. So, we've got both encryption and authentication. And what happens when we go between using no hardware support? If you thought that authenticating was expensive, encrypting is super, super expensive, but really secure.
27:25
And then you get, what is it, like, five times the number of bits through once you turn on the hardware support. We're actually within half the speed of our original baseline. Our original baseline was 2.4 at max. We get to 1.3 at max with the AES-NI support.
27:44
So, this is not the end. This is just the beginning. This is, well, what do we have? What do we get? And now, the next thing we're gonna work on, and I'm not gonna present today, is what else can we do? But, one of the things that,
28:01
I'll leave up the overall picture while I rant about premature optimization. So, I don't know if you know this, but hubris is an issue for software engineers, except me. And so, we all think we're smarter than the compiler, the hardware, everyone who's ever looked at anything.
28:22
And often, people will look at a very narrow chunk of code and say, oh, I could make that faster. I'm like, but does it matter? This comes back to the relevancy question I brought up in the first real slide. In order to know whether things are relevant, you have to have this kind of set of measurements first.
28:40
And then, you can be like, well, okay. We know that hard versus soft, that using the hardware versus the software version is much faster. We know that it's still not even up to what null is, which, okay, we know we have to do some work. That's good. We also know that null is not at the speed that perhaps it could be. So, we wanna find out why, right?
29:02
And when I showed you the config file for the device under test, you saw I was starting to add PMC, which is the performance monitoring counter system. I've done a bit of analysis also with DTrace. And that's the kind of stuff that's gonna tell us why. Now that we've got a framework in place and a set of software that can run the tests, and it's all out in the open for people to use,
29:21
we can start digging down into why. And maybe the why is, you know, sometimes you look at the hardware and you look at what's going on, and you just find out that you have reached the limit of the hardware. I do not, for a moment, believe that this is the limit of the hardware. I think that there are limitations in the software.
29:40
But to find that out, come to vBSDcon in the fall outside of Washington, D.C., and you'll see what the results of the why tests are. So, last big set of tests. This is work we had done a bit earlier. Jim and his team worked a great deal with PF.
30:02
They work on pfSense. So we wanna try and figure out, well, what is the overall performance of pfSense, raw FreeBSD, OpenBSD's PF, which is where, you know, where FreeBSD's PF originally comes from, but there's been a great deal of work in particular to make it multi-core on FreeBSD.
30:22
And then there's this other operating system, whose name I dare not speak. The firewall rules are given in the paper we presented at AsiaBSDCon 2015. I'm not gonna make you read firewall rules on a slide. I'm not that mean.
30:42
So we went through, and actually Jim ran a bunch of this stuff and we worked through this. We went through a bunch of scenarios, right? And that's the kind of thing that you're designing when you do good benchmarks. You don't just pick one thing and go, look, this number is five, right? You know, like this number is five when all of the other variables were controlled in the following way.
31:00
So first thing we do is some single core, no filtering. What no filtering means is there's no real rule. We've just turned the thing on, and PF or iptables or what have you looks at every packet. So that bit of framework is being touched every time, which is definitely an overhead, right? You touch a packet, you look at a byte,
31:20
you invalidate a cache, you know, things happen. Bad things happen to bad people. So how did it go? So this is a packet per second measurement. This is a very common network benchmarky thing. Instead of measuring just raw bits per second, it's like number of packets per second.
31:41
And in this case, it turns out in single core without filtering, pfSense is faster than OpenBSD, which is faster than FreeBSD, which is faster than CentOS. Hooray, BSDs, all three are ahead of Linux. Yay. Then we turn filtering on.
32:01
And in single core with filtering, current doesn't do so well, but it does come in just behind Linux, which is how we like to really position the BSDs, right? We're just behind Linux. So again, even with filtering, pfSense is ahead of OpenBSD. You see that we've got these standard deviations
32:21
in packets per second. So you're going to see a really wild one in a bit that I haven't been able to beat into submission. So: pfSense, OpenBSD, CentOS, FreeBSD current. And this is current as of February, I should probably put the date on the slides at this point, February 2015.
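Given those standard deviations, a cheap sanity check before declaring a winner is to ask whether the two mean-plus-or-minus-spread intervals even overlap. A rough sketch, with made-up packets-per-second numbers; a real comparison would use a proper significance test:

```python
# Crude screen: call two results distinguishable only if their
# mean +/- k*stdev intervals do not overlap. Not a real t-test,
# just a first filter before claiming one system "won".
def clearly_apart(mean_a, sd_a, mean_b, sd_b, k=2):
    return (mean_a + k * sd_a < mean_b - k * sd_b or
            mean_b + k * sd_b < mean_a - k * sd_a)

# Invented pps numbers in the spirit of the slide.
print(clearly_apart(1_600_000, 40_000, 1_620_000, 50_000))  # False
```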
32:41
So, you know, we talked about modern hardware very early on in the talk, right? It turns out multi-core really matters unless you don't want to do anything. So we turn on multi-core and things get very bad for the BSDs because CentOS basically is ahead of everything, right?
33:00
And now pfSense is only just behind Linux. But FreeBSD is not just behind Linux, and OpenBSD is really not just behind, right? And what you'll notice here that's interesting, there's a "not applicable" for the speedup, right? We know why that is, right? We know that PF on OpenBSD is not multi-threaded
33:21
or was not in 5.6 when these were done. But it does show that it matters. They do get about a thousand extra packets. So not so good. So now we've done multi-core without filtering. This is, we're just touching, you know, the framework touches the packet, but there's no rules being executed.
33:41
Nobody's looking at the rules. The packets just flow through the machine. A couple of counters go up and, you know, cache lines get invalidated. What happens when you turn on filtering? Well, not so good. Now pfSense is not so close to Linux. FreeBSD is really not close to Linux or pfSense.
34:00
And OpenBSD isn't close to anything but the ground. It's a very bad moment for everyone in sports. In fact, what's interesting here to me in the OpenBSD case is that even turning on multiple cores seemed to have very slightly negatively impacted performance. I don't think this is statistically significant,
34:22
but it was, I'm like, if you've got no multi-core sort of thingies in the way, what's going on? So there might be something odd there. But then you also notice this, right? So in the single core case with filtering on, CentOS would not have been at the top. But once you get here, clearly the iptables stuff has been well optimized for the multi-core case,
34:42
which other than a very small number of people, and certainly not people who'd build like a home router, everyone's got multiple cores. My watch has multiple cores. That is actually true for once, right? It's a new watch. My toaster does not yet.
35:01
So what are some of the lessons to take away from this, right? Well, this gives us some answers and more questions, right? We now know what the state of play is, right? What does the field look like? If we want to improve PF in any of the BSDs, OpenBSD or the version that we now have in FreeBSD, we know where we stand in relation to other systems,
35:24
and we know that we have work to do. Even if we won, whatever winning means, unless we were doing exactly line rate, which I would have considered to be a statistical error, we know we have work to do. So there's answers, but more questions,
35:41
which is what always happens when you do these, well, what should always happen when you do a benchmark? If you come to the end of the benchmark and you're like, yep, no more questions, I'm like, really? I don't think so. You might not want to do more work, but there are always more questions. Turns out multi-core matters, getting multi-core right, i.e. fast multi-core primitives,
36:01
matters even more, right? So the top-end Haswell core system is an 18-core system, with two rings in it between the cores, and if you think that is where we're going to stop, you're wrong. For anyone who saw, who was at the Dev Summit yesterday and saw some of the ARM stuff,
36:20
those are 48, was it 40, 32-core, 48-core? 48-core? I missed it, but I had to talk to someone. Multi-core really matters. Doing it fast really matters. You know, why is iptables the fastest, right? So one of the things that we're going to need to do next, or basically me, unless someone else wants to volunteer,
36:42
is to dig into why the Linux stuff is faster. Is it, do they use RCU because they can get away with it? I mean, because IBM lets them? Is there something they're doing there that's just better, and can we learn from them? I wouldn't use the word steal. So, and you know, why does FreeBSD lag pfSense
37:03
when pfSense is based on FreeBSD? So what did, you know, what did the people who were working on something that really was a, you know, a firewall do to improve their performance? That's actually pretty important. All right, so here's the full picture.
37:21
When's the last time anyone drew a bar graph? Jim drew this, it was very good. I was really happy, because he, I was like, I don't know how this works. This is more useful when we're going to look at what we're doing next, but you can see that this is everyone lined up, right? So pfSense, OpenBSD, FreeBSD 11 as of February,
37:44
and then CentOS 7, which is the Linux iptables stuff, right? And, you know, now you've got a picture. This is what it looks like when you're trying to do, you know, software-based firewalls on an open source operating system, which is something we're not giving up. I mean, as it was pointed out to me the other day, FreeBSD still has three firewall systems, right?
38:05
Oh, yeah, that's my plan, exactly. The question was, am I going to work on another one? Like, yes, because I know that I can do so much better than, no, I'm not going to work on that one. I think I'm going to fix whatever we decide to use next.
38:21
So this is what we call a longitudinal study. So I have a thing for performance analysis, one might say an unhealthy interest. And, you know, now that we've got the hardware and we've got some more software to run it, the idea is to make this into a continuous longitudinal study. We will publish different things about different bits of the kernel.
38:41
Like I said, we're going to talk about the why of some of the stuff you saw today when we get to the vBSDcon presentation, which has been accepted, so that means I have to do the work. The idea is to try and report this several times a year. Since I wind up at a couple of BSD conferences a year, that seems to make sense. We want to cover more subsystems. So one of the things that I plan to do
39:02
for the vBSDcon presentation is not just look at the whys of IPSec and try to do some more performance improvement there. But if you've ever worked on FreeBSD, you know that we have packet forwarding and fast packet forwarding. And you might ask yourself why you would ship a system that has the knob that says fast turned off.
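The knob in question is a sysctl, and flipping it between otherwise identical runs is the whole experiment. A hedged sketch (on FreeBSD of this era the knob is net.inet.ip.fastforwarding; later versions collapsed the two paths, which is the goal described here):

```python
import subprocess

# Toggle the fast-forwarding path on the DUT between runs.
def set_fastforwarding(on):
    value = 1 if on else 0
    subprocess.run(["sysctl", f"net.inet.ip.fastforwarding={value}"],
                   check=True)

for fast in (False, True):
    set_fastforwarding(fast)
    # ... run the identical pkt-gen forwarding trial and record pps ...
```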
39:24
I frequently ask myself this question. So one of the things that I'm going to look at between now and the fall is my goal is to basically collapse the two and to take the things that were made fast in the fast case, move them up where they should be in the slow case,
39:40
and take the fast case out, and well, yes, take the fast case out. So we're going to look at that. Another thing that I want to look at is IPFW, right? Maybe eventually I'll look at IP filter if someone doesn't remove it before I have to. People keep saying.
40:01
But IPFW is another comparison I'd like to make because IPFW's architecture is completely different to the way that PF works. So if you've read both sets of code, and I did because there's a section in one of the chapters on the two of them, their approaches to how they do packet filtering are completely different, right? How they decide which packets to keep and which packets to throw away
40:20
and where they throw things into rules and all that stuff. So it'd be very interesting to see how those design trade-offs get played out under the study. So I said I'd tell you where to get it. So the Netperf work, which is the scripts and the results, are all in this thing here. This is called Netperf,
40:41
which actually I admit was a really bad name, but it was late, because there are a million things on the internet called Netperf, it turns out. But I'm not renaming it now, because it's too much trouble. Here's Conductor, so again, under my little thing here. pfSense, and I think you all know where to get FreeBSD as well. And then the other thing I really want to talk about, if you decide you want to do something like this, don't.
41:04
But if you really decide again that you want to do it, you really need to read this book. And actually, Arun Thomas, who's also here, pointed out that this book, which was done in 1991, is about to come out in a new version in 2015. This is the book
41:21
on computer systems performance analysis. Raj Jain is a really good writer. You can read this book before bed and not hit yourself in the head when you drop it on yourself as you fall asleep. It's well-written, it's easy to read. Well, the stats aside. Other than that, it's really easy to read. And Jain, Professor Jain now, I guess,
41:40
is very much a networking person. So a lot of the examples that he uses, some of them are database sort of query optimization thing, query measurement stuff, but a lot of it's network-based. So if you're interested in particular in network performance analysis, this is the book. It's a great, great book. All right. We have about 15,
42:02
but they'll probably kick me out a little early, so about 15 minutes for questions. Any questions? Way in the back. Hey, Christian. So the question was, did we consider NPF? And at the time that I started doing this,
42:20
NPF was a little new and I did not wanna broaden the study, but there's no reason not to. We probably should look at it, because NPF is also a completely different system. Who's a NetBSD person, raise your hand? I won't complain. Does anyone know if it's actually multi-core, multi-threaded?
42:42
Do you know? I mean, I know you do bmake, so. All right, so that would be another interesting question. We might find it operates about like PF on OpenBSD. Other questions?
43:02
What did you use? Pkt-gen. Right, okay. Yeah, that was pkt-gen. Other questions? Yes, oh, wait. Ah, call on the mic. Fast packet forwarding is in fact faster
43:22
than packet forwarding. Yes, so I have done that bit. There is a reason I wanna take the lessons, or the question was, have we run any benchmarks at all in the fast versus non-fast path of packet forwarding? And that was actually in a previous version of those slides, but I wanted to start including the new IPsec stuff. Yes, if you turn on fast packet forwarding and you don't need certain features
43:41
that get turned off when you turn it on, which is kind of, you know, the fast packet forwarding path has two things in it. It has a reordering of the way you decide what to do with the packet. That's the thing that should be making it fast. But it also turns off a bunch of things that we check in the normal forwarding case. So the question is, you know, how much can we get out of the rearrangement
44:00
versus how much it got out of like, well, you don't need those features. Like that filtering stuff, you don't need that. So we have run that, and I did a bunch of analysis on that with some D-trace scripts as well. So the next thing will be basically, can a combined one be as fast as fast forwarding
44:20
for software forwarding? Mike. Doing benchmarks using some of that expensive network equipment, certain partners insisted that we had to aim for zero packet loss. We insisted on like 0.1% or something. Any comments on that? Yes, so the zero, so the question is,
44:44
so a bunch of the people who demand these numbers demand that it's zero pack, that you aim for zero packet loss. And that is not an unreasonable thing to ask, or at least near zero packet loss, because then you're getting what usually is expressed as the effective forwarding rate for the effective packet rate, because it's how much you can do
45:01
before the machine actually goes boom. So that is a good thing to test for. I have mostly just recorded like what we're losing, or you do the thing, if you run it up, and then you run it back, and then you sort of try to get it close. I think my tests are not down to 0.1.
45:21
I would say it's within 5%, if I were gonna guess. But yeah, I think that's an important part of the set of variables to control for, is packet loss. Yeah.
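The usual way to chase that zero-loss (or near-zero-loss) figure, in the spirit of RFC 2544, is a binary search on the offered rate. A rough sketch, where measure(rate) stands in for one pkt-gen trial returning the fraction of packets that made it through the DUT:

```python
# Binary-search the highest offered rate (pps) whose loss stays under
# max_loss; measure(rate) must return the delivered fraction (0.0-1.0).
def search_rate(measure, lo=0.0, hi=14.88e6, steps=20, max_loss=0.001):
    for _ in range(steps):
        mid = (lo + hi) / 2
        if 1.0 - measure(mid) <= max_loss:
            lo = mid   # loss acceptable at this rate: push higher
        else:
            hi = mid   # too much loss: back off
    return lo
```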
45:42
So pkt-gen is in /usr/src/tools/tools/netmap. Not that I use it a lot. And if you do use it a lot, so actually if you look in the netperf repo up here, you'll see there's a bunch of scripts that already use the pkt-gen stuff. It does not get built by default when you do buildworld on FreeBSD. It's a FreeBSD thing. For those who've used DPDK,
46:01
there's also a pkt-gen for DPDK, written actually by an old colleague of mine from Wind River, which also does a similar job using DPDK, if that happens to be something you're using in house. Using FreeBSD, I use a lot of netmap. I've looked at the DPDK stuff, and I will probably use it for other things.
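For reference, a transmit-side pkt-gen invocation looks roughly like this; the interface name and addresses are made up, and the exact flags vary between netmap versions:

```python
import subprocess

# Blast minimum-sized frames from the source box; the sink side runs
# the same tool with -f rx. There are also flags for the destination
# MAC and for capping the rate; see the pkt-gen usage output.
subprocess.run([
    "pkt-gen",
    "-i", "cxl0",           # netmap-capable interface (Chelsio here)
    "-f", "tx",             # transmit function
    "-l", "60",             # minimum-sized packets
    "-d", "10.0.0.2:2000",  # destination address and port
    "-w", "4",              # give the link a few seconds to settle
], check=True)
```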
46:21
Also, I've done a few extensions, and so has Adrian Chadd, to the pkt-gen stuff in terms of randomizing your ports and being able to do a few extra cool things that'll make the packets not be completely regular. Yes? Traceability, you're using, I assume,
46:40
FreeBSD on Lynx 1 and FreeBSD on Lynx 3. How do you rule out the possibility that it's your source and your sink? So one of the baselines we took was, and if I go back, I have to go all the way back.
47:00
Oh my God, I'm gonna have like thumb problems. Here we go. So one of the first baselines I took, which actually is not in this set of slides, is these two, right? Yeah, oh, sorry, I should repeat the question. So traceability, how do I control for the fact that it might be this one or this one that's broken? By the way, one of the things we may have found is that the head of the tree
47:22
is a bit slower than the release and not because of debugging. So that's something else I need to trace down. And we found that by looking at what we saw here. So we take this out of the equation, and we make this as the first baseline, because we know that it's just going host to host. In terms of completely taking it,
47:40
the right way to do that is something that Olivier Cochard-Labbé, somewhere, raise your hand. There you are. If you look at some of the stuff that Olivier is doing on the BSD Router Project, where he's tracking over time what happens with different releases, the graphs, those are really great. And that's kind of like the next step for this kind of stuff. So then we can say, oh, head of this date, when ultimately what I would like to be able to do
48:01
is push a button and be like, okay, it's four weeks to release, and let's see what happens when we do the standardized tests for networking. And then we will be able to go back and be like, faster, faster, faster, whoops, faster, faster, fast, whoops, right? And so it's the whoops that we're looking for. Is that what you're asking? Not quite what I meant. I meant that.
48:23
And it's 9876543 million packets per second. I'm going to trust whatever precision it tells me, because I know that Ixia has gone and done all the testing, the validation, and there's a traceability in the testing sense of the word
48:40
of that number. They can prove that that number is correct. That's true. I'm looking at FreeBSD. Well, I've got the Chelsio hardware in this case. I've got the driver from, I don't know if that's Chelsio or it's a good one. It is from Chelsio. Both, yes, both. I've got a FreeBSD packet framework. Has anyone ever proven that the numbers that those frameworks report are, in fact, correct?
49:02
Yes, right. Also, the counters of packets in the hardware come from the hardware. So you'd be trusting Chelsio in the endpoint case, but also the switch is another good point. Now, the reason I'm using open source is because,
49:23
I wouldn't say I'm poor, but the Ixia is super expensive, right? So you are paying a quarter million US, base, to get that kind of level of traceability. And if I were building a NIC card, I would probably have an Ixia. But if I'm working on an open source operating system
49:40
and I want to prove that it's going faster, then I'm, you know, unless if you have a spare Ixia, there's a lab. Where's West? There's a lab that way and they would love it, right? And then, and you need to pay the maintenance every year on it too, because that's not cheap. Yes.
50:06
I really don't want to do that. Other questions? Yes. Ah, do you have, do you have interns at Verisign?
50:27
If you give me an intern, I will get you repeatable tests. I mean, that's what I'm working towards, but at the moment I've been more focused on getting the tests up and then figuring out what's wrong in the kernel, because theoretically I do kernel work. But I mean, that's ultimately the goal, like is to, you know, we've got QA and stuff to do
50:44
testing of things that are easy to test that are testable on a single host. The reason to build conductor and to probably look at, you know, DPDK also turns out to have a test framework that I'm going to look at. The folks in Swinburne have a test framework that they've just released.
51:00
You know, looking at something that makes that automatable and reportable and that people will look at is definitely a goal. I don't know that that'll have happened by vBSDcon, and I will not give a date on when it would happen, but I mean, because you're still running your tests by hand, right, Olivier?
51:20
Yeah, so we've automated some of it, but we haven't automated the, you know, yes. I mean, that would be great. And the other, so one of the other problems with that kind of thing, so right now we've got a test lab full of equipment for developers, right? So like, you know, well Navdeep's got tons of equipment, but let's say I need to look at something out at Chelsio. Turns out they don't give me SSH
51:41
into Chelsio's internal network. Strangely, I don't know why not. But, you know, so that lab exists mostly for kernel developers and software developers to be like, I need access to some crazy thing I can't afford. Let me work there. As opposed to a rack of machines that, you know, like a Jenkins kind of thing where it's like, we just run the test and then we get the results and we run the test to get the results.
52:02
If we had the money, and the thing that's, it's not just the money for the machines. It turns out, and I said this thing about remote hands earlier, having the hands to do the work is the hardest part for this stuff. Well, you know, you're Cisco, right? Or you're Verisign, I don't know. You're a large company that has people you can assign to this, it's a bit easier.
52:20
I mean, I've worked at companies where we had a lab and we had the lab manager. And the lab manager was responsible for building that software. And they, you know, had a couple of people who worked with them, and that's a lot smoother. When it's, you know, people who are, even if it's their job, it's not that main part of their job, it's hard to get that to the point of critical mass.
52:40
That's a long answer to that question, yes. Going back to the all-FreeBSD setup, when you're testing something like TCP, you get interactions, peculiarities, and things like that. I have not, so one of the reasons, and I'm gonna put this all the way at the end, one of the reasons I did a lot of the open-sourcey bits,
53:01
other than the fact that I'm a communist, or a socialist, something. Anyway, these two bits, is I want other people to steal my code, right? I want people to look at what I've done in the netperf stuff in particular, and either say, it's wrong, right? It's like, I tried this, this doesn't work, because I really like to know if things are broken. But also just take it as examples
53:21
and be like, sure, you know, I'm gonna set up a Linux versus a FreeBSD, a CentOS versus a FreeBSD, a CentOS versus an OpenBSD, a CentOS versus a NetBSD, and see how they interact. And then the next step after that is kind of the, you know, the dream thing where someone drops a trillion dollars on my head and I can buy every piece of network equipment.
53:42
Because my goal, someday, is to own every piece of network equipment. Like owning all the music, you can never listen to it. So the idea behind this is that I'm one developer. There's more than one of you out there. I'm really hoping a lot of people in this room will pick this stuff up, tell me what's wrong, send me bugs, start running them themselves,
54:01
send me results. Anyone who starts working with this stuff, I will give you access to, you know, write access, commit access to the GitHub. You can send me pull requests, you can start putting results in if you're willing to and your company allows it. I'm a consultant, so my boss, though a jerk, will let me do it because it's me.
54:20
So that kind of stuff can get in there. But yeah, there's a ton of stuff we could be doing, right, like the list is, if not infinite, very long. Other questions? All right, I think my coffee has run out and so has my time. Thank you very much.