Measuring Performance on OpenBSD

Video in TIB AV-Portal: Measuring Performance on OpenBSD

Formal Metadata

Title
Measuring Performance on OpenBSD
Alternative Title
Visualization of Regression and Performance: Know when something went wrong
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
When you try a new software version, something may be wrong or slow. You want to figure out when the regression was introduced. After updating OpenBSD you might see that something does not work as it used to. To simplify debugging it is helpful to determine the point in time when the change was introduced and search for the relevant commit. For functional regressions a test suite that is executed on a daily basis is sufficient. By providing recent test results, the relevant day can be seen and the responsible developer of that area is informed. Making statements about performance is more difficult. The requirements for measurements may change, new test programs are needed, and test hardware is only available for a limited time. So it is not sufficient to store historic data on a daily basis. You want to change the granularity or look back into the past. For that purpose I have created a system that can create a performance measurement of an OpenBSD kernel in the past. The kernel is compiled from a certain CVS checkout. That may sound easier than it is, as the OpenBSD kernel is not self-contained. It belongs to a base system, and there may be incompatibilities with userland. The performance also depends on the compiler version, which changes over an OpenBSD development cycle. My solution installs an OpenBSD release and updates the kernel from source in fixed time steps. If there is a relevant userland change within the step, the necessary parts of the system are updated. This allows quick progress while compiling the kernel without rebuilding the whole system. With each new kernel the performance is measured.
My name is Alexander Bluhm. I have been an OpenBSD developer for more than 10 years now. I work at a company in Munich; we build firewall and VPN appliances based on OpenBSD, so it is also my job to make OpenBSD reliable, fast, and suitable for our products. So what I
want to do is detect changes in performance in OpenBSD when they occur. When something is committed and it affects performance, I want to see: OK, there was a difference. I also want to see how performance develops over the releases, whether we have regressions, or whether something happened that is good. And I want to measure real-world performance. It is basically network performance that I am interested in. I started with network stack performance, not forwarding performance; that is a different story. So I want to see how our TCP stack behaves.

First of all I will talk about where we are coming from and what we had before. I will also talk about my regression tests, then how this performance testing over time works, what results I found, and the conclusion I want to give you on your way. So I start with
the state before I started performance testing. In the company I work for we have a high-performance firewall test bed, and we use it for multiple purposes. We want to generate numbers that we can tell our customers; those are basically high, because marketing always wants high numbers. The developers want to know how the system works. When we buy new hardware, we want to see whether it goes faster or not. We want to know whether our releases have regressions. We want to compare performance between commits, and so on. All of this together gives bad results for what I want, and what I want is results for developers.

In what you see here, you cannot really see what we are measuring. We have some dots scattered around, you don't know why, and some lines in there. By the way, we are always testing the 6.0 release here, always the same software, yet we get very varying results and nobody knows why.

The hardware setup is also rather complex, because multiple people are doing tests at the same time, we have the different requirements I just told you about, and we want to test different hardware. So we have several machines with the software installed, and it works basically with some Linux servers. Those were old machines, bought ten years ago, so they had only one-gigabit interfaces. We put them on a big switch, sent 10 gigabit through those targets, and measured how much forwarding or VPN performance we got. If you do that, the result depends on how many other people are working simultaneously and whether the switch drops some packets or not, and you have 20% variance in throughput depending on the phase of the moon or whatever.

So the problems are: we have too many requirements, as I told you in the beginning. We have too much complexity, which is bad if you want reliable
numbers and want to dig down to what is really the problem. And there is not enough flexibility, because it is all set up in a static way and other people are using it in parallel. If you start moving the cables around and reconfiguring the switches, other setups break, or you don't have enough permissions to configure the switch, or people complain when you remove some cables.

So the setup I want is this: I have one OpenBSD machine here, another OpenBSD machine here, and only a cable in between. No switch, no nothing. There are far fewer things that influence my measurements, and everything I measure is somehow based on the OpenBSD that I want to get information about.

The other thing I had before is my regression tests. I started that, I think, four years ago, and I run the OpenBSD regression tests on a daily basis. Those regression tests existed before. They were just checked in to the source tree, and every developer could type make in some subdirectory and it would pass or not. But there was no regular execution of them, and a lot of them were unmaintained or just failed. So I took the tests that already existed and ran them daily. I run them on multiple architectures to have a comparison of how it works on 32-bit or 64-bit, and I am planning to extend it to sparc, so that we have little endian and big endian.

The nice thing when you do it daily, record it, and publish it on a web page is that you get a history. You can say: OK, three weeks ago this test started failing, so what was going on there? I can link that to the commit, to the source change that caused the problem. That is something I also want to do for performance. I also collect useful information: I create log files, you can download object files from the test run, and you can debug it more or less online. If it is really complicated, you can also log in
on the machine and do your analysis there. Everything is accessible: you just go to this website and can see the results and the other things. So here is what it looks
like. We did not fix everything yet: what we have here is a display of everything, and it is quite red because we have new architectures here, armv7 and arm64, if they don't crash during the test run. I have not spent much time on those ARM platforms so far, so quite a lot of them are red, and we are working on that. So what I show here,
or what I told you before, is that I want to have links and information on what is going on, so I put a lot of little links in here. For example the run history: you can click on that and you see when each test was executed and how the machine was set up. What I do there is get a snapshot of OpenBSD. Every day or so Theo de Raadt builds a snapshot, not a release, with the current software version, and I just install it on the machine. There is now an autoinstaller for OpenBSD, so I can just set it up with the config I am giving it. Here you can see how it was installed, whether something broke, whether the autoinstaller is broken, or whether the snapshot has some problem so that we cannot install.

Then I have a link here to the CVS web for the test. Here are the tests, and you can just go in there and see: did somebody commit something in the test, and is that the reason why it broke? There is a color scheme: pass is always green, red is fail, and then there is skipped. Skipped means that something is missing in the hardware setup, so the test cannot run there. We have architecture-specific tests, for example the GCC tests, and on arm64 we don't have GCC, so we skip them. There are some other reasons for skipping. We also have some tests that do something with the keyboard driver, and if there is no mouse attached they are skipped. If you click on such a fail or pass, you get the link to the log, to the text output, so you can immediately see or get an idea why the test fails.

Here I have a link to the logs of the setup of this specific run. This is the date when I ran it, and this one is the most recent one; I made the slides last month. Every day you get new entries for the four architectures, and when you click on one you get the logs of how it was installed. You can also download object files that were generated during the test run, and perhaps
you get some more information from there. I also have a cron job that deletes them after a while, so the older test results are deleted and the drive doesn't fill up. Here you can also click and get the dmesg from the machine and see what version was actually tested and when the test started.

Now I clicked to show me only the i386 tests. I picked i386 because that is the first architecture I started with. So we have the history here, the days on which we ran the tests. We see: OK, this test has had some problem forever, and here, between the 13th and the 15th, something went wrong, so here one has to start investigating. Those GCC tests have failed since we switched to the LLVM linker, because it is missing some features. The tests basically check that warnings for some unsafe functions are emitted by the linker, and that does not work currently, but a fix has been promised. And here are other tests that sometimes pass and sometimes fail. That basically means it is not a stable test, and there should be some more investigation to make it more stable. This is a fragment test: it sends some IP fragments with Scapy to another machine, maybe some of them get lost, and when they get lost the test fails. It is a little bit fragile.

The other thing I do is sort them. These tests I fixed four days ago. This test here, for example, does not terminate; it just takes too long. After an hour every test is aborted, so there is a NOTERM category. We have some failures; they are sorted up here. We have some passes here that fail on some other architectures; that is why they are sorted here. And we have a lot of green passing tests down here, which I cut from the slides because they did not fit. The interesting thing
for developers is up here: concentrate on that and fix that. So I
run all the tests there, and most of them just run on a single machine. We have kernel and userland tests in /usr/src; you just type make, then something happens, and it either passes or fails. But there are other tests that are more complicated, for example the ARP test, which tests the ARP protocol. It has a remote machine. I send some Scapy packets, some handcrafted ARP packets, look at the ARP replies, and check whether they fit. So you need two machines for that test. This one controls everything: the test script is run by make, it may log in via ssh over here to do some setup, and it may send some packets here that do the actual test.

Other tests have more machines involved. The most complex ones test PF forwarding, redirection, and fragmentation. The PF that is tested basically runs here. I send a packet here, it goes through PF and gets modified or reassembled, whatever test I am doing, and then it goes to a router. I need the router here because I want to do path MTU tests. This router has a lower MTU at the interface, only 1300, so the packet has to be fragmented somewhere: here if it is IPv6, over here if it is IPv4. Then it reaches this machine, which just runs normal services, so a packet is simply returned. It also runs the echo service on port seven: if it is a UDP packet, it is just returned, and there is also a TCP echo service, where you open the connection and everything you write in comes back out. So I can just write a TCP stream through this. On the way back PF has to do some magic things again, and then the data is back. If the data does not come back, the test fails.

So how is it set up? It is all a little bit automated. I have this master here, which is also the web server that publishes the results, and this master
controls the serial console server and also handles all the installation. Here I can say: OK, install or upgrade all the machines, and it will install the latest snapshot on these machines with the autoinstaller. The OpenBSD installer allows you to put a site tgz set somewhere; that is where all the configuration files are. They are stored here, the machine gets all its configuration files from there, and then it gets its IP addresses, which are configured statically. So that is the setup to run those tests.

OK, now we switch from regression testing to the performance tests. So what do we want? First of all, we want to see the history: when did things get slower or faster? We want reproducible results: if I run the test once and get a number, and then run it again, I want to get the same number. That is pretty hard. I want to have the details, so I can see in which situation the machine was and whether everything was set up correctly, and also see which commits were in that range, what the date of the kernel was, and which userland was running there. I want to be able to drill down: if I see that one month ago something broke or got slower, then I want to drill down and say: which commit was it, which hour was it, what really caused that? And I want it to be somehow automatic. I don't want to run everything manually. The performance tests are run by a cron job in the background, which just needs a little bit of babysitting if something breaks; hopefully nothing breaks. Basically I do the test runs manually for now, but I have set up a cron job that always tests the last week of commits, every week.

So how does it work? First I install an OpenBSD release, so I have a defined point where I can start from, with kernel and userland installed. It is publicly released and everyone knows what it is. Then I check out the kernel at a certain
date, because I want to test the kernel as of that date. Then I compile it. I compile it on both machines; you remember, two machines with a cable in between, and I want the same thing on both machines. So I compile it, then I run the tests, and then I can choose how far I want to step forward. It may be a day, a month, a week, or an hour. Usually a day-based setting is a good resolution. So I step one day ahead, do the next checkout, compile the kernel, run the tests, and I do that until I reach the end date, which might be the next release. Then I collect the results to make some statistics; that is on the next slide. Basically I want to avoid rebuilding the userland, because it takes forever: compiling the kernel takes two minutes, compiling the userland takes two hours.
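The stepping scheme described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `checkout_dates` and `run_series` and the `measure` callback are hypothetical and not taken from the talk, and the real work per step (checking out the kernel sources at that date, compiling and booting the kernel on both machines, running the tests) is left to the callback.

```python
from datetime import datetime, timedelta


def checkout_dates(begin, end, step):
    """Return the kernel checkout dates from begin to end, inclusive."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    dates = []
    current = begin
    while current <= end:
        dates.append(current)
        current += step
    return dates


def run_series(begin, end, step, measure):
    """Call measure(date) for every checkout date and collect the results.

    measure is a hypothetical callback standing in for: check out the
    kernel at that date, compile it, boot it, and run the tests.
    """
    return {date: measure(date) for date in checkout_dates(begin, end, step)}


if __name__ == "__main__":
    # Illustrative dates only: one week, stepping one day at a time.
    for date in checkout_dates(datetime(2018, 3, 16), datetime(2018, 3, 23),
                               timedelta(days=1)):
        print(date.strftime("%Y-%m-%d"))
```

Changing the granularity is then only a matter of passing a different `step`, for example `timedelta(weeks=1)` or `timedelta(hours=1)`.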
So what does the result look like? Basically I copied the code that generates those pages from the regression tests, so it looks quite similar. What I have here is the run history, so you can see when it was installed, how the kernel was compiled, and everything. There are some details in this line that I will explain later. Then here we have the tests that are run. Basically it is iperf3 from the ports system that I installed there. I run some TCP tests, also with the -R option. Normally, when you have an iperf server and a client, the client connects to the server and all data is sent from client to server, but with -R it goes in the other direction. We run tcpbench: here it is with a hundred simultaneous connections, here with only one connection. We run iperf with UDP, also in reverse. I have two test setups, each with one kind of machine, and the setup is encoded in here somewhere: the one machine is here and the other machine is there.

Here I compile the kernel and measure how long it takes. The main idea behind that: before we did any performance testing, developers compiled the kernel and the userland on their machines, and they always said: oh, it is getting slower. To measure that it is getting slower, I also compile: first make clean, and then it runs make -j4 on the four-CPU machine and -j8 on the eight-CPU machine. I also do some disk and file system performance tests, but I am not satisfied with those yet; I still have to fine-tune them.

So now when you click... oh no, I forgot to explain this. Here we have the date when I ran the test. Here is the release that was installed; in this case the 6.4 release is installed. Then I have the first CVS checkout, that is here: it started on the 16th of March and stopped on the 23rd of March, it steps in between with a one-day step, and I repeat each test five times. I will give
more detail about that later. If you click on one of those test runs, you get to the next slide, which shows what happened in that test run. Here you still have those logs, where you can look at more details, and here is the checkout date. This test was running in February, and I checked out October 2017: here the 4th of October, then the 11th of October, then the 18th of October. It steps weekly from release to release. So now I have a checkout, and between this date and this date there were 49 commits. If you click on the CVS log, you also see all the commits. I go through the CVS history, collect everything, and show what was committed in between.

Then we have those build quirks. I was already asked what happens with the userland: I have to recompile the userland from time to time. I think that on another BSD the kernel can be built from a different system. That is not the case with OpenBSD. The OpenBSD kernel requires that it is compiled and run on a userland that matches more or less its state. So I have to update the userland so that it is compatible with the kernel. Basically what I have to do is recompile the compiler, the clang compiler, when it is updated. That is also good performance-wise, because I am not only interested in the kernel performance; I also want to know what influence the compiler and its options have on the performance. So I also measure at the important steps in userland. Every time there is a quirk, I know: OK, here I had to do something with the userland. I will explain that later in more detail.

Now here are the numbers of these tests, iperf TCP in forward direction. I repeat those tests multiple times to get more stable numbers. My observation was that the test goes up and down, gets faster and slower, and so I tried to do it
five times, and here I present a summary number. For network throughput that is actually the maximum, not the average, so this is the maximum. When the repetitions give results that are not stable, that jump a lot, I print them in red. This one has a lot of jumping in it; you see it is very red, so here we should look in more detail. On the next slide you go to the details of all the repetitions. The repetitions are listed one by one, and here we have some strange values. All of them are red. You can't read the numbers, but here it is 4 gigabit throughput and here it is only 1.6.

So what happens? These are eight-core machines with two sockets, and these two-socket machines are very unreliable performance-wise. Once in a while the throughput goes down to 1.6 gigabit, and I don't know why. When the machine is rebooted, it sometimes gets into this state, and when it is rebooted again, it is out of it. I don't know why. I also print it in red when the deviation is too big, so I easily see where a problem is, and that is propagated to the other pages.
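The way repeated measurements are condensed into one number and flagged can be sketched as follows. This is a minimal illustration, not the code behind the pages shown in the talk: the function name `summarize`, the spread measure, and the 10% threshold are assumptions chosen for the example, and the throughput values are invented.

```python
def summarize(values, max_spread=0.1):
    """Summarize repeated throughput measurements of one test.

    The reported value is the maximum of the repetitions. The result is
    marked unstable when the gap between the slowest and the fastest
    repetition, as a fraction of the maximum, exceeds max_spread.
    The 10% default is an assumed threshold for illustration.
    """
    if not values:
        raise ValueError("need at least one measurement")
    best = max(values)
    worst = min(values)
    spread = (best - worst) / best if best > 0 else 0.0
    return {"value": best, "spread": spread, "unstable": spread > max_spread}


if __name__ == "__main__":
    # Invented numbers in Gbit/s, five repetitions each.
    runs = {
        "stable": [3.96, 4.00, 3.98, 3.99, 3.97],
        "unstable": [4.00, 1.60, 3.98, 4.00, 3.99],
    }
    for name, values in runs.items():
        result = summarize(values)
        mark = "red" if result["unstable"] else "ok"
        print(f"{name}: max {result['value']:.2f} Gbit/s, {mark}")
```

Reporting the maximum hides an occasional slow repetition in the headline number, which is why the separate instability flag is needed to draw attention to it.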
It can also generate some graphs out of that. Here we install 6.2 and run until 6.3, and we make one measurement every week. As I told you, I have several tests, which are shown as different symbols here. There is an explanation on the website, but it does not matter which is which. What you see here is the occasional 1.6 gigabit data point; nobody knows why. Most of them are in this band here. You also see that the numbers are jumping up and down, so it is not one line. Here we have one firm line, but up here on top it is more or less scattered. The different tests also give different results. And here we had one test that sticks out; I will explain later what it is, just remember that we saw it here.

What I also show are those vertical lines; those are the quirks. When I have to touch the userland to make the kernel compile again, or to change the compiler, or whatever, then I have such a quirk. I have a static file with the build quirks that says: OK, at this date we have to update the compiler. If the previous test is before that date and the next test is after that date, then in the meantime the compiler is recompiled.

Here is a small example of the quirks I had on the previous slide. To make it easily detectable, I also have a line for the release date. I don't do anything there; it just shows where the release was. The first thing I had to do: I wanted to check out the compiler, and I saw that there is a bug in CVS. You cannot check out the vendor branch if there was a commit on the non-vendor branch later. So I fixed CVS, added the patch to the quirks, and recompiled CVS. Then I can check out LLVM, which was release 5.0 at that point, and recompile it. What we also have is a header file called pfvar.h, and we have the PF structures in there. When we change a PF structure in the kernel, pfctl has to change too. pfctl and the kernel have to be compatible, otherwise loading the PF rules fails, and
usually you see that as a performance difference, because I'm running with pf and the default rules. If the pf rules cannot be loaded during boot, you get a different rule set and then you see a different performance. So I figure that out, I look at the log, oh, there's the problem, and then I add a quirk that says: okay, here the pfvar.h header file changed, so we have to recompile pfctl. That's rather cheap, it's just a few seconds to recompile. The next problem I had: somebody committed a driver here and made a mistake, so the tree was broken. It was only broken for half a day or so, but if my measurement hits just that moment, you can't compile the kernel. So you have to look at the fix that was committed later and patch the sources that were there before, and then it compiles again. Here was another pfctl change. sysctl has the same problem: if the sysctl structures change, it doesn't work anymore, and that may have some influence on how the system was booted. So if that changes, it is just recompiled. Here we updated LLVM again, here we had some pf change, and here we have the release. So that's how that works. It's a static file, a module where I describe all the things that change around the kernel. Testing a release on a daily basis, from one release to the next, takes me more than a week when only compiling the kernel, and it would be ten times slower if I compiled everything.

Now, here is the state diagram of how the userland building works. We come from here: check out the kernel, compile it, run the tests, advance. Now we have to look: is there a quirk? If not, we can just check out the next day. If yes, we check out the userland at the date that's given in the build quirk, build the toolchain, and then continue as just explained. What I want to do next, and what I've already started: I don't have only one setup but two setups. The difference between the two is: here I have two sockets, eight cores,
and here the same hardware with one socket, four cores. I ripped out one CPU here physically, and then the performance numbers are much more stable. Running on a two-socket machine is bad, especially if the operating system is not aware of the CPU architecture and NUMA. What I also want to do, I've installed the machine but nothing more: have a Linux here to make some performance tests and compare to Linux. Then we have a stable Linux, it's always the same, and we can look at what the difference is when we only change this side. But it's not running yet, just plans.

So how does the installation work? It's the same setup as for the regression tests, just different machines here. There is the master, which tells the console server: okay, install the release and start it here, and it installs it here. From the master come the configuration and the recompile command. It says: okay, log in here, make a checkout, and make a checkout here as well, build it, it builds in parallel, and then it goes here and says: run the test. Running the test is done locally, so you can also just log into this machine and say:
okay, now, and then you can look at the machine, how it's running the test, what the kernel is doing, what top is showing, and take some statistics there. So you can really dig into it. And that's the script that's running here and doing one test run. It does the iperf test, the tcpbench, the make timing, and so on, and it also logs what it does here, for example: okay, here we start the iperf daemon as a receiver.

Okay, now I have some results. You remember, on the slides before we had one measurement with a single performance peak, and that was weekly. Then I did the same test but daily. I just took those two weeks, so it's 14 days. We still have those strange numbers of this machine, we ignore that for now. And now we see the peak is here, and I could look at all the CVS commits right here. I found out that dlg, an OpenBSD committer, committed his TX mitigation here, where he collects multiple packets and sends them to the network at once instead of doing the hardware output for every packet. It was committed here, and it was backed out here because it broke suspend and resume on some laptops. So what I can see is: okay, that's a very valuable or interesting approach to increase performance, we just have to fix this problem. I wrote dlg a mail and he fixed it later; that's on a later slide.

What we have now in OpenBSD is that we relink the kernel every time we reboot. I was trying to figure out why we have such changing numbers, why it sometimes goes up and down. So I disabled the relinking of the kernel, because relinking the kernel is done in the background after the machine has booted up, and starting background jobs while you do performance testing is not a good idea. So I disabled it, and then I asked myself: what happens if I only enable it explicitly?
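The stepping described so far (a weekly run, then a daily rerun over the same two weeks, with build quirks applied whenever a quirk date falls between two checkouts) can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dates, and the quirk actions are made up for the example and are not taken from the actual test framework.

```python
from datetime import datetime, timedelta


def checkout_dates(begin, end, step):
    """Return the checkout times from begin to end (inclusive) in fixed steps."""
    dates = []
    current = begin
    while current <= end:
        dates.append(current)
        current += step
    return dates


def quirks_between(quirks, previous, current):
    """Return the quirk actions dated after the previous checkout
    and not later than the current one, oldest first."""
    return [action for date, action in sorted(quirks.items())
            if previous < date <= current]


# Hypothetical quirk table: date -> action needed before the next kernel build.
QUIRKS = {
    datetime(2018, 1, 10): "rebuild compiler",
    datetime(2018, 1, 12): "rebuild pfctl",
}

if __name__ == "__main__":
    begin, end = datetime(2018, 1, 8), datetime(2018, 1, 22)
    weekly = checkout_dates(begin, end, timedelta(weeks=1))
    daily = checkout_dates(begin, end, timedelta(days=1))
    print(len(weekly), "weekly steps,", len(daily), "daily steps")
    # Walk the daily checkouts and report which quirks apply before each build.
    for previous, current in zip(daily, daily[1:]):
        for action in quirks_between(QUIRKS, previous, current):
            print(current.date(), action)
```

Rerunning the same range with a smaller step only changes the step argument, which is what makes it cheap to zoom in on a single peak.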
I have three different things now that I can do when I repeat the tests. I come here, compile the kernel, then I run the tests, and then I can repeat them: run them again, and again, and again. Or I can say: okay, what happens if I reboot the machine in between? Run the test, reboot the machine, run it again, reboot the machine, run it again. Or I can say: okay, what happens if I relink the kernel? Relinking the kernel in OpenBSD means that we have a random sort order of the object files. The reason we do this is that we want to prevent blind ROP attacks. When you don't know how the object files are sorted within the binary, within memory, it's very hard to guess where the gadgets are for your ROP, and that means it's very hard to exploit a problem. That's why we relink the kernel in a random order at every boot. So every OpenBSD machine has its own variation of the kernel layout. It's thousands of object files, I think, and the number of possible orders is very large, so it's very hard to hit the right ordering. So I tried that.

Now what happens? I compare the three cases here. I install 6.4, run it for five days, do ten tests each, and keep the machine running between the tests. So here that's ten tests executed, here ten, here ten, and here, and in between I recompile the kernel from the CVS checkout. But it's just after the release, here's the OpenBSD release, and we were in the tree lock, so nothing performance-relevant has been committed there. You see it's going up and down a little bit, but it looks quite stable. Then what happens if I reboot the machine between tests? You see it's spreading a little bit more, but it looks quite similar. And this happens if I relink the kernel and reorder the object files: you see there's a large spread here. I guess that's the locality of the caches
where we cache the executed instructions. If you reorder the instructions, the cache locality is bad, or at least random, depending on the layout of your kernel. So what you have to care about when you do performance testing: you have one kernel, you test it and get a number; then you apply a diff, recompile, and test it again. You may see this variation and not the patch you're testing. You have to repeat the testing multiple times to see the band of the one kernel and compare it with the other band. Just by recompiling the kernel you get variation. [Audience question about rebooting the machine with kernel relinking on reboot.] Yes, I disable the relinking every time, otherwise the relinking would run in the background while I run my tests, and that would be bad. So I disabled relinking, and I relink explicitly if I turn it on, so it's just done in the foreground before the tests.

And now I go back to my two-socket machine, that's the eight-core machine, because I have another effect here. There, on 6.4, I run the tests for 50 days, five tests each, and we have one line here and sometimes another line here. So what's that? It's another unexplainable phenomenon. So when I
click on the tables, I see here in the second cycle that I have those red numbers, and I have the red numbers only here, and here, and here, and here. And it's always like that. It doesn't matter which one I take, I just have to select one
of those with those numbers here, and if I click on it I see this
result. Yes, right, and there it's red. So in the second run it's slower. There is a reboot here, and "keep" means that I don't reboot the machine, and here it's slower. And it's only slower when sending in the forward direction. This one is -R, which means iperf is sending the packets in the other direction; that's not affected. But if it sends in the forward direction it gets slower. What I figured out: I run the tests in sequence, so I do the network test and then I run the kernel compile with make -j8. This -j8 means that eight compilers are running on all cores, and that affects the statistics that are relevant for scheduling. After I compile the kernel, iperf is scheduled on another core on another socket, and the performance drops, which didn't happen before. Okay, that's halfway explainable. What's even stranger: if I compile the kernel again, this effect goes away. Then it is always scheduled on the first socket, and I have the maximum performance again. I talked to [unclear] about it, who couldn't believe that it is that way and thought it was very strange, and [unclear] the scheduler, but that didn't change anything. So what
did I do next? I did a run from 6.3, that's here, to 6.4, that's here. That's about 200 days, with a one-day resolution here. You see the colors are different; the reason is that it's the same hardware but with only one CPU with four cores. And then we don't have those numbers here, we don't have the second line here. So what happens here? We install it here, and here it goes down, I don't know why. And then there we added retpoline. Retpoline is a Spectre mitigation we got recommended [unclear], and here the compiler was recompiled with the retpoline workaround [unclear]. I don't know what this one is; that's probably a measurement error. Then I went on, and here we enabled witness. Witness is running to find out if there are any locking problems in the kernel. We got it from FreeBSD. It saves some stack traces in the background and does some checks on the locking, and if there's a problem it panics or gives you a report. It was just a debugging thing that we enabled during the development cycle to get some results, so that users who run snapshots see those stack traces and report them, and we can fix it. [Audience comment about a sysctl.] I have a bunch of sysctls, and I turn off anything that might affect performance; I try to be very fast. Except for one thing: in the BIOS I disabled the CPU turbo thingy. CPUs can change their speed when it's cool, and I say: okay, don't do that, don't go higher when it's cool, just keep the same level to make the numbers constant. And in the kernel I disable all the debugging aids, like pool debug, with sysctl, to get stable numbers. Normally we disable it for the release and enable it again afterwards, and you would see a jump here, but you don't see it. So what we get here is that we turned on witness. The next one here is that we enabled ret... no, not retpoline, ret protect. Ret protect is a feature from Mortimer. Retguard, sorry, retguard. It adds some additional checks when we leave each function, to see if the stack has been changed, and that gives a little bit of overhead, a little drop here. And what we did here is we turned off witness again. Witness was just a temporary experiment to see whether we have locking problems in the kernel. We turned it on, got some reports, fixed some stuff, and turned it off again. So what stays here is retpoline and retguard.

Here I added reference counting to the PCBs; think of them as something like the socket structures in the kernel. They are reference counted now, and I think that's this little performance drop, because it uses atomic operations and synchronization between the CPUs. So now we go on. That's the next release, that's from 6.4 to 6.5. Here we are in April this year. Performance-wise not much happened, except for [unclear]. What happens here, where the band gets smaller: there we disabled the stack protector if retguard is enabled. Retguard is the more paranoid check of the stack, and then you don't need the stack protector anymore, because everything the stack protector would find, retguard finds as well. This little performance gain may be related to that. The main problem is that we don't see precise numbers, we always see the band. If there's a huge change we can see it, but if it's only a little bit, it's hard to see and you have to guess what's going on. Here we added a feature: when we get stack traces, when we get a report from a user in ddb, some architectures print the arguments that were passed to functions when you call them. For example i386 does that, because it passes the arguments on the stack and you see them on the stack. amd64 doesn't do that, because everything is passed in registers and the stack trace cannot find the register values. So we added a feature that writes the register values to the stack, so that you can see the arguments
that were passed to the function in the stack trace. We enabled it here too, and that's a small drop.
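The point made above, that only the band of repeated measurements can be compared and not single numbers, can be illustrated with a small sketch. The 10 percent spread threshold and most of the throughput values are invented for the example; only the 4 gigabit versus 1.6 gigabit outlier is taken from the talk.

```python
def band(values):
    """Return the (lowest, highest) result of repeated runs of one kernel."""
    return min(values), max(values)


def is_unstable(values, max_spread=0.1):
    """True if the runs spread by more than max_spread relative to the best run.
    Such a row would be the one printed in red."""
    low, high = band(values)
    return (high - low) / high > max_spread


def bands_overlap(before, after):
    """True if the two bands overlap, i.e. the difference between the two
    kernels may be nothing more than run-to-run variation."""
    low_b, high_b = band(before)
    low_a, high_a = band(after)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    # Throughput in Gbit/s, hypothetical repetitions for three kernels.
    before = [4.30, 4.35, 4.28, 4.33, 4.31]
    after_small_change = [4.32, 4.29, 4.36, 4.30, 4.34]
    after_big_change = [4.80, 4.85, 4.79, 4.83, 4.81]
    print(bands_overlap(before, after_small_change))  # bands overlap: no clear effect
    print(bands_overlap(before, after_big_change))    # bands are separate: visible jump
    print(is_unstable([4.3, 4.3, 1.6, 4.3, 4.3]))     # the occasional 1.6 outlier
```

A min/max band is a deliberately crude choice; it mirrors how the graphs are read by eye and makes no statistical claim.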
Now we have 6.5, and as I told you, I created the slides here, so that's the last measurement. You remember the TX mitigation from dlg. I wrote him a mail saying: hey, that's a performance increase. And dlg said: okay, let's try to do it in a way that doesn't break Claudio's laptop, because that is where it broke and why it was backed out. He implemented it again, and we see the same jump we saw two releases before, when it was backed out, and this time it is still there.

Next I want to tell you about the CVS log. Between every two measurements I go to CVS and get and sort all the changes that are relevant between those checkouts. So I have the beginning checkout for the first measurement and the second checkout, and everything in the sources in between, that is the commits, and they are listed consecutively here. It shows which files were touched, and you can click on the diff and go to the CVS web, where you can look at the differences of the file. Here you see the commit message, and here is the TX mitigation, that's the one that increases the performance. So from the table you can just click through to this and see what was going on between the two columns.

So I want to summarize my findings. All this measuring sucks. It's a lot of work, there are strange things going on, and multi-CPU machines are hard performance-wise, especially when you want to get precise numbers. Reproducing numbers is very hard, and even if you do, it's quite unclear what the effects behind them are. Do not trust your numbers; test again and see that you can reproduce them. And keep the setup as simple and stupid as you can, otherwise you don't have a chance to find the cause of the changes, because there is too much complexity.

So what else can be done? Right now I'm only testing between two OpenBSD network stacks. I could also add another machine and measure forwarding. I can set up this Linux machine I've shown before and use it as a reference, with a
stable Linux release, and test all OpenBSD variants against it. Or I could use FreeBSD, if you like. Then the UDP tests: I run UDP tests, but those numbers really suck. It's quite hard to say what it means if you get a number, because do you measure what you're sending or what you're receiving? And if you send too much, the kernel is busy with other work instead of receiving things. And the iperf tool changed between the release versions; I have three different iperf versions there, and they all have different algorithms to measure UDP. So there's potential to improve. What I also want to do: when I get a diff, I want to see whether this diff makes the performance better or not, and I have no way to do that automatically right now. So what I would want is to put the diff in there, run the tests, and have a website that says: okay, with this diff you get this, and without it you get something else. I want to go farther back with releases, look at 5.9, 6.0, 6.1, and just compare them. Then the file system performance: the main problem with these machines is that we have a RAID controller, an LSI controller, where the driver and the firmware don't get along, and the performance was really bad. So either I have to change the RAID controller, or fix the driver, or take other hardware, and then I can do more reasonable file system performance tests. And this line I added last week, because we have this band of results due to the sorting of the kernel: I should add another mode where I sort the kernel object files in a constant way, to try to get more reliable numbers even if I apply patches or change the checkout date.

I didn't do everything myself, I had some help. There was Jan Klemkow; he does the hardware administration for all the machines and creates the possibility to set this up. Then Moritz Buhl is the student working for me, and he did all the gnuplot stuff to get those nice graphs, so I didn't have to dig
into that. And genua, my employer, is hosting the machines, providing the electricity, and gave me work time for it. So here are some links for you. That's the website with the resulting performance numbers. You can also get the raw data of the performance tests; it's in gnuplot format, just the numbers down there, and then the gnuplot scripts get the right numbers out of it to create the graphs. I've committed the scripts that create that on GitHub. These are the scripts that do the installation of the performance machines, and that's the source code of these slides. So everything is available. [Audience question, largely unintelligible, about what would be nice to test.] [Audience member:] So we have regression tests for things. We also have [unclear], because those are libraries. I have written some tests for [unclear], more or less, and they are component tests. We'll have
places where we've decided we know exactly what should come
out of a TLS handshake at these steps, so there are literally hex-encoded bytes of what the handshake should be, and we run it and compare the output. That would be like a component test. And then you've probably got something more along the lines of a unit or an integration test. So we brought bluhm over and fed him a lot of steak [unclear] on my couch, and he worked with the ports people and got it to where, in that same regression test, we can effectively have all the versions of OpenSSL installed along with LibreSSL and do compatibility tests connecting between all of those. So it's everything from component
tests for functions to effectively what you might almost call end-to-end. [Unclear remark about tests copied from FreeBSD that run on FFS and NFS.] [Speaker:] And that's the test that Bob just mentioned. Here I run the client and server code, this is a client, this is a server, and I compile it with LibreSSL, OpenSSL, and OpenSSL 1.1, and do a cross interoperability test. So I take the same binary, compiled with three kinds of libraries, and then the other side, compiled with three kinds of libraries, and do all combinations. [Audience question.] Yes. So we started to remove the big locks in OpenBSD. The basic problem in the OpenBSD kernel, and that's why I did all this, because we want to get beyond it, is that we have locking that only allows networking on one CPU at a time, and you have lock contention there. That's why it's slow. [Audience question about the hardware.] I have an old machine there, that's a machine from 2012. In the OpenBSD lab I collect all the hardware that nobody else needs anymore and run my tests there. It more or less doesn't matter: for the regression tests it doesn't matter at all, and for the performance tests, yes, I need a machine with a lot of CPUs to make the CPU problem visible, but it can be ten years old; it doesn't have to be very expensive. [Laughter] So the regression tests run every day, and for the performance tests I just go to the website
[unclear]. What I have here is a cron job that repeats the last week. What I do when I say, okay, I want to test a release: I have a make line, I control it with my Makefile. I can say: okay, do this here, make something, and then I can fine-tune it. So here is a little make line; I can run make -n and it prints the line, and it says: okay, it's a day step. I can change the day to three hours, and change the start and the end, and then I fire it. So the releases I do manually. What I have added is a cron job that does the last week: every week, every Friday, it runs the daily resolution of the last week and adds it to the total plot. So that's the plot of all measurements, see here, and then you can click on the single measurements, and this is always the last one. That's the eight-core machine, and you see it's slower than the four-core machine, because we have this big lock, and then it's scheduled on the wrong core, and then we have more lock contention, and then it's slower. By the way, I also have some more outputs: that's UDP, and that's total crap. Don't trust the UDP. So that's the eight-core machine. I don't know what that is; maybe with the TX mitigation TCP is faster and UDP is slower, but that's just a guess. With UDP I get one or two gigabit, I don't know why. And the older versions of iperf printed a completely different format, so they are missing here. The releases are missing because the output format changed, and they changed how they interpret the lost packets. Okay, we have a question there. [Audience question.] I wanted to have multiple tools, to see if one of them [unclear]. Yes. So what can I do here? [The speaker tries to scroll.] Why is it so
crappy? I have to go to the menu here; perhaps when I make it bigger. So now I have it. You can have tcpbench only: you can click on this and deselect iperf, and now we are only looking at tcpbench. Interesting. Okay, so you can choose what you want to consider. [Audience question about witness.] I always set the sysctl to turn it off. Yes, but just compiling it in makes it slower. So that's the thing you saw there, but I had no particularly good way to factor it out, and I also wanted to show what people who built the kernel at that time would see. [Audience member:] FreeBSD has an explicit disclaimer when we boot up with witness, and in the configuration file, because certain idiots [unclear] run benchmarks with witness enabled and then complain about the performance. Witness is fantastic for finding lock order reversals, but if you're interested in performance, turn it off. [Audience question about packet generators.] [Speaker:] We use those packet generating tools in the company for the forwarding and VPN tests of our product, and there we have a network stack that works in parallel [unclear] much higher performance [unclear]. As I'm doing TCP here, that's really hard with those packet generators. I'm measuring stack performance, I'm not measuring forwarding performance here, and I'm doing real TCP. They don't do a real TCP handshake and generate packets from there. So here I'm just testing OpenBSD against itself, and every problem I see is somehow hidden, but it's still a problem of the stack. [Audience question:] What do you measure? Do you measure the number of packets that [unclear] receive? [Speaker:] Yes, that's more suitable for forwarding, and I didn't do forwarding yet. And you also have to be careful. What we do in the company is: we increase the rate until one percent of the packets get lost, and we take the receiving rate at that point, and that's our UDP performance. But that's not what the iperf tool is doing. So perhaps we should just write our own tool for that, or even stay with iperf3
itself. [Audience member:] I've seen a lot weirder stuff on FreeBSD in the last year or two than I had seen in the past, just stuff not wanting to work, or getting blocked at magical bottlenecks [unclear]. iperf3 is a whole other monster, but I just wanted to throw that out there. [Speaker:] So what's possible is that I can repeat those tests easily. I have a script that runs all the tests, so it's no big deal to add another test and run the whole thing in the past again, because I can generate the history at any moment. So, any more questions?
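The UDP rule mentioned in the discussion (increase the sending rate until one percent of the packets get lost, then report the receiving rate) can be sketched like this. It is an illustration only: measure() stands in for a real sender and receiver pair, fake_link() is a made-up model of a receiver with a fixed capacity, and none of this reflects how iperf or any existing tool works.

```python
def find_udp_rate(measure, start_rate, max_rate, step, max_loss=0.01):
    """Raise the sending rate step by step until more than max_loss of the
    packets are lost.

    measure(rate) must run one test at the given sending rate and return
    the packet counts (sent, received). The result is the receiving rate
    of the last run that stayed within the loss limit, or None if even
    the first run lost too much.
    """
    best = None
    rate = start_rate
    while rate <= max_rate:
        sent, received = measure(rate)
        loss = (sent - received) / sent
        if loss > max_loss:
            break
        # Report what arrived, not what was sent.
        best = rate * received / sent
        rate += step
    return best


def fake_link(capacity):
    """Hypothetical stand-in for a real measurement: a receiver that
    handles at most `capacity` packets per second and drops the rest."""
    def measure(rate):
        sent = rate
        received = min(rate, capacity)
        return sent, received
    return measure


if __name__ == "__main__":
    # Rates in packets per second; the receiver saturates at 1000.
    print(find_udp_rate(fake_link(1000), start_rate=100, max_rate=2000, step=100))
```

Reporting the receiving rate rather than the sending rate answers the question raised in the talk of whether a UDP number describes what was sent or what actually arrived.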