
Benchmarking FreeBSD


Formal Metadata

Title
Benchmarking FreeBSD
Subtitle
Benchmarking - what not to do and how to avoid it if possible
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
System optimization and tuning is tricky business. Benchmarking such systems is even more complicated, as the number of things which can go wrong at least doubles. FreeBSD is best known for its killer features, but it also includes over 2500 sysctls, most of which can be tuned to do something interesting. This talk aims to give an overview of some of the more interesting things which can be tuned in FreeBSD, and advice on how to avoid the most common errors in benchmarking FreeBSD. Tuning a system heavily depends on the hardware present in the system, so all tuning advice will necessarily contain system-specific parts, but overall there is much to discuss when talking about optimizing specific systems: networking, storage, file systems, even the CPU scheduler. Networking is still very much dependent on the quality of the NIC and its driver, but large parts of the system, such as ZFS, are pure software and can benefit from tweaks and tuning which slightly alter the behaviour of algorithms. Things get very complicated when comparing different hardware configurations, and even more so when comparing different operating systems. Doing a good benchmark of two unrelated operating systems is tricky because it requires similar tune-ups to both systems. This talk will try to explain where the pitfalls are, and also present some field results.
Transcript: English (auto-generated)
Thank you everyone for coming. I'm Ivan Voras and I'm here to talk about benchmarking, which is always a hot topic: everybody is excited when new benchmarks come out and everybody's trying to either repeat them or prove them wrong on something. This presentation will be mostly about
how to do certain aspects of benchmarking. More advanced users will probably know them already. So this is what to do and what not to do when benchmarking an operating system. And the thing is, I've also tried to do some new benchmarks
of FreeBSD which will be kind of extensive and useful for the years to come. The last ones were done by Kris Kennaway in 2008. This is the time frame of the 7.x release. It's a bit old. And also there are some benchmarks appearing on the mailing lists
and on various blogs which measured certain aspects of FreeBSD's performance, which are not really professionally done, and which paint a not-so-good picture of FreeBSD.
And I try to either repeat them or prove them to have been done in a less than satisfactory way. So when I say this is mostly for developers, I mean developers who are interested in measuring the performance of certain aspects of their systems or the subsystems they're working on.
System administrators will also find some useful material here. And lastly, I'm going to talk for a few slides, or a few sentences, about how and when to avoid benchmarking, because sometimes you don't really need it.
Generally, when you're doing a benchmark that means you have some purpose in mind. Either you want to compare your system to some other system, you want to compare your hardware to some other hardware, you want to compare your application to some other application. It is mostly useless to perform a benchmark on its own.
So if you have a number that says that many megabytes per second or that many operations per second and you don't have anything to compare it to, it's just mostly useless. I mean, you can compare it to a previous version of a system that's also valid. You know, it doesn't have to be a completely new system or a completely new hardware. But generally, benchmarking means
you have to compare it to something. And the goal of a good benchmark is to make it repeatable, to make it useful for other people, to make the description of the benchmark as elaborate or as complete as possible, so that other people can repeat it
and see for themselves whether what you did is good, or whether their own measurements are comparable to yours. So I'm also going to talk extensively about how the benchmarks I did were set up and what I did to actually arrive at the results.
Repeatability is always a good thing. Even if you're doing a benchmark only for yourself, it means that next week or next month or next year, you will be able to create the same environment and repeat your benchmark to check if something has changed, for example. Is the new version of an application or an operating system different,
or maybe more or less performant? I hoped to get some high-end hardware for this, but unfortunately I didn't. So this is all done on two identically configured servers, IBM 1U servers, each with only a four-core CPU
without hyper-threading. So four cores is everything you get in this presentation. There are two gigabytes of RAM and four SATA drives set up in RAID 0, because I wanted to stress performance and not really redundancy and reliability.
And networking was done on a gigabit embedded NIC, which has two ports, one of which is connected to the internet and one of which is used to connect the back sides of the servers with a really simple wire, so without a switch in between.
Some notes about software versions. I did both FreeBSD 9.1, which is the release version, and FreeBSD 10-CURRENT, which is the development, the future version 10. This version 10 snapshot was from approximately two weeks ago,
so it is really recent. Of course, debugging was turned off: WITNESS, INVARIANTS and such things. For comparison with Linux, I used CentOS 6.3 because it is really probably the most used
so-called enterprise version of Linux. I had PostgreSQL 9.2, Blogbench, Bonnie++, Filebench, and my own Bullet Cache server. I will describe it later. And as preliminaries also,
I'd like to talk about how to interpret and post-process your benchmark results. And this is a picture which is in every college textbook about doing any kind of measurements, so I will also repeat it here.
When we talk about measurement results, we can talk about how accurate they are and how precise they are. If they are accurate, they measure the real thing, the real aspect of the system you're trying to measure; and if they are precise, they're all clustered around basically the same value,
which should be the one true value you are trying to measure, but it usually isn't. This will be clearer in examples, but it's also related to introducing measurement errors: systematic errors, those which influence our measurement
in such a way that you actually measure something else. There'll be some interesting examples here. So basically, you think you're measuring one aspect of the system, and in actuality you're measuring something different. And also random errors, which are basically noise, measurement noise, which can come from all parts of the system.
You can basically either reduce it or just limit it and use statistics to establish some kind of median or average value. Lots of benchmarks also exaggerate their precision.
So you can see some benchmarks on the internet that say a hard drive, or an SSD in this case, does 412.567 megabytes per second. This is usually nonsense, because you cannot actually measure hard drive or SSD performance to this precision.
In most cases, I mean really most cases, the best you can do is measure maybe the first two significant digits. So this would probably be better expressed as 410 megabytes per second.
In theory, you could achieve really, really good precision like in this case, but in practice, you won't ever. It's also useful to introduce error bars, either on graphs, as these small extensions on the graph
below and above the measured value, or expressed numerically or something. Basically, error bars are an expression of the precision of your measurements: confidence that the measurements are really precise or really repeatable
in a way that is useful. My favorite example here is measuring hard drive performance, because you get a lot of file system benchmarks on the internet which just create a RAID array or just use a single drive
and create a file system on it and just run some kind of benchmark on it. This is not exactly the way it should be done, because especially on mechanical hard drives, so-called spinning rust and such, you have huge differences in performance, linear sequential read performance or sequential write performance,
between the inner part of the drive and the outer part of the drive. If you use the diskinfo utility, which is part of every FreeBSD, it's a part of the base system, you can get a really nice illustration of how performance differs when you go from the outside of the drive platters
to the inside of the drive platters. So for example, if you just create a file system on such a drive and measure file system performance with any kind of utility, I mean really any kind of benchmark utility, it's very much influenced by which part of the drive
your file system code places the data on. So if it places the data on the outside, you'll get one kind of performance curve, and if it places it on the inside, you'll get another type of performance curve. This is a difference of approximately, I think, 100 megabytes per second. So really, it is significant.
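As a hedged illustration of the kind of measurement just described (da0 is only an assumed device name; the exact invocation from the talk is not shown), diskinfo's built-in transfer test prints rates measured near the outside, middle, and inside of the platters:
    # Naive seek and transfer benchmark of the raw device; the output
    # includes transfer rates at the outer, middle, and inner zones.
    diskinfo -t /dev/da0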
So don't do file system benchmarking on whole drives, especially not on mechanical drives. SSDs have a different kind of problem: they have a lot of internal structure. The flash translation layer basically means that data which you think you're writing to a particular place on the drive
can actually end up in some other place on the drive and it can be internally fragmented and it can be compressed and it can be transferred in a way you have no influence over. So benchmarking with flash is a whole other topic.
This is also an illustration of a few runs of the diskinfo benchmark which was shown on the previous slide. It basically shows that there's quite a lot of noise when benchmarking hard drives.
Even though this is diskinfo, which doesn't pass through a file system but talks directly to the hard drive via /dev/da0 or some other device-specific path, there is still significant noise here.
This is noise between, for example, 300 megabytes per second and 350 megabytes per second. It's significant. The noise is also present in all cases: whether the measurement is on the outside of the platters or on the inside, it is still present.
It is not really something you can avoid, so you use statistics to get an average value. How many people have seen this video? Yeah. Basically, the guy shouts at a drive array.
There's a drive array in a data center; the guy shouts at it, his acoustic vibrations are transferred to the drive heads, and he sees noticeable spikes in latency. So really, mechanical drives can be really, really sensitive. You get visible, visible spikes.
This means that if you really want to get scientific about it, you will need to go to quite a lot of trouble to do it right. Fortunately, you can always aim for a ballpark value.
For example, you want to get values like 300 megabytes per second, not exactly 335.456. You really have to be realistic about the precision you can expect. So what do you want to do with a hard drive
or a file system? Basically, you want to create a partition which is somewhere close to the outside of the drive and which is small enough that the difference in performance between its beginning and its end is not noticeable.
This partition has to be large enough that you can run meaningful benchmarks on it. For example, it has to be at least two times larger than your available memory, because if you run a sequential read or sequential write test, you need to ensure that the data is not cached. But since a lot of space goes to file system metadata,
and you have the reserved portion of the file system, which in UFS is approximately 8%, you usually have to make this kind of partition three or four times the size of your RAM.
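A minimal sketch of how such a benchmark partition could be set up, assuming a GPT scheme on an otherwise empty da0 and around 8 GB of RAM (device name, sizes, and mount point are placeholders, not values from the talk):
    gpart create -s gpt da0                 # new GPT partition table
    gpart add -t freebsd-ufs -s 32g da0     # ~4x RAM, placed at the start (outer tracks)
    newfs -U /dev/da0p1                     # UFS with soft updates
    mkdir -p /mnt/bench
    mount /dev/da0p1 /mnt/bench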
As an illustration: on the left side, this is my partition, and on the right side, this is again the graph showing the whole-drive performance. So when you look at just this partition, the outer, middle, and inner parts of the partition basically have the same performance, plus or minus some noise.
And of course, the whole drive is a completely different story. So these are the results. I think the blue channel is not working on this projector; this should be a blue bar.
Blue bars, or in this case, black bars, are Linux performance. And the red one is FreeBSD 9, and the yellow one is FreeBSD 10. This is a result from Bonnie++. I'm only using the file system bandwidth, actually, the sequential read and write rates.
And also the rewrite rate. You can see that the results are interesting, and it's hard to tell whether you can put much emphasis on the Linux performance, which is worse on writing,
and better on reading, or the other way around. What is interesting, and what will be confirmed in other benchmarks, is that some development actually happened between the 9.1 release of FreeBSD and the 10.0 release, in such a way that basically,
rewrite performance is much better. Also, the read performance is a little bit better. It means that some kind of new development has probably happened in the area of concurrency, maybe more fine-grained locking in some parts of the system.
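The exact Bonnie++ flags used for these runs are not given in the talk; a typical invocation that produces this kind of sequential read, write, and rewrite bandwidth numbers might look like the following (directory, size, and user are assumptions):
    # Sequential I/O on the benchmark partition; the dataset size is in MiB
    # and should be several times the RAM size; -n 0 skips the small-file
    # creation tests; -u sets the user to run as when started as root.
    bonnie++ -d /mnt/bench -s 8192 -n 0 -u nobody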
When you think of file systems, basically, file systems are kind of a database. You get records, which are files stored on a medium. These records are simple, but they have associated metadata. They have associated security permissions.
They have attributes. They also need to be concurrently accessed, whether for reading or for writing. So this database really can get quite complex. ZFS is notorious for having a really, really complex layout; basically, it's more often compared to a database
than to traditional file systems. And file systems need to support other features: they need to be reliable, they need to support stream operations. Network file systems are also complex in a completely different way.
Next, I'd like to talk about the Blogbench benchmark. It is basically a very small utility, but very cleverly written. Its goal is to create a tree of smallish files. It has two types of files. One is a sort of text file of two kilobytes
or four kilobytes in size, and the other is multimedia files, or in this case image files, which are 4K in size. But generally, the sizes are not always the same; they are spread across a spectrum between two kilobytes and 64 kilobytes.
The interesting thing is that Blogbench is multi-threaded. It can create hundreds of threads. The default configuration starts with 110 threads, and all of these threads are divided into several groups. One group creates new files. Another group modifies files.
The third group reads files. In this case, the third group, the one that reads files, is the largest one. So we have, for example, 80 threads reading files, and the rest are writing or creating them. It also uses atomic renames for some writes. So, for example, it modifies a blog file,
and then it renames it to overwrite the old content. This is important because in FreeBSD, much emphasis is given to write blocking. If you've seen any Linux benchmarks, you've seen pictures like this.
So basically, Linux performance in Blogbench is all the way over there, almost two million, and FreeBSD performance is way over here, 700,000. So it's puzzling: why would this be? This is all done on UFS; I have no ZFS numbers here.
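For reference, a run of this kind can be reproduced with a minimal invocation, assuming Blogbench is installed from ports and the benchmark partition is mounted at /mnt/bench (both of these are assumptions):
    # Concurrently creates, rewrites, and reads a tree of small "article"
    # and "picture" files, then prints final read and write scores.
    mkdir -p /mnt/bench/blogtest
    blogbench -d /mnt/bench/blogtest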
Basically, it comes down to how the operating system schedules reading and writing. What is interesting is that there is some improvement between the 9 and 10 versions, and to prove it, you can use the very easy,
very convenient utility present in FreeBSD, which is ministat. It is also part of the base system. So we already have two utilities, diskinfo and ministat, which can be used to create quality benchmarks.
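For example, given two files with one Blogbench score per line, one line per run (the file names here are just placeholders), ministat prints means, standard deviations, and whether the difference is statistically significant at the chosen confidence level:
    # Compare per-run read scores from 9.1 and 10-CURRENT at 95% confidence.
    ministat -c 95 reads-9.1.txt reads-10.txt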
I was basically wondering what's the difference between the 9.1 and 10 releases, and it says that approximately 90% better performance is achieved in 10 in the read case, but approximately 17% lower performance
is achieved for the write case. So it needs more investigation. Blogbench is a multi-threaded benchmark; it uses a lot of parallel operations on the same tree of files. There's something called the write bias of FreeBSD,
which basically means that writes on a single file block all other accesses. I think it even blocks other reads; I'm not sure, but I think it is so. It is useful to remember that directories themselves are also files. So write operations on the directories themselves
are also blocked. I did some system call tracing. Basically, you get a long list of syscalls, and it clearly shows that there are a multitude of threads doing various operations: opening, reading, writing,
closing, and such. And from all this data, we can create a pretty picture, which kind of characterizes the number and the frequency of operations. What isn't shown well on this axis is that these zeros should be operation duration times.
They are really very small, on the order of microseconds, so they all got rounded down to zero. So we see from this graph that Blogbench
does a lot of what we knew before. Read operations are dominant. Surprisingly, there are a lot of close operations, which is kind of normal when you know that Blogbench opens files, modifies them or reads them, and then closes them again.
And there is a really small number of write operations, basically this curve over here. It's interesting to note that read operations also take the longest compared to everything else, which I think indicates that they are actually blocked
by other types of operations, but it needs closer investigation. The other benchmark which is often used in the FreeBSD world is a PostgreSQL benchmark. It's a database, and in this case I've used the pgbench benchmark,
which is a part of the PostgreSQL source base. I've initialized the benchmark database with a scale factor of 1000, which means 100 million records. The database is approximately 16 gigabytes in size.
It fits in RAM. This is deliberate because I'd like to test, for example, concurrency in database access without really benchmarking the hard drive at the same time. And the configuration is, of course, kept stable during the tests. The two machines I was talking about at the beginning
are configured so that they dual-boot Linux and FreeBSD. Both Linux and FreeBSD have access to the same partition, so the same partition, the same file system, is used for both the Linux benchmarks and the FreeBSD benchmarks.
This is the partition I was talking about at the beginning, the one which has small deviations in access performance between the outer, middle, and inner tracks.
PostgreSQL is a fairly modern database, which means it has lots going on in the backend. The most important thing is write-ahead logging. Basically, if you do a write operation in PostgreSQL,
the data doesn't go directly to the database store itself. It is written to the write-ahead log first, and periodically this data is then transferred from the write-ahead log to the database storage itself. The problem is that this can happen at unexpected times.
So for example, if you run a short benchmark, you can run the whole benchmark with the data residing in the write-ahead logs and not in the data storage itself. And you can run another benchmark, which is started,
and then right in the middle of benchmarks, this transfer occurs between the write-ahead logs and the proper storage. And this introduces a large amount of noise in your benchmark results. So benchmark runs need to be longish, at least five minutes, 10 minutes. There are recommendations for even longer times,
like half an hour. And it is crucial that the PostgreSQL server itself, the daemon itself, be restarted between the benchmarks because a restart of PostgreSQL causes all data to be transferred from the logs, from the write-ahead logs to the data storage itself. So you get more consistent, more precise measurements.
And also, it's necessary to watch out for autovacuum. Vacuum in PostgreSQL is the part of the system which basically cleans up data in case of updated records
and deleted records and calculates some statistics information, which can also influence your results, making them scattered or very noisy. On this particular machine, which is the quad-core Xeon I was talking about,
I've run benchmarks which are read-only and read-write, and I've run them both on the file system, meaning on the drives themselves, on the disk array, and on a memory file system, which is ramfs in the case of Linux and tmpfs in the case of FreeBSD.
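A sketch of the pgbench methodology described above; the client counts, run lengths, database name, and the rc script name (which assumes the PostgreSQL port's default) are illustrative assumptions rather than the talk's exact settings:
    createdb bench
    pgbench -i -s 1000 bench             # one-time init: scale 1000 ~ 100 million rows
    # Restart the server between runs so pending write-ahead log data is
    # flushed to the data store and every run starts from the same state.
    service postgresql restart
    pgbench -S -c 16 -j 4 -T 600 bench   # read-only (SELECT-only) run, 10 minutes
    service postgresql restart
    pgbench -c 16 -j 4 -T 600 bench      # read-write (TPC-B-like) run, 10 minutes
    # For the remote runs mentioned below, add -h <server> on the client machine.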
I've also done some remote benchmarking, meaning a remote connection to the database from the other server. I will talk about some of the results in more detail. So on Linux, I've discovered that there's almost no difference between a fully cached database on ext4
in read-only benchmarks and its RAM file system or memory file system. On FreeBSD, using tmpfs is faster than using UFS with fully loaded, or fully pre-warmed, database content.
The blue lines are basically Linux, and the red lines are FreeBSD. This dip here is very surprising. It is also very repeatable; I get it in all configurations, in all benchmark runs.
I have a theory about why it is happening; I'll try to explain it later. What most people are interested in right now is that on this particular machine, I get lower performance from FreeBSD than from Linux, except for this dip here.
I don't know if you can see it, but there are two lines here, the dark blue one and the light blue one. The dark blue one is the disk file system, and the light blue one is the memory file system. On FreeBSD, the dark red one is the disk file system, UFS,
and the light red one is tmpfs, the memory file system. There is a noticeable difference here. These are the write results. There again, the Linux curve is the blue one,
and the FreeBSD curve is the red one. Even though this particular benchmark should really depend on the disk performance, apparently it doesn't. The operating system has a huge influence
on the total performance. This graph shows benchmarks of remote access performance. So what I did was use the same servers.
One was obviously running the database, and the other was running the benchmark client. The database was always on memory file system. This is the distinction between local access, which is the four lines here, the dark blue one and the dark red one,
and the remote access over here, which is the light blue one and the yellow one. We can see that remote access helps a little bit, because the benchmark client itself is a CPU-intensive process. So when separating the benchmark client
onto the other machine, you free some resources on the server itself. But this is offset almost entirely by the fact that there is a TCP connection between those machines, even though it's just a wire between the two network cards. The same general conclusion can be drawn,
that FreeBSD still has some work to do in performance improvement. And because we can now separate the client onto another machine, I think that this is actually caused by some kind of scheduler issue in FreeBSD.
In the original case, you have both the client and the server processes on the same machine, and somehow, at approximately 12 concurrent clients, Linux has some kind of a problem, I think.
So my best guess is that it's a scheduler issue in Linux. However, to get some more optimistic news, these are the results done by Florian Smeets, and I also think Jeff Roberson, on a hugely different machine.
This machine has 40 cores, it has four CPUs with 10 cores each, and 80 threads in total on the system. And the results are much better on such a configuration. Basically, in this configuration, FreeBSD is consistently better than Linux.
So there's some hope yet. The algorithms used in achieving large scalability probably favor systems with a large number of CPUs. So something needs to be done to maybe close that gap.
Linux, which is blue in this graph, lags in performance up until approximately 32 clients or 32 connections to the database server. This is also PostgreSQL 9.2. This is also a benchmark of a local benchmark client
and the local server on the same machine. The next benchmark I would like to talk about is Filebench. This is a benchmark done, I think, by Sun Microsystems back in the day,
and it has a large number of profiles. One profile, for example, is a file server, which basically creates a set of large files, and the other is the web proxy profile, which creates smaller files. There are different benchmark operations
being done on these profiles. Unfortunately, I think that this benchmark has some problems running on FreeBSD. I have some strange results, I'll talk about them later, and I'm not so sure that it's really a correct benchmark. I'm not sure that these results I will show you
are really correct on the FreeBSD side. I've done local drive measurements on UFS on this partition I was talking about, and also over NFSv3 and NFSv4, because there was some talk on the mailing list about a huge performance difference between NFSv3 and 4 in FreeBSD. So I wanted to find out what's happening.
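As a hedged sketch of the setup (server name, export path, and the workload file are assumptions; the talk does not show its exact commands), the NFSv3 and NFSv4 mounts over TCP and a Filebench run could look like this:
    # Mount the same export over NFSv3 and NFSv4, both over TCP.
    mount -t nfs -o nfsv3,tcp server:/export /mnt/nfs3
    mount -t nfs -o nfsv4,tcp server:/export /mnt/nfs4
    # Run the file server workload after pointing $dir in the workload
    # file at the file system under test.
    filebench -f fileserver.f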
This is the file server profile, and the results are strange. On the one side, we have Linux, which is this huge dark bar here.
The result is larger than 800 megabytes per second, which is obviously due to caching, which is all right, or actually it would be all right if it weren't for the FreeBSD result, which is this red bar here,
which is almost 100 megabytes per second. The drives themselves can pull 450 megabytes per second at least, so I think that the huge difference between the Linux results and the FreeBSD results is due to benchmark problems.
So this is an example of maybe a systematic error or systematic problem in benchmarks. I've run NFSv3 and NFSv4. These are the two bars right here, the light blue one and the dark blue one. The light blue one is NFSv3,
and the dark blue one is NFSv4. And even on Linux, NFSv4 is significantly slower for some reason. I was hoping to talk to Ulrich, but I think he's not at the conference right now. The FreeBSD results are lower than the Linux results.
I really cannot explain why the local disk performance should be so low, because this is much lower than what the hardware itself can do on this machine. And the NFSv4 result is, again, significantly lower than the NFSv3 result.
The NFS was mounted over TCP. On the other hand, we have the web proxy profile, which again shows really quirky results, but a bit different. This time, Linux has lower performance
on the local hard drive measurements, and FreeBSD has hugely better performance under the same conditions. Again, it looks to me like the benchmark itself is wrong. It seems that either it has a bug or it is Linux-specific,
and there are some problems with porting it to FreeBSD or something. It is possible, but not really probable, that such a result would genuinely happen. So this just goes to say that you need to choose your benchmarks carefully.
As an interesting side result, this is the same benchmark on the file server profile, where I've used FreeBSD as an NFSv3 server and Linux as the NFSv3 client,
and it shows that the results are basically the same, which is encouraging. Last year, I talked about my Bullet Cache server. It's a memory cache server, similar to memcached.
Its specific thing is that it creates TCP traffic of very small transactions, like 2-byte to 128-byte messages going back and forth between the client and the server. It is heavily multi-threaded, and it also uses non-blocking IO,
and if you benchmark over Unix sockets, meaning both the server and the client run on the same machine and talk through Unix domain sockets, you easily get two million transactions per second
on mid-range hardware. The results are in favor of FreeBSD on this particular benchmark, but the benchmark itself was developed on FreeBSD by me, so I used a lot of tuning for this to actually happen.
I also would not advertise this as a truly good benchmark between Linux and FreeBSD; it's more of a curiosity. Within measurement error, the performance is the same between the 9.1 and 10 releases.
The benchmark is interesting because it illustrates that FreeBSD's TCP stack is fairly multi-threaded, and fairly finely locked, so TCP transactions, TCP streams do not influence each other.
Multiple concurrent streams can be used on FreeBSD without performance problems. It also shows that I got, for example, nearly 500,000 packets per second per direction in this configuration, so this is a lot of really small TCP messages,
so I'm satisfied with the performance shown. To illustrate the difference between TCP over the plain ethernet wire connecting the servers
and Unix domain sockets: Unix domain socket performance crosses one million transactions per second, but this is also naturally expected, because on one hand you have the whole ethernet, the actual networking happening, and on the other hand you have only memory passing between processes.
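One simple way to watch the packet rate on the wire during such a run (em0 is an assumed interface name) is netstat in interval mode:
    # Print per-second input and output packet and byte counts for em0.
    netstat -w 1 -I em0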
So one thing that a lot of people are trying to figure out is what to tune on FreeBSD to get acceptable performance, and I would argue that, especially with release 10
and even the now-current release 9.1, there is not much you can do; actually, the system auto-configures itself pretty well. The read-ahead tunable was increased recently;
also the buffer space tunables, hibufspace and lobufspace, are now auto-tuned, so you don't have to touch them. Some tutorials advertise increasing the network transmit and receive descriptor rings,
or increasing their size. I haven't actually noticed a significant difference in my previous benchmark. There is some difference, but I'm not so sure whether it is maybe just measurement noise; I need to do some more benchmarking to actually find out,
but the point is, the default values are actually pretty good. I also tried to increase the number of interrupts per second allowed for the em driver on the CPUs, and if I increase them, I also get a very, very small improvement,
maybe less than 1% of performance. So unless you are really trying to squeeze sub-percent performance gains out of your machines, you don't have to touch, for example, the network driver configuration. kern.maxusers basically governs
the size and number of internal kernel structures. It's been auto-tuned for years now. And as a last example, this tunable, which is maybe familiar to everyone who has run Apache, or PostgreSQL, or something else
which forks a lot of processes, is also gone in the 9.x releases. So my point is, you don't have to rely on manual tuning on FreeBSD as much as you used to.
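If you are curious, the current auto-tuned values can simply be inspected; these sysctl names exist on FreeBSD, though whether they need changing for your workload is another matter:
    # Read-ahead, buffer-space, and maxusers values chosen by auto-tuning.
    sysctl vfs.read_max vfs.hibufspace vfs.lobufspace kern.maxusers
    # Rough count of sysctls available on the running system.
    sysctl -a | wc -l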
I didn't give any CPU-specific benchmarks, and by that I mean I didn't do any encryption, I didn't do any kind of compression benchmarks and such, because for the most part, they are trivial. Unless you have a broken compiler,
or unless you have different tuning in your compression libraries, or different algorithms in your libc, you're not going to find any significant or useful difference in such benchmarks. Still, most of the benchmarks which are now available on the internet show a noticeable difference
in compression performance. For example, they use gzip, or some other libc utility, to compress a huge file on Linux, then they do it again on FreeBSD, and then show, for example, that the Linux performance is a little bit better, or the FreeBSD performance is a little bit better.
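Such a benchmark is usually nothing more than the following, with the file path being a placeholder; note that this mostly measures the compression library and the compiler, not the kernel:
    # Compress the same large file on both systems and compare the wall time.
    time gzip -9 -c /path/to/large-file > /dev/null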
This, for example, depends on how the libc itself was compiled. At one time, FreeBSD didn't use an assembler-optimized part of the compression algorithm, because there was a security bug in it. So the libc was compiled with C-only code,
which had lower performance than the assembler-optimized version. So this was the cause of a drop in such benchmarks. If you want to, you can test the quality of the libc implementation, for example. You can test the quality of the
stdio set of calls, or something like this. But then be sure you know what you are benchmarking. You are not benchmarking, for example, the kernel in this case; you are benchmarking the algorithms in your libc. You can also test your compiler. FreeBSD has a sort of old-ish compiler by default.
This will change in 10.0. And just as an illustration, you can find benchmarks like this floating around the internet, where you have, on one side, GCC, including recent versions of GCC,
and on the other side, you have the LLVM Clang compiler, and so, for example, this particular benchmark shows a huge difference between Clang and GCC. What this picture doesn't show is that there are other benchmarks
showing completely different values. So in this case, you can run benchmarks like this; there's no problem in running benchmarks like this. But you have to know that you are running a benchmark of a compiler. So if you compile, for example, a compression library or an encryption library
with different compilers on different operating systems, it's all right. You can do this. But just realize that you are not exactly benchmarking, for example, the operating system kernel; you are benchmarking the quality of a compiler.
And sometimes, benchmarking is done in circumstances which really don't require it. There are a lot of reasons why you can do benchmarks, but there are also cases where benchmarking really is not as useful, or the numbers
aren't as useful as you think. You can choose an operating system based on its features, based on its community, based on its large amount of available or supported software. You can also say that it's cheaper to buy another machine and run a slightly slower operating system
than to reconfigure everything. And one conclusion that is also good to keep in mind is that FreeBSD actually gets better through the years, and it consistently gets more and more scalable. At the time of the Kris Kennaway benchmarks of the 7.x era,
FreeBSD strived to achieve scalability on eight-CPU hardware. And it succeeded, but that's old news now. And during the 10.x era, realistically, FreeBSD is probably aiming at being scalable,
or being better than Linux, on hardware containing 32 CPUs. So it's consistently improving; it is very much a modern system going forward. You might need to benchmark out of curiosity. You need to benchmark if you really want to know
what's going on in the system, when you're planning or budgeting for a new project, or when your boss tells you to. And finally, there is advocacy, which is perhaps the most commonly used reason to do, or actually to publish, benchmarks on the internet. All right.
So, yeah. During the developer summit, I was talking to some of the other developers and some of the cluster administrators of FreeBSD, and we will try to get some continual benchmarking going. So maybe at the next BSDCan or something like that,
I will have some updated numbers and I will get some nice performance curve improvements over time, over the years of FreeBSD development. I think this is it. The point of this presentation was mainly
to point out some interesting facts about benchmarking itself, about what to do and what not to do when benchmarking an operating system. So I hope it was a reasonably interesting presentation to hear and thank you for staying.