
Benchmarking Ruby


Formal Metadata

Title: Benchmarking Ruby
Number of Parts: 65
Author: Davy Stevenson
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Abstract
Testing is firmly ingrained in our culture, and many of us rely on a network of services to test, evaluate and profile our code. But what about benchmarking? Learn tips and tricks about how to approach benchmarking and the variety of gems that are available. Learn about tools to help determine algorithmic complexity of your code, as well as how this information can help you make development choices. Learn how to properly set up your benchmarking experiments to ensure that the results you receive are accurate. More importantly, discover that benchmarking can be both fun and easy.
Transcript (English, auto-generated)
Thank you so much for being here.
I'm gonna be talking about benchmarking in Ruby. But first I would like to say hi to all of you. It's so great to see so many friendly faces, so many familiar faces in the audience today. I really appreciate you deciding to come to my talk. My name is Davy Stevenson. I'm Davy Stevenson on Twitter. I'd love to hear any of your thoughts and opinions
on the talk afterwards. If you wanna talk to me more about it, feel free to contact me there, or of course wandering around the halls for the remainder of the last day here at RubyConf. So let's talk a little bit about benchmarking. And what do I mean by benchmarking? What I mean is not that I'm going to be talking
about Ruby implementations or frameworks and benchmarking those. If you're interested in that sort of thing, there's a lot of really great resources online already, including isrubyfastyet.com and a really great blog post by Made by Market called the Ruby Web Benchmark Report. So if you're interested in learning more
about how the different Ruby implementations compare against each other, or how the many different web frameworks compare, I would point you there. Instead, I'm gonna be talking a little bit about how to benchmark Ruby code. And then I'm gonna talk a lot about the many common pitfalls that I see people making,
and I make myself when benchmarking code, and I use these for illustrative purposes to try and make it so that you can be thinking about these things while you're writing your benchmarks and hopefully avoid many of these pitfalls. And finally, what I really wanna do is to show you that benchmarking Ruby is very fun and enjoyable,
and it's something that everyone can easily fit into their workflow when it's appropriate, and gain extra knowledge about the code that they're writing. So why should you benchmark? The question I'll ask you next is, why do you write tests? Most of us, I think, have fully subscribed into the test-driven culture that writing tests
to verify our certainty and functionality of our code is really important. That allows us to be sure that the code that we're writing is doing what we think it's doing, and to be comfortable in refactoring things, knowing that our tests are gonna be there to catch us if we make a mistake.
So therefore, why should you be benchmarking your code? It plays a completely different role than testing, in that benchmarking is all about certainty and performance. You want to be certain that the code is performing in the way that you expect, and unexpected things aren't there tripping you up and making your code run a lot slower
and more inefficiently than it needs to. And when you're benchmarking, you might benchmark your code, but it's also really, really helpful to benchmark the code in the gems that you depend on, and also Ruby code itself. By benchmarking code that's not your own, you gain a broader understanding
of what that code is doing, how it's performing, and how using that code and those language particulars affects the code that you ship in your final product. So how do we do benchmarking in Ruby? First off, in the standard library,
there is a great tool called benchmark. All you have to do is require benchmark, and you can easily write a couple of reports like this. So this is an example of benchmarking an array. We're starting with one that has 10,000 integers, starting at zero, and then we're shuffling those,
and we're going to be benchmarking the time it takes to find a particular element in that array, and the time it takes to just do a lookup at an index. So that's index and at. And so inside there, we're just calling
those methods that we want to benchmark, and we're using the rand method to make sure that we're picking different elements that are inside that array each time to give us a nice, broad spectrum. So the output of this sort of benchmark looks like this, and it's a lot of numbers, and really what you're kind of doing
is looking for this final output column. And, you know, keep track of what's happening in your system, make sure nothing else is impacting you, but that's kind of where you're looking to find the results, and how long that code took to run. So what are some pros of the built-in benchmark library? It's really easy.
It's in the standard library. But there are some cons as well. There's a lot of variable fiddling. You have to define how big of an object you're creating and how many times you're going to iterate the code block, to try and make it so the time isn't zero milliseconds and isn't 10,000 years,
and I find the output to be kind of difficult to read; you have to take on a heavy cognitive load to try and parse those results. And there's a lot of boilerplate code that's required to get those benchmarks running. And so here's an example. So this is the previous code, and I'm highlighting some of the code that I think counts as boilerplate in this example.
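To make that concrete, here is a rough sketch of what this built-in Benchmark version might look like (the array size, labels, and iteration count are my assumptions, not the exact code from the slides):

```ruby
require 'benchmark'

array = (0...10_000).to_a.shuffle
n     = 1_000 # boilerplate: we have to pick an iteration count ourselves

Benchmark.bm(10) do |x|
  # find a value by scanning the array
  x.report('#index') { n.times { array.index(rand(10_000)) } }
  # look up a position directly, no scan
  x.report('#at')    { n.times { array.at(rand(10_000)) } }
end
```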
We have to create a value of n, and then inside each report do n times calling a block. So Evan Phoenix wrote a really awesome gem called Benchmark IPS that pretty much eliminates the need for this boilerplate. Instead of calculating the raw time
that the block of code takes to run, instead it's trying to calculate the iterations per second that that code can run at. And so it's pretty easy to switch that out, just require a Benchmark IPS instead, switch the method that you're calling on Benchmark. And then the other cool thing is it has
this nifty compare function. And so what does the output of this benchmark look like? So here we have the final Benchmark IPS report. And so I'll go in a little bit more detail of what it's doing. So Benchmark IPS does a pre-calculation block where it runs through and tries to calculate first
how many iterations it can run in 100 milliseconds. And it uses this to be a little bit smarter about how it calculates the final run to make it a little bit more accurate and produce less variability. So then it runs and tries to run the calculation for the iterations per second
for the default of five seconds here. You can modify that. And so it tells you now the number of iterations per second that it calculated, the standard deviation for that, and then also the total number of iterations that it ran in the five seconds. Because I specified that comparison block, it also does this cool comparison
where it tries to calculate how much faster the fastest code block ran compared to the other ones. And this can be really helpful. So again, if you're just wanting to find the results really quickly, what you're gonna wanna look at is here, that at ran at 3.2 million iterations per second
and index ran at 41 and a half thousand iterations per second. And then we can see here that index then is a decent amount slower than just a pure lookup into that array. So what are some pros?
I think that it's a lot less fiddly than the built-in benchmark. The other cool thing about it is because it's measuring iterations per second, it means that bigger numbers are better. The more iterations per second that can be run, the faster and more performant your code is. And so I think it's a little easier for our brains to easily compare which ones are better than others.
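For reference, a benchmark-ips version of the same array comparison might look roughly like this (labels and sizes are my assumptions; compare! is what produces the "x times slower" summary):

```ruby
require 'benchmark/ips'

array = (0...10_000).to_a.shuffle

Benchmark.ips do |x|
  # no n.times boilerplate: ips decides how many iterations to run
  x.report('#at')    { array.at(rand(10_000)) }
  x.report('#index') { array.index(rand(10_000)) }
  x.compare! # reports how much slower the slower block is
end
```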
It also has pretty much the exact same syntax as the built-in benchmark. So if you're already used to using that, it's really easy to switch over to using IPS instead. And it also has that really cool compare feature, which can be really helpful if you just don't want to have to do the math yourself. Just have it do it for you. That's what computers are good for. So what are some cons?
It is a separate gem that you have to require. And then the other con that I find sometimes is that it really only provides a snapshot view into the performance of the code that you're running. And so what do I mean by that? So here is the comparison output from that example for an array of 10,000 elements.
And here is the output for that same IPS block for an array of 100,000 elements. You know, why is this comparison showing these different amounts of how much slower it is? That's a 10 times difference in what it's saying the performance is. Why is that?
And so in order to answer some of these questions, I wrote a gem called benchmark big O to try and build upon what benchmark IPS provides. And instead of specifying the size and the array outside of the benchmark block, it brings that portion into the benchmark itself.
So I've called this gem benchmark big O because what I'm trying to do is see how the performance of things scales as the objects change in size, in order to try and estimate what the big O notation of that code block is. So the changes are shown here. Instead of creating that array outside of it,
I'm using what's called a generator to generate that object. The array is one of the built-in types. But you can also specify a generator explicitly, where you provide a code block to tell the benchmark big O gem
how to create the object that you want to test. So this is identical to the built-in array generator. The next major difference is that you have to pass in some extra objects into the report block itself, because that array and its size are not globally accessible anymore, and they're changing over time.
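Put together, a benchmark-bigo setup along the lines described here might look something like this. The method names (generator, chart!, termplot!) are reconstructed from the talk's description rather than checked against the gem, so treat it as a sketch and consult the gem's README for the real API:

```ruby
require 'benchmark/bigo'

Benchmark.bigo do |x|
  # built-in generator: an array of `size` elements at each step
  x.generator :array
  # an equivalent explicit generator, if you need a custom object:
  # x.generator { |size| (0...size).to_a.shuffle }

  # the generated object and its current size are passed into each report
  x.report('#at')    { |array, size| array.at(rand(size)) }
  x.report('#index') { |array, size| array.index(rand(size)) }

  x.chart! 'array_chart.html' # HTML chart of the results
  x.termplot!                 # ASCII plot in the terminal
end
```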
Those are being passed into the report block: the generated object and its size. And then the next thing is that I have a couple of other features, including being able to print out a chart of the results and also to print out a plot of that on the terminal as well. Benchmark big O under the covers
uses benchmark IPS to do most of this work. So when you're running it you'll see a whole bunch of output like this. But the cool and important part about this is the charts that it provides. So here, using Chartkick, is a chart of the difference in performance of at and index on an array. And so now it becomes really, really clear why there was a difference in performance
by changing the array size. At is a constant-time method, whereas index grows linearly with the size of the array, which makes sense. Index has to go in and find an object within the array. As the array gets bigger it's gonna take more and more time. And just because I think it's cool, here's what the terminal output looks like
because, oh my gosh, ASCII charts. And before anyone thinks that I actually did that, that's just part of what Gnuplot provides, so I totally stole all of that. So, pros to the benchmark big O gem. It allows you to run some tests on a wide variety and range of inputs.
It has charts, and I love charts. And it has ASCII charts. So that's pretty much the biggest pro right there. So of course there are some cons that go along with it. Just like benchmark IPS, it's a separate gem. It also takes a much longer running time. The default number of steps that I suggest is 10 for this,
so by default it's going to run 10 times longer than benchmark IPS to try and collect this data. Sometimes it can take a really long time. And then the other con is that it's not always applicable. Benchmark big O is really only useful if you do have an object that's changing in size. There are many things that you're gonna wanna benchmark
that don't have that behavior at all and therefore benchmark IPS is still gonna be far superior because it will actually work. So next I'm going to talk about how to benchmark effectively, otherwise known as
avoiding these common mistakes. Yesterday Jeremy Evans gave a really good quote in his talk, which is that benchmarks really are useless if, like other things in science, we can't reproduce and critique them. So all of the code that I'm going to be talking about today is available on GitHub.
You can run these examples yourself really easily, investigate and see more about how I set things up, and play around with it, so please feel free to do that. I highly recommend it. Give me input, tell me if I have screwed up in my benchmarks, that would be fantastic. So what's the first question to ask yourself?
And that is, is my environment consistent? And so I'm gonna give another really simple benchmark example to try and illustrate this. I am going to be comparing the sort method and the sort by method. Sort by is really, really cool if you want to be able to sort by a known method name on your object.
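In code, the comparison being described might be set up roughly like this (the User struct, the generated names, and the sizes are my stand-ins):

```ruby
require 'benchmark/ips'

User  = Struct.new(:name)
users = Array.new(10_000) { User.new(('a'..'z').to_a.sample(8).join) }

Benchmark.ips do |x|
  # sort with a block: the block runs on every comparison
  x.report('sort')    { users.sort { |a, b| a.name <=> b.name } }
  # sort_by: the key is computed once per element
  x.report('sort_by') { users.sort_by(&:name) }
  x.compare!
end
```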
So here I'm creating a bunch of random users, and I'm comparing sorting those by name with sort versus sort_by. And this is what it looks like when I'm running Netflix and SoundCloud and Pandora and Photoshop all at the same time. And you can see here that the graphs are very, very wiggly
and I was not getting consistent output, and so it's really important to make sure you're aware that this is measuring CPU cycles on your machine. If your machine's doing a lot of other things and taking up those CPU cycles doing something else, the values you're gonna get are not accurate at all.
Closing things down, here is a much clearer example of what you're expecting to see: both of these methods should effectively be straight lines, still a little bit of wiggle here, but you can now see a little bit more clearly how much of a benefit you're gonna get
by using sort_by instead of sort when you are sorting on a thing like name. The other thing I'm gonna highlight here is that the benchmark Big O gem also has a compare function, which does something slightly different than the IPS one, in that once you've run these calculations
it's going to try and compare these against known Big O notation lines so that you can do a better job of seeing exactly how that performs. And so what do I mean by that? Here is the array sort line, the red line from the previous chart, and I have calculated out, or the gem has calculated out,
what it would look like if that was constant, log n, n, n log n, or n squared. And you can see here that it's pretty close to lining up exactly with the n log n line which makes sense since this is a sort function and sorting, as we know, is an n log n operation. And similarly, the sort by,
even though it performed better than sort it still at the end of the day is an n log n operation. The second question, are you verifying the behavior that you're testing? And so the quick answer to that, of course, is you can write tests to do that.
So here is an outline of how you might structure a benchmark in order to be benchmarking two methods, test A and test B, and then writing some tests to verify that the output of those methods, the output of the thing that you're benchmarking, is what you expect.
So Tenderlove gave this great quote at Keep Ruby Weird, which is that the Fibonacci sequence is the perfect benchmark. And so to continue this example, I'm going to use a Fibonacci sequence example. So here is the test that I wrote to verify the behavior of a Fib function that calculates the Fibonacci sequence.
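A sketch of that kind of verification test, using minitest (my choice; the talk just describes asserting on known terms and on repeated calls, and the fib method under test is assumed to live in a separate file):

```ruby
require 'minitest/autorun'
require_relative 'fib' # hypothetical file defining the fib(n) method under test

class FibTest < Minitest::Test
  # the first several terms of the Fibonacci sequence
  EXPECTED = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

  def test_known_terms
    EXPECTED.each_with_index do |value, i|
      assert_equal value, fib(i + 1)
    end
  end

  def test_repeated_calls_agree
    # the benchmarks call the same block over and over,
    # so calling fib twice had better give the same answer
    assert_equal fib(5), fib(5)
  end
end
```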
So I have the first several terms of the Fibonacci sequence there, and then I just iterate over them and verify that the output of Fib is what I would expect. The other thing, and I'll talk about this a little bit more, is that I also want to make sure that this method doesn't change its behavior across the multiple times that we're calling it.
For both IPS and Big O, it's calling the same block over and over and over again. It would kind of suck if your implementation somehow was returning different values each time that you called it. So I'm going to call Fib of five twice and just verify that those two things are equal. And so let's say while I'm researching
the Fibonacci sequence, I'm also reading all about factorials, and I completely screw up the Fib function. Instead of writing the correct Fibonacci, I just slap it in there and return n plus Fib of n minus one instead of the correct value. So now the benchmark doesn't care.
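The two definitions being contrasted, roughly (the broken one mixes in factorial-style recursion; the fix is the familiar definition):

```ruby
# the accidental, factorial-flavored version: looks fine in a chart, but is not Fibonacci
def fib(n)
  return 1 if n <= 2
  n + fib(n - 1)
end

# the corrected version
def fib(n)
  return 1 if n <= 2
  fib(n - 1) + fib(n - 2)
end
```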
It runs perfectly fine and outputs a nice chart here that shows this nice linear-looking function. And so if I hadn't been running the tests, I might think, ah, great. My job here is done. But by running the tests, I can see, nope, I have a failing test. This Fibonacci function is not producing the Fibonacci sequence. So I go back and fix it
and actually calculate the right Fibonacci, which is Fib n minus one plus Fib n minus two. And now I can see that that chart looks drastically different. Instead of that line, now we're seeing this spiking polynomial that we'd expect from the Fibonacci sequence, and my tests are green. The other thing that I can highlight here
is the way to modify exactly what sizes that you're going to be using for the benchmark Big O gem. It has some sane defaults for a lot of objects, but for the Fibonacci sequence, because it is a polynomial, it's going to grow really rapidly. I had to limit it pretty directly down to only the numbers five through 24,
because anything more than that was taking 10 jillion years on my computer to run. And so if we want to see this nice polynomial in chart form, here it is. Look at how glorious it is. So I use benchmark Big O for a lot of these examples, mainly because I think that showing these charts can really help bring across
the performance implications of these things. For a lot of these examples, the exact same things apply to benchmark IPS as well. And if you don't want or need the charts, you should not be using benchmark Big O. So the next question that we're going to ask: are we modifying only one thing?
And so here's what I mean. Here we're going to write a benchmark to compare reduce and each_with_object. These two built-in methods do almost the exact same thing, seemingly, except that they accept their block arguments in the opposite order, which is really confusing.
And so we're just taking this initial array with a bunch of hashes in it, and we're effectively copying it over to a new one. And so when I run this IPS report, like it's saying that reduce is almost 3,000 times slower than each with object. How could this possibly be the case?
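The two blocks being compared look roughly like this (the data shape and sizes are my assumptions). Note that more than just the method name differs between them, which is exactly the trap being described:

```ruby
require 'benchmark/ips'

pairs = Array.new(1_000) { |i| ["key#{i}", i] }

Benchmark.ips do |x|
  # reduce must return the accumulator from the block,
  # so the one-liner uses merge, which builds a new hash
  x.report('reduce') do
    pairs.reduce({}) { |acc, (k, v)| acc.merge(k => v) }
  end

  # each_with_object keeps handing you the same accumulator
  # and ignores the block's return value, so plain assignment works
  x.report('each_with_object') do
    pairs.each_with_object({}) { |(k, v), acc| acc[k] = v }
  end

  x.compare!
end
```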
How could reduce possibly be this slow? And if we go back and look at our example, I can see that the reduce and each with object are not the only two things that I changed in this example. The behavior of these methods is slightly different in that each with object does the accumulation for you.
It doesn't care what the block return is. It's always going to keep returning you that same object, whereas reduce is expecting the block to return the object that it's then going to pass to the next iteration. So in this case, in order to make it a nice one-liner, we switched out using a direct assignment for a merge,
since direct assignment returns the assigned value and merge returns the hash. But we can detect and calculate how much of a performance impact this had on our example here. So now I'm going to do the same sort of thing with benchmark big O, only I'm going to calculate that
on what happens when I run it on merge and what happens when I run it on a direct assignment. And now it becomes clear why those two things were running at vastly different speeds. Direct assignment is constant, and merge is linear, because what's happening
under the covers is that merge is allocating a brand new hash every single time it runs through, and as we know, allocating an object in Ruby is super fast... I mean, no, it's not fast. So by iterating merge over and over again, we were compounding this extra cost.
And to highlight how big of an impact this has, I'm going to go back to our original example, and at the top is the correct version of reduce. So in this case, I'm doing the assignment and then returning the hash on the second line, and I'm going to compare that against
the original merge option. And so what do we see here? Well, the reduce is not a constant function. It is linear, but the n squared version is so much more computationally intensive that the linear one effectively looks like a flat line down along the bottom.
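Continuing the sketch from above, the corrected reduce block does the assignment and then returns the accumulator itself on the next line:

```ruby
x.report('reduce (fixed)') do
  pairs.reduce({}) do |acc, (k, v)|
    acc[k] = v
    acc # reduce needs the accumulator returned explicitly
  end
end
```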
So stacking linear within linear is not a good idea, don't do that. So the next question is, are you accidentally mutating objects in your benchmarks? And this can affect both benchmark IPS and benchmark Big O, and, well, all of your benchmarks,
unless you're writing special code to take care of this case. So here I am writing a benchmark to illustrate to someone how much better using delete_if is than trying to iterate over a bunch of values and deleting them each individually. You know, there's a lot of really great methods
in the standard library, and they're there for a reason. And so we're running these benchmarks to try and find out why these special methods are there when I could just do this with other code that I can write. Well, here's kind of an example of why. So here we have delete. I wanna delete half the items
in my array each time. And so I'm gonna iterate over all those numbers and delete them. And instead, here's the delete_if option. I'm gonna pass it a block and tell delete_if which items I'm really looking to delete. And so if I run this chart,
I get something that looks completely reasonable. This is exactly the chart that I would expect to see out of this. In that, again, delete if is linear, not constant, but you can't tell because n squared is so overpowering. But the problem with this is that I'm not actually measuring what I think I'm measuring. The delete methods are obviously removing things
out of those arrays. And so as I'm running this code multiple times, over and over and over again, it's only actually deleting things the first time. And then in subsequent iterations, there's nothing left to delete. The delete benchmarks are still gonna be iterating over all of those numbers each time, but there's nothing actually being deleted
out of those arrays because they're already gone. And so what can I do to protect myself from this? In this case, I'm going to have to dupe that array each time before I run the test. So then I can be sure that I'm running the deletes
on an array that actually has those objects in it. And this is a good time to point out that it's also really good to set controls when you're writing these benchmarks. The array dup method has a cost in itself, and making sure that we understand how the other code that we have to write
to set up our benchmarks correctly is affecting our benchmarks is really important. So now I'm gonna add a third report, which is simply to keep track of the amount of time that the array dup takes itself, so that I can at least know how big of an impact this is having on my benchmark. So now in this one, I'm going to see
how long it takes to dupe an array. I'm going to see how long it takes to delete while modifying that array, and how to do it the right way, which is to dupe the array and then delete that. And so here is that benchmark for the delete if case.
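A sketch of those three reports: the dup on its own as a control, the version that mutates its array, and the dup-then-delete version (the array size and which half gets deleted are my assumptions):

```ruby
require 'benchmark/ips'

source   = (0...10_000).to_a
mutating = source.dup # this copy gets chewed up across iterations
fresh    = source.dup # this copy is protected by the dup inside the block

Benchmark.ips do |x|
  # control: what does the dup itself cost?
  x.report('dup only') { source.dup }

  # wrong: mutates its array, so after the first iteration there's nothing left to delete
  x.report('delete_if (mutating)') { mutating.delete_if(&:even?) }

  # right: work on a fresh copy every iteration
  x.report('dup + delete_if') { fresh.dup.delete_if(&:even?) }

  x.compare!
end
```

The same pattern applies to the version that loops over values calling delete.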
So luckily, we can see here that the array dupe cost is very, very minimal. It's a pretty constant cost because we're just duping arrays with numbers in it, which is very, very fast in Ruby. This is not always the case. If you're duping complex objects, this could be a much bigger impact. But luckily, we don't have to worry about that affecting or changing our benchmark.
But now we can see the big difference the mutating case had versus doing it the right way. The mutating case is that purple line, and it's performing much better, because it doesn't have to do as much work as the dup plus delete_if. And so by not paying attention to
whether or not we're modifying our objects, we could be thinking that our code is a lot more performant than it is. Or possibly the opposite case can be true as well. And for completeness' sake, here's the same sort of example for iterating over the delete option. So we can still see here, again,
that doing it the right way, the dup and delete, is higher cost than the mutating case where you're deleting the objects out of the array and passing that along to the next iteration. So our next case is, are we using random effectively?
So here's a simple benchmark, and I am going to create an array, and I'm gonna shuffle it, right? Because we want to shuffle our objects. And then I'm going to see how long it takes to find a particular item that I know is in that array. I'm gonna pick the middle number and run include on it.
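The setup being described looks roughly like this (a sketch; the point is that the target's shuffled position is fixed once, up front):

```ruby
require 'benchmark/ips'

size   = 10_000
array  = (0...size).to_a.shuffle # shuffled once, before the benchmark runs
target = size / 2                # a value that is definitely in the array

Benchmark.ips do |x|
  # every iteration scans the same array for the same value, so the cost
  # depends entirely on where the shuffle happened to put that value
  x.report('#include?') { array.include?(target) }
end
```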
So I know that that number's always gonna be in that array, so that include should always return true. It's a randomized array, everything should be great, right? Oh no, this is not exactly what I would expect from that, right? I don't think that the behavior of include really changes that drastically at different sizes.
So what was wrong here? The main problem here is that include short-circuits. It's going to stop whenever it's found that object. And so for each of the different sizes, it's generated a different array with a different shuffled position of that midpoint. In some benchmarks,
it might be towards the beginning of the array. In some, it might be towards the end. And the thing about benchmark Big O, and this would also affect benchmark IPS, is that you're running that test on kind of the same object each time. If you need to be doing some additional shuffling, you're gonna have to be doing that in your benchmark
and taking that as a control itself, to be modifying the object between each iteration. So it can be a really big cost and so these benchmark gems don't do that by default. So what would be a better way of going about this? Well, if I just want to ensure that I'm picking something that's in that array,
then running rand on the size is an easy way to do pretty much the exact same thing. And by running this benchmark, we get what we expect, which is that linear growth. So I'm gonna expand this out a little bit more. In this case, I'm not shuffling the array to begin with. So I have just the numbers all in there,
in their original order. And so I'm gonna run three different benchmarks on this. I'm going to do a deterministic best case, where it's gonna find number one. And then I'm going to do the deterministic worst case, which is the last item in that array, the size.
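The three reports, roughly (a sketch; rand covers the average case described next):

```ruby
require 'benchmark/ips'

size  = 10_000
array = (0...size).to_a # unshuffled, in original order

Benchmark.ips do |x|
  x.report('best case')    { array.include?(0) }          # the first element
  x.report('worst case')   { array.include?(size - 1) }   # the last element
  x.report('average case') { array.include?(rand(size)) } # a random element
  x.compare!
end
```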
And then to try and do the random case, that's gonna be the average case for this array. So now, when I run this benchmark, we get these three nice linear lines out of that. We can see that searching for the first item in the array
is, as we would expect, constant behavior. It's as easy to find the first thing in a small array as it is in a very big array. Finding the end point is going to take a linear amount of time, depending on the size of the array. That's what we'd expect. And the random case splits the middle. The other cool thing about this sort of thing is now we see a cone of the performance behavior
of this method. We can see how this behavior might change, depending on the case that you're testing on. So sometimes it's going to be really, really fast, and sometimes it's going to be a lot slower. And so this leads us to my last question to ask of you,
which is, what case are you wanting to benchmark? You have to ask yourself, do I want to test for the best case? Do I want to test for the worst case? Or do I want to test for the average case? And this can have a really big impact on the data that you're collecting. So in all of my previous examples, I have mostly stuck to Ruby language internals,
mainly because everyone knows the Ruby language, and I didn't have to explain a lot of what these methods were doing. But I made the claim earlier that you could benchmark your own code. And so I'm going to, for this last example, benchmark a gem that my office provides as an open source tool. This is a gem called Terraformer.
Terraformer is a geographic toolkit for dealing with geometry, geography, and formats. What does that mean? It takes geo formats of one type and can convert them to another, GeoJSON to Esri JSON to other types. And it can also do a couple of simple geography-related tasks on those objects as well.
One of those things is the convex hull algorithm. And for those of you who aren't geo people, and you're like, well, the what? A convex hull is the convex polygon that encapsulates a set of geo data. This could be a collection of points. It could be a polygon.
It could be a line string, which in this case is showing that the convex hull of this light blue line string is that purple polygon. And this can be really useful for people who are doing geometry. So when the office built out the convex hull
implementation, the first algorithm that was implemented was something called Jarvis March. And Jarvis March is pretty easy to implement. It's pretty simple code to read. But it can vary in performance time based on its input size. And we were like, ah, it's probably not that bad.
Except that we noticed that, in practice, some of the performance of this algorithm was really kind of terrible. So we were like, oh, maybe we should make this better. So we found a different algorithm called monotone chain. And it seems to perform a little bit better, but I wanted to see exactly how much better monotone chain performs
than the Jarvis March algorithm. And so to do so, I looked into this average versus worst case scenario. And so Jarvis March is known to perform differently based on the different types of input you give it. And in particular, if the object that you hand
the Jarvis March algorithm has a lot of points inside the object that are on the hull itself, it performs a lot worse. So by knowing a little bit more about the algorithm that I'm running a test on and doing a little bit of research, I could come up with two cases of generating objects.
So the first type of object that I'm gonna run this test on is a random walk. So, taking a point in the middle of downtown Portland, I'm going to just randomly move that point around and generate a little random line string. Knowing how Jarvis March behaves,
I can also make something that implements the worst case scenario. In this case, it's a circle. Every point in an approximated circle is on the hull. And so by creating a circle, I know that that's going to put Jarvis March through its worst case scenario.
In order to basically switch between those two different objects, I'm gonna be creating a dynamic generator for benchmark Big O. So the dynamic generator uses this generator name that's defined outside, and it uses that to call either the random walk or the circle method,
depending on what I want at a particular time. And to make things easier, I also made a little helper for the convex hull, since spelling out the implementation of the Terraformer convex hull takes a really long string of characters. So convex hull of an object with an implementation will return the convex hull using that particular algorithm implementation.
Whew, getting there, so much code. So here's the benchmark itself. So I've passed in that dynamic generator, and the other thing about the convex hull algorithm is it takes a really long time. It's a pretty complex algorithm to run. So I'm gonna bump up the running time of my tests to 20 seconds.
I'm also gonna fiddle with the sizes that I'm gonna test, and I've also snipped out all of the reports to put on the next slide. And I'm gonna chart this out to terraformer.html. So here are those reports. So I'm gonna do four. I wanna see how each algorithm performs on each input type.
So first, I'm gonna generate the random walk ones, test that on Jarvis March and on Monotone. Then I'm gonna set the generator to do circles and test it on Jarvis March and Monotone. And so what do I get? Whoa, so I can see here that, just as expected, the Jarvis March on circles,
that's an N squared curve that we can see there. And all of the other ones are collapsed way down to the bottom. So I'm gonna zoom in a little bit more so we can see those more clearly. So now we know Jarvis March circles, that's really bad and terrible performance. Everyone would hate you if they tried to run the algorithm on a circle.
So let's zoom in a little bit more. So now we can see the Jarvis March on a random input is effectively linear. It still performs a little bit worse than the Monotone chain, but if I was only viewing this on the random, the average case, I would think that Jarvis wasn't actually that much worse than Monotone chain.
It's a little bit slower but not that bad. Not anywhere near the same sort of response I would have by testing that worst case. And so I'm gonna zoom in again a little bit more so we can see here that the Monotone chain algorithm ran pretty much effectively equally on those two different object types.
Which verifies our assumption that this algorithm doesn't change in performance based on the types of objects that you pass it. So now I've learned a lot more about these two algorithms by testing both of these average and worst cases
than I would have if I'd only picked one. Phew, so I just ran through a whole bunch of examples and a lot of code. So I hope you stuck with it and if you got a little bit lost, again it's all on the internet so you can check again later and try it out for yourself. Fiddle with some of the variables and see how it performs. So conclusion.
I think the most important part is a lot of this is around verifying your assumptions. Verify your assumptions about the state of your computer. Verify your assumptions about the code that you're writing in the benchmarks. Verify your assumptions about the algorithms that you're testing. That's a very important part of being able to write effective benchmarks.
The great thing about benchmarks is it can help you learn more about your code and other code including learning more about the Ruby language itself. I hope that, you know, I think a lot of people knew a lot of the performance implications of some of the stuff I talked about but maybe you learned something a little bit new and different about the Ruby language that you didn't know already.
So when should someone use the benchmark gem? And my response is: just use benchmark IPS. It's easier than the plain benchmark library and it gives you easier-to-comprehend results. So when do you use benchmark IPS? I'd say use it all the time.
Whenever you're having discussions with your coworkers about, oh, this is faster than that. We do that a lot, and we write those claims in our code comments and commit messages: this is faster than that. How many times have we actually taken a step back and verified that statement? With benchmark IPS, it's extremely quick and easy to run a quick example and verify
that what we're saying is true. It's great for quick spot checks in this case and it also is really, really good for more expansive analysis too. If you want to run a lot of different comparisons on a lot of different objects, benchmark IPS can really be great for that. So when to use the benchmark big O gem?
The first key might be whether the input data that you're trying to test has some sort of range of sizes. If there's something that you can grow or shrink that has meaningful value to the thing that you're trying to test, then benchmark big O might give you more information than benchmark IPS just by itself.
And another way to ask this is, could I envision in my head the results in a chart form? If you can, then that means that you can probably figure out what your input sizes are and be able to use benchmark big O to generate those charts for you. And it has to be okay if it takes a while. As I said before, benchmark big O
almost invariably takes about 10 times longer than benchmark IPS, which can be, in some cases, a lot more infuriating, which is why I'd still say benchmark IPS is really good for quick spot checks. And after talking about benchmarking for, you know, almost 40 minutes now, I also have to take a step back and say,
you know, benchmarking is great, I love it. We can really learn a lot by benchmarking our code, but you can take it too far. Just like premature code optimization, there probably is such a thing as premature benchmarking. And so make sure that what you're trying to do actually has value to the thing
that you're trying to accomplish. And if you're curious for more, if you thought this was a really enjoyable topic and wanted to learn a little bit more, Erik Michaels-Ober gave an absolutely fabulous talk at Baruco this year called Writing Fast Ruby, which talks a lot about how to write more performant code
yourself by doing some simple benchmarks on a lot of the common language features that we use within Ruby on a day-to-day basis. It's also a beautiful talk, as you can see from this slide. Then Juanito Fatas made a repo called Fast Ruby,
which takes all of the examples that Erik talked about in his talk and adds them to a simple repo. And other people have been adding other interesting examples of, you know, basically two language constructs within Ruby, one of which is significantly faster than the other. And so if you're interested in learning more about that, you can check that repo out,
and play around with some of the benchmarks in there. If you're a Rails programmer, then maybe you might want to know about how to do benchmarking a little bit more specifically with Rails. And Schneems wrote this really cool Derailed Benchmarks gem that can help you do just that. And it provides not only benchmarking tests,
but a bunch of other performance analysis tools that you can really easily run on your Rails apps and get a lot of information out, including stuff about how many objects your app's allocating, the RAM required over time,
doing some stack time analysis, all some really cool stuff. And not only that, if you're interested in that sort of thing, you can dig around in his code and see how he does it so that you can learn a little bit more as well. Finally, I would like to thank these fine designers on the Noun project for creating great icons for me to steal, because I don't have any design skills myself.
And thank you very much for coming. And I have like a couple minutes for questions if anyone wants to do that. But thank you very, very much for coming. Thank you. Thank you.