
High Cost Tests and High Value Tests


Formal Metadata

Title
High Cost Tests and High Value Tests
Number of Parts
69
Author
Noel Rappin
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
There is a value in writing tests and there is also a cost. The currency is time. The trade-offs are difficult to evaluate because the cost and value are often seen by different people. The writer of the test bears much of the short-term cost, while long-term benefits and costs are borne by the rest of the team. By planning around both the cost and value of your tests, you'll improve your tests and your code. How much do slow tests cost? When is it worth it to test an edge case? How can you tell if testing is helping? Here are some strategies to improve your tests and code.
Transcript: English (auto-generated)
Am I mic'd, cool, hi everybody, alright I'm gonna go off the title slide for a second and switch back to my,
see, I was gonna put this slide up as my blank slide, but I thought you might want the title information to start with. Alright, hi everybody, is everybody having a good day so far? Alright, that's good, that's encouraging. So I wanna start with what happens when I get
a new feature request, and I'm gonna talk a little bit about this application that I've been building recently. Just to give the details of it really quickly, it's an inventory management system, a very simple one, and the feature that I was asked to create was an administrative feature
where an administrator could verify the inventory of a set of items at a number of different locations in their larger site,
and check the recorded values against the observed values, update the values, and update the inventory they were tracking. A relatively straightforward Rails feature, but because I think about testing as part of how I think about designing
this feature, I start to think about how I'm going to test it, and there are a couple of options. I could do your sort of classic RSpec TDD,
outside-in TDD, starting with an end-to-end Capybara test, supporting that with various business logic tests, focusing on smaller and smaller pieces of the code, eventually writing unit tests. How many people does that describe, the kind of testing practice that you might do or are familiar with?
All right, some, like about maybe a half or so. I could do what is the Rails core suggested testing practice, which is to basically write the code and then write a bunch of integration and system tests at the end of it to support it. How many people does that sort of describe what you do? A few lonely people.
Just because I'm curious from the last couple talks, how many people here regularly run their entire test suite on their local machine before they check it in? That's maybe a third, I think. How many people's test suite, the entire test suite, only runs on a continuous integration server and never runs anywhere on everyone's local machine?
That is approximately two-thirds or so of the room. That is mildly terrifying to me. And that, okay, we're gonna come back to that, I suspect. How many people here are in situations where they don't test?
They feel like they don't test or they don't test enough, one or two, three, four, seven, 12, whatever. I know numbers. So I could not test, that's actually an option. And all of these choices come down to a decision that I'm making that the process that I'm choosing,
the way that I'm testing, that it will be worth it. Which begs the question, really, what does it mean for testing to be worth it? What does worth mean in this situation? Honestly, what does it mean in this situation? What is any of this?
Why do we do any of this? And more to the point, not even just what does it mean? What does it mean to be worth it? How can I tell? So this talk is called High Cost Tests and High Value Tests. My name is Noel Rappin. You can find me on Twitter at noelrap, N-O-E-L-R-A-P.
Feel free to at me during this. I won't see it during this, but I'm happy to answer questions. My DMs are also open if you have a question that you don't wanna put in the public timeline. If you wanna say fantastic things about this talk during the talk on Twitter, you should consider yourself encouraged to do so. I work for a company in Chicago called Table XI.
I also have a podcast called Tech Done Right, and we'll talk more about that stuff at the end. But let's get on with the main course here. The question here is how can you measure the cost and the value, not just of testing, but of any software practice?
And I mean the honest answer is that we really don't know. Like these things are complicated. There are a lot of factors. There's a lot of contextual factors. It's hard to say how much a given practice costs you, how much a given practice is valuable to you. In particular, tests have an issue here
where tests serve many different purposes within a project. Tests are code, yes, but tests also act as documentation. They also act as a marker of how complete the code is, which makes them process. If you're doing TDD, tests also inform the design
of your code, so they're also designed. So they can have cost, they can have value in a lot of different ways, and it's very, very hard to think of a single way to measure that value. But I'm gonna do it anyway in the name of actually having something to talk about for the next 25 minutes. The metric that I'm gonna use here is time.
Time has a couple of advantages. It's super easy to measure. We're all familiar with it. I think we all have an intuitive sense that if you are building the right thing, that the time you spend on something is correlated with the cost of it. And if you are not building the right thing,
then none of the advice in this talk is going to help you and you should figure out why you're not building the right thing and start building the right thing. Also, most of the other things that we think of as positives of tests in terms of improving the design, improving the code quality, that kind of stuff, ideally, that will pay off in saving time
building future features. Otherwise, you could argue it's not worth doing. So that's where we're gonna start. So tests cost time. You hear this a lot. You hear one of the problems with TDD is it just takes too much time to start up with. So let me break that down a little bit. What are the ways specifically in which a test
can cost you time over the life of the test? So first off, you write the test. And that's obviously time spent that you wouldn't be spending if you weren't testing. There's a particular amount of time that you spend just typing in or thinking about the test that you're gonna write.
The test runs a lot of times, ideally. For some of you, it runs on your local desktop machine. For many of you, it runs on a distant server in a cloud somewhere and rains its result down back at you.
But this is a cost, right? Even if it's not a direct cost in your development process, it is a cost in terms of increased build time, the amount of tests you can run at once, that kind of thing. The test needs to be understood. A new person coming to the feature needs to understand how the test works
as part of their process of struggling with how the code works. And also, the test needs to be fixed when it breaks. Whether or not it's breaking for a good reason, you still need to adjust either the test or the underlying code, and that's a cost that comes out of testing. Particularly, it's a cost that comes out of bad testing
because a bad test is going to fail spuriously more often. On the other side, tests save time in a couple of different ways. Tests can save you time by improving the code design. So first of all, a certain amount of time
that you might spend just whiteboarding or doodling, you might spend testing, and also, a TDD advocate would say that the test actually is improving the design of the code in a way that will save you time in the future. One that I think is a little underrated is that in development, it is often faster
to run an automated test than it is to manually test the code. So I showed that inventory management form before. Testing that manually would involve, potentially, like logging into the site, going to a particular page, putting in specific values in a form, pressing submit. Even if I have some of that automated,
it's still a minute or two process in comparison to the test, the end-to-end Capybara test that covers that case, which would run in a second or two. And so I often find myself thinking, this is too simple to write a test, and then I find myself repeatedly going through a webpage over and over and thinking,
oh, I should have written that test a half an hour ago and I wouldn't be doing this right now. A test also validates the code to some extent and does not replace further QA testing, but might replace some QA testing, or it gives you some confidence that you wouldn't have if you didn't write the test.
And finally, tests catch bugs faster. This can save you time in a couple of ways. It can catch a bug before other code becomes dependent on the buggy behavior, which would be bad. This one actually can save you actual money if you catch a bug before it goes into production and you lose customers or server time
or have some other outage because of a bug. So we have four different kinds of cost, writing, running, understanding, fixing, four different kinds of value, design, development speed up, validation, bugs. And one thing that is interesting to me about this is that these costs and these values
happen at different times in the life cycle of the test. So when you actually develop the test, you take the cost of the writing the test, you get the savings of the design, at least some of them, you get the savings of preventing the manual walkthrough.
But then you have to live with this test basically forever. And forever you have the cost of the run, you have the cost of understanding, you have the cost of fixing, but at the same time the bug catching and the validation also live in the code base forever. And what's interesting about this is that not only does this happen over time,
the cost and the value are incurred by different people. The developer writing the test and writing the feature gets most of that developer cost and value, but the rest of the cost and value is often spread across the entire team over the entire length of a project. In a way that makes it very hard
for any individual person to really be able to confidently say this practice has value, this test was useful, this test was not. And it becomes I think very dependent on your context. And so spoiler alert, I don't really have an answer. I'm not gonna come out of this and go, this is exactly what every single one of you
should go out and do right now. Because you're all working within different contexts and all of you have different, those costs and benefits are gonna play out a little bit differently. But what I am hoping that you'll get in the next 20 minutes or so is a sense of how to think about these trade-offs in your own context. And a sense of thinking about testing
in terms of strategy and not in terms of tactics. I feel like a lot of the test advice, including a lot of the test advice that I give to people, has to do with tactics. How do I write this particular test? How do I use this particular RSpec matcher? Would Minitest or RSpec be better for this particular feature?
And then we don't spend a lot of time thinking about strategy, like how many end-to-end tests should I write? When do I drop down to a unit test? That kind of thing. And so that's the kind of process that I'm hoping to talk about. So I wanna start with, I wanna start, I'm like 13 minutes in. I'm gonna start now.
I'm glad you're all here. Forget everything I said up until now. I was just clearing my throat. I'm going to present some data from a very small project that I've been working on for the last, the project that has that inventory management. I worked on it by myself for about maybe 20 hours, 25 hours a week
for about eight weeks. So it's somewhere in the neighborhood of 200 hours of development, more or less. And since I was the only developer, I got to design the architecture and just set all the testing practices. It was wonderful. I recommend not having other people
criticize your design decisions for a few weeks. It's really great. And in the end, I wound up with largely three different kinds of tests. I had end-to-end Capybara integration tests. They're actually RSpec system tests. These tests, the input to these tests
is simulated user interaction. The outputs of these tests are HTML that we're comparing to expected HTML. Empirically, based on the RSpec profile data, these tests run somewhere between half a second and three seconds. The three-second tests are the ones that pull in a JavaScript runtime.
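For concreteness, here's a minimal sketch of the shape of one of these end-to-end system specs. The model, path, and form labels are invented for illustration; they stand in for whatever the real application uses.

    # A hedged sketch of an RSpec system spec driving Capybara end to end.
    # Item, the path, and the labels are hypothetical stand-ins.
    require "rails_helper"

    RSpec.describe "Verifying inventory", type: :system do
      it "updates the recorded count when an admin submits an observed value" do
        item = Item.create!(name: "Widget", count: 10)

        visit "/inventory/verify"
        fill_in "Observed count for Widget", with: "7"
        click_on "Update Inventory"

        expect(page).to have_text("Inventory updated")
        expect(item.reload.count).to eq(7)
      end
    end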
And it's almost impossible to say how long it takes to write these tests, but 30 minutes seems like a reasonable amount of time that I spent purely crafting one of these tests like from scratch given how complicated they are relative to the rest of the code. So let's just agree to accept that number.
I've given you no reason to accept that number, but it will make everything else work out super nicely from here, so let's go with it. The second level of tests is sort of intermediate.
If you think about the testing pyramid, the integration tests are at the top. In the middle, we have these things that I call workflow tests. The way that I write Rails if left to my own devices is that the controller immediately passes off all its data to a workflow or an action or a service object, whatever you wanna call it, that handles all of the actual business logic for the functionality and interacts with active record
to get things into the database. So I have a whole series of tests here that start with passing parameters to one of these objects, calling a run method or an equivalent on that object, and ends with checking the database to make sure that it's made the changes that I expect. These tests are about an order of magnitude faster.
They run between five hundredths and three tenths of a second. Again, this is actual profile data. I did time myself writing a couple of these tests, and they took, again from scratch, about 15 minutes. So much faster to run, a little bit faster to write. And then finally, we had unit tests,
which test the input and output of a single method. This is a wide variety of stuff in this code base: Active Record finders, things that test accessibility logic from Pundit classes, some service objects or data objects that do various and sundry small things.
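One of those, just to show the scale, might look something like this; the scope and attribute are invented for illustration:

    # A sketch of a single-method unit spec against the model and database.
    # The needing_recount scope and the needs_recount attribute are hypothetical.
    RSpec.describe Item, type: :model do
      describe ".needing_recount" do
        it "returns only items flagged for recount" do
          flagged = Item.create!(name: "Widget", needs_recount: true)
          Item.create!(name: "Gadget", needs_recount: false)

          expect(Item.needing_recount).to contain_exactly(flagged)
        end
      end
    end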
But all of these tests basically handle the input and output of one method. These are much smaller. They take maybe a few minutes to write, and they're much, much faster. Their run time is between one millisecond and four hundredths of a second. So if I put that all together, remember I said this was a small project. The last speaker said that
his project had 30,000 tests. This one has 180, which is smaller, you'll note. But the breakdown was like this. There are 22 system-level specs in this system, and they take almost 13 seconds.
What's interesting to think about here is that the system tests are 12% of the tests, but they are 75% of the runtime, which seems like a lot. Another interesting thing is that, by and large, I spent about the same amount of time
writing each class of tests as I did each other class of tests. So the system tests were the slowest and the longest to write, but there are also the fewest of them. I spent the same amount of time, more or less, writing each kind of test. So like I said, 12% of the tests is 75% of the runtime. It's actually even weirder than that at the extreme.
The slowest four tests, which are basically the four tests that run against a JavaScript runtime, are 40% of that runtime. The other thing that was interesting to me when I actually looked at this data was something that I probably would have said I suspected but had never really put numbers to, which is that the range of the run times
was much wider as a percentage than the range of the time it took me to write the tests. So the tests took me somewhere between one to 30 minutes to write, so that's like about a 30 times difference. But at run time, it was the difference
between a millisecond and three seconds and probably even more, which is a 3,000 times variance. Is that making sense as I'm explaining it? And the amount by which those two ranges were different was interesting enough to me that I went back and I checked another project to see.
I didn't necessarily have time to write data on that project, but I was really curious to see to what extent that breakdown held up. This is a larger project but by no means a super large one. It's a project that we've worked on for four or five years. Many, many people have worked on it.
I've written probably the plurality of the tests in this code base, but by no means most of them or even a majority. And the breakdown on that project looked like this. This project has a bunch of end-to-end Cucumber specs that are quite slow on average; they're four and a half seconds each.
And a much higher number of unit tests that are significantly faster; they run about two-tenths of a second on average. I would characterize this code base as a whole as having too many end-to-end tests relative to its overall size.
The end-to-end tests in this case, which are the Cucumber tests and the system tests, are 18% of the tests and two-thirds of the runtime. Okay, that's a lot of numbers. What do they mean? Do they mean anything? Probably not. Empirical data in this field doesn't move very far.
It doesn't go very far outside of its own context. But it did suggest a couple of things to me that I thought were worth, at least affected the way I was thinking about testing. I thought that that concept of a balance between the amount of time that you spend
on different kinds of tests, that seemed reasonable to me. That seemed like a good rule of thumb. And what that basically means is the larger, the longer, the more complicated the test, the fewer of that kind of test you probably should be writing. And the smaller and more focused and quicker test,
the more of those you should be writing, which is basically just the testing pyramid, but with a little bit of my own more empirical experience to back it up. Another thing here that is not necessarily shown in the data is that as you write a number of different kinds of similar tests, the cost to write the next one goes down.
So if I write an end-to-end capybara test, it may take me a really long time to write the first one. But if I write the second one and it's similar, I'm gonna copy paste or I'm going to refactor setup, something like that. The cost to write a bunch of similar tests in that run will go down pretty quickly.
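As a rough illustration of that refactored setup, here's a sketch of how a second, similar Capybara test gets cheaper to add; the sign_in_as_admin helper, the path, and the labels are hypothetical:

    # Shared setup pulled into a before block so each new, similar test is cheap to add.
    # The helper, path, and labels are invented for illustration.
    RSpec.describe "Inventory admin", type: :system do
      before do
        sign_in_as_admin
        visit "/inventory/verify"
      end

      it "accepts an observed count" do
        fill_in "Observed count for Widget", with: "7"
        click_on "Update Inventory"
        expect(page).to have_text("Inventory updated")
      end

      it "rejects a non-numeric count" do
        fill_in "Observed count for Widget", with: "A"
        click_on "Update Inventory"
        expect(page).to have_text("Count must be a number")
      end
    end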
And because of that, it seems to me that the short-term cost, the write time, the development time, the design time, is not super related to what kind of tests you write. And I think that specifically when somebody comes up and says you should mostly write integration tests, you don't need the unit test,
one of the places that that impulse is coming from is from the idea that writing unit tests may not be saving you developer time as you write the feature. But I think that writing unit tests
will save you time in the long run along all of these different axes: the run time, the understanding time, fixing tests, bug catching, and that kind of stuff. And the reason that I say that is because over the life cycle of a project, once the test is written,
the long-term cost of the test basically depends on two things, how long it takes to run and how often it fails, especially how often it fails spuriously. And both of those things are dependent on the complexity of the test.
The more complicated the test, an integration test versus a unit test, the longer it takes to run, the more things can go wrong, the more likely it is to fail. Whereas it seems to me that the long-term savings in writing tests come from focus. If I think of the times when
a test failure has really saved me time, it's where the failure makes it really clear where in the code base the problem is, because there's a very, very limited number of things that could go wrong to make that test fail, and I know right where to look. Add to that the idea that a small fraction of your tests
can be the bulk of your cost, and to me it seems like you can get a big long-term payoff in avoiding writing slow tests. At some level, I have now spent 25 minutes here and I've gotten you all the way to don't write slow tests,
so I can't decide whether that was profound at this point or just stupid at this point, but I do think, and this is both profound and stupid, the way not to have a slow test suite is not to write slow tests, and there's a tendency here, no individual test causes a slow test suite.
How many people here have a test suite that runs more than 20 minutes on their CI server? We're gonna keep going, 30 minutes, 45 minutes. Over an hour? Okay, so there's still some hands up at over an hour. In your over-the-hour test,
is there any individual test that is longer than 30 seconds? Maybe? How many? A lot of Selenium tests, okay, Selenium's its own thing, that's fine. You're a plant from the Selenium lobby. Selenium tests obviously
work on a geologic timeframe compared to the stuff that I'm talking about here. In the non-Selenium world, which is the world I generally choose to live in, no individual test causes a slow test suite. That larger code base that I was talking about here,
which is actually like a 16, 18 minute test suite, the slowest test in that one is 13 seconds. And so like no individual pebble causes the avalanche here, so it's kind of a collective action problem.
You write the two second test instead of the one tenth of a second test, and the first time you do it, it doesn't seem to matter, and the like tenth time you do it, it matters a little, and the hundredth time you do it, it matters a lot. It's an aggregate set of decisions. So let me talk about, all right, that's hand waving. Let me hand wave in a different direction
and talk about how you can avoid or how you can think about your testing as strategically to try to avoid those kind of slow tests. So back to this inventory management software. Like let me talk about what I actually did to test this. The first test I wrote was a Capybara end-to-end test,
RSpec system test, and it goes all the way from the user, simulated user, to the controller, workflow, model, back up to the controller, to the view, back to the user. Everything in red in these diagrams indicates a part of the system that the test is touching.
So this test fails in a lot of cases. It fails if there's a typo in the view. It fails if something's weird in the controller. It fails if the controller hands off to the logic object badly. It fails if the logic doesn't work. That core of the functionality is only one of like a gajillion things that this test will fail on.
And it fails if the database, if there's some problem with how we access the database. Once I write that outside-in testing, I start trying to make it pass. I very quickly find that I need logic, and I start writing a test of one of these workflow objects. This is the next thing, the next test I write.
It starts with the workflow, goes back to the database, comes all the way back up to the workflow. Strictly speaking, it doesn't have to go all the way to the database. It could use test doubles to stub out the database. I don't do that in this project, for reasons that made the talk run over time when I had them in, so we can talk about it later if you want. But basically, this test fails
if the work logic fails or the database access fails. And it does not fail if there's a problem in the controller or in the view. But conversely, like this test, the end-to-end test doesn't care about the structure of the code. It is very robust against refactoring, because it's not actually calling
individual pieces of the code. Whereas this test, which fails in a smaller range of code, is somewhat more dependent on the structure of the code and might be brittle against refactoring. As that gets completed, I have a bunch of unit tests that I write, individual specific pieces of logic.
They go against the model and the database, and they fail only in very particular kinds of ways. So eventually, I get this code passing. The tests all pass. I have a suite of, I don't know, five to seven tests, maybe more, depending on how complicated it is. And I immediately start thinking about how can I break this?
What could happen to break this test, to break this code? And to me, the clear thing that can break this is bad input. Like the user could enter in, instead of entering in a positive integer, they could enter in the letter A. They could enter in a negative number, a blank space, an emoji, like all kinds of things could come in that could potentially break,
that would be unexpected input. And I need to deal with that. And I need to write a test to deal with it. Tests for failure cases are like the best kinds of tests to write, I think. And I have this decision. Do I write this as a Capybara as a system test? Do I write it at the workflow level? Do I write it at the unit level?
And interestingly, given the tests that I've already written, none of these tests are going to be hard to write. Like I can take my existing Capybara test, copy and paste it, and just change the input, and change the expectation slightly, and I will have a new system test that covers the bad input case in only a couple of minutes. If I write a unit test,
those tests are potentially very small, and I will have those tests in only a couple of minutes. I decided in this case to write them as unit tests, on the theory, basically, that that was the logic that was changing. The logic in the controller was not changing, the logic in this workflow was not changing. The part of the code that was changing
was actually a specific object that takes these inventory numbers and passes them along to the database. And that object picked up a bunch of extra logic to handle these bad input cases. And in fact, I actually did write these with test doubles. It took about five minutes to cover all the edge cases. The tests are super fast. They don't go to the database.
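A rough sketch of what those doubles-based edge-case specs might look like; the CountUpdater class and its apply method are invented here to show the shape, not the actual code:

    # Hypothetical example: bad input comes in, nothing is written to the database.
    # instance_double verifies the stubbed interface against the real Item class.
    RSpec.describe CountUpdater do
      let(:item) { instance_double(Item) }

      it "ignores non-numeric input" do
        expect(item).not_to receive(:update!)
        CountUpdater.new(item: item).apply("A")
      end

      it "ignores negative counts" do
        expect(item).not_to receive(:update!)
        CountUpdater.new(item: item).apply("-3")
      end
    end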
And if one of them fails, I know exactly where to look. When I had this working, the client asked for a new feature. They wanted a blank row at the bottom of the form that would become a new item, add a new item into the inventory. And again, I had the same question. Do I write this as a system test?
Do I write this as a workflow test? Do I write it as a unit test? And what I did in this case, and this is actually tracing through my process, is I wrote a workflow test. And my theory was that the logic of the controller handing off to the workflow was not changing, or at least was not adding any new logic.
The literals were changing, perhaps. But all of the logic change was happening in the run method of this workflow object, and so that's where I pointed the test. It did not seem to me like I had a smaller unit that was encapsulating this functionality.
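To make that concrete, the workflow-level spec for the blank-row feature might look roughly like this; the UpdateInventory class and its arguments are invented stand-ins for the real workflow object:

    # The workflow pattern: parameters in, call run, check the database afterward.
    RSpec.describe UpdateInventory do
      it "creates a new item from the blank row" do
        workflow = UpdateInventory.new(new_item: { name: "Sprocket", count: 5 })

        workflow.run

        expect(Item.where(name: "Sprocket").count).to eq(1)
      end
    end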
But that raised a bug. There was a bug where if the user entered a new row and entered a name that was already in the system, it duplicated that item, and that was potentially confusing. And again, I had the same question. Where do I write this? Do I write it as a unit test, as a workflow test, or as a system test?
And in this case, what I did was copy-paste the workflow test and write another workflow test, changing the setup. And I am not convinced that this was the right choice. It was okay, it's a small project. But I think it probably would have been better served
by trying to find a unit there, either a unit in the actual Active Record object or some sort of smaller object that was actually handling this validation. I think that this workflow object was probably doing more work than it needed to. Not a lot, but a little. A couple of things that you can look for
in your own testing to see whether you have a test that is perhaps doing more work than it needs to do is if you're doing a lot of copy-paste, in general, if you're doing a lot of copy-paste, that's something to at least reflect on why you are doing that and why you're not refactoring to commonalities. If your tests are doing a lot of copy-and-pasted setup,
to me, that's often a sign that I'm testing at the wrong level, because I'm piggybacking on setup of tests that I've already done. Something that comes along with that is tests that have a lot of assertions unrelated to the thing that you actually wrote the test to do, because you're picking up assertions from some other test.
That just makes it harder to figure out why the test is failing. And if the test is very far away from the method that's under test, if you're testing a specific method, that's also a problem. The longer the path the code needs to traverse to get to the failure and come back,
the more things that can go wrong. Capybara is wonderful. It is not a unit test framework. And if you are using it, it's so tempting. I say this about Cucumber, too. It is so tempting to use Capybara and it's not necessarily tempting to use Cucumber for anything anymore. But it's tempting to use Capybara
and perhaps even Selenium as unit test frameworks. And that is, by the way, I didn't mean to shame you for using Selenium. Selenium is obviously a very valid choice in many cases. And I don't wanna make people feel bad. It's just, it is objectively slow,
but objectively very, very useful in certain situations. It is also not a unit test framework. So, I mean, some people legitimately don't like writing unit tests. A significant part of DHH's "TDD is dead" talk was that unit tests were just adding cost
and not providing value. I'm paraphrasing what he's saying. In terms of this talk, what he was saying was that unit testing adds cost and doesn't add value. And I recently saw this as testing advice for a JavaScript framework, stealing Michael Pollan's pattern: write tests, not too many, mostly integration.
I don't like this, although I see where, like certain contexts, this makes a ton of sense. It makes sense in a context where unit tests legitimately cost a lot to write. And there are a lot of reasons why unit tests might cost a lot to write. This is very often true in legacy code.
Legacy code that was not written to have the units be easily separated. It may be very complicated to write unit tests. You may have to do a lot of setup to test an individual unit. So oftentimes you will see people like me say, if you're testing legacy code and you wanna just get coverage quickly, end-to-end tests from outside using Selenium
or Capybara or Cucumber or something like that are very useful. If you're using some framework that doesn't really handle unit tests very well, some of the JavaScript frameworks don't, then unit tests are expensive and they don't make sense to write. Some people argue, and I don't agree with this, but this was part of the DHH argument, was that unit tests create hard to understand designs.
If you believe that or if you don't believe that small units are a good thing in code, which again, I do believe it, but some people don't, then you get to a very consistent position where unit tests, small units are not valuable, therefore small unit tests are not valuable, and it's a consistent position.
I do think, and also the time to run matters a lot less if the only time you ever run the entire test suite is on your CI server, then the pain of writing slow tests is much less painful. I think that that's actually probably a bad thing. I think that throwing the pain of slow tests
off onto the CI server is a short-term benefit for what is probably a long-term cost. So anyway, so not derating unit tests is a really good legacy code strategy. I do think that you pay a cost in tests that when you do have a test failure, it's somewhat harder to diagnose those tests.
So I'm running out of time. I wanna talk about one more strategy that's really important in terms of making this kind of work, this work. A lot of times, test-driven development will talk about red-green refactor, and they will talk about thinking about tests in terms of what will make tests succeed. I encourage you to try to think about tests
in terms of what will make a test fail, particularly thinking about each new test in terms of what conditions in the code need to happen for this test and only this test to fail. This is different from saying that every single bug should only trigger one test failure,
which I think is impossible, but I do think that you can get in a situation where every test can justify its existence by saying this is a condition in the code that might happen and only this test will fail because of it. And if you can't think of a condition like that for a test, then maybe the test doesn't need to be there.
It's a really common pattern in test-driven development where you start off with a test that just does setup. It just is like you create a new object, you create the new class, you call new on it and everything, and it has the right instance variables. And that's a great first test. And once you've written two or three more tests of that object, it no longer has a unique failure path
because if any of those things are not true, some other test is going to fail and you can delete it. I think that way about the shoulda-matchers tests a lot of people use, like user should have many posts. Similarly, those don't cost very much, they don't take much time to write, they don't take much time to run.
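For reference, this is the kind of shoulda-matchers one-liner being described:

    # The association one-liner from shoulda-matchers.
    RSpec.describe User, type: :model do
      it { is_expected.to have_many(:posts) }
    end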
a lot of tests would fail and one of them would be this matcher. And it's not clear to me that you're getting a whole lot of value there. It helps to try and create the minimum number of objects in a test that you need to trigger the failure. Again, we're trying to focus complexity here. The more objects you're creating, the slower the test is, the more problems you could have.
This is a really big problem with FactoryBot. If you create a lot of associations in FactoryBot, that is a really easy way to get a very slow test suite and some hard-to-debug test errors. If a bug does cause multiple test failures, that is an opportunity to think about whether you actually need all of those tests.
I am not saying that if a test fails, you should just delete it. I should be clear about that. I am saying that if one bug in the code causes like 10 unit tests to fail, you might not need all 10 of those unit tests. And sometimes it is okay to delete tests, which is the most terrifying piece of advice I give to people, because sometimes they take it.
But in the end, nobody ever really takes my advice, so it's fine. You should use integration tests. You need to write integration tests. You should try to write your integration tests so that they maximally save your development time, which means use them in development to support the happy path of initial code development,
when you would otherwise be running through the browser repeatedly. So tests have various costs, as referenced by these emojis. Tests also have value, and you should try to maximize the value and minimize the cost. That's all I have here. I have a couple of URLs for you to check out.
Again, you can find me on Twitter, at noelrap. Please feel free to ask me questions. I have a book. If this was not enough Noel Rappin in your life, and you were looking for Noel Rappin in ebook or dead tree form, Rails 5 Test Prescriptions, which is all about how to test Rails applications, is available in beta as of yesterday.
The final printed version will come out in January. Statistically, given my sales numbers, I can guarantee that there are people in this room that have not bought it yet. So I really do think that it will save you a lot of time and a lot of effort. It goes into the new RSpec 3.7 and Rails 5.1 features.
I've even got the FactoryBot name change in there. Thankfully, they changed it right before I went to press and not right after, which was very, very nice of them. So you can get it at that URL, pragprog.com. Until Sunday, I think, you can get 15% off with the code RubyConfNOLA. I also do a podcast called Tech Done Right, which you can find at techdoneright.io.
The current episode is a conversation with me and Avdi Grimm, who some of you might have heard of. There are actually several speakers at this conference who have been guests on the show. It's a pretty good mix of developer technical and non-technical topics. I enjoy doing it. I think you'll enjoy listening to it. I am out of time. Thank you for spending a little bit of your time here, and I hope you enjoy the rest of your conference.
Thank you.