Testing the untestable: a beginner’s guide to mock objects
Formal Metadata
Title |
| |
Title of Series | ||
Part Number | 141 | |
Number of Parts | 169 | |
Author | ||
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/21099 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
EuroPython 2016, 141 / 169
Transcript: English(auto-generated)
00:00
Today, we will test the untestable with Andrew Burrows. Let's give him a hand. Hello, welcome. My name is Andrew Burrows. That's my Twitter. A little bit about me and what I do. I work for AHL. I've been there for nearly 10 years. AHL is a systematic quant hedge fund based in London.
00:25
We're in the business of using computer algorithms to invest on behalf of our clients, and we do that entirely in Python. If you've got a code base written in Python or any other language managing billions of dollars and trading around the clock and around the globe,
00:41
then you're going to want to have some good tests. And for us, we find that mocks and mocking play a part in providing good tests. Hence the talk. A bit more about AHL. A little plug: we host the PyData London talks. We run a coding competition,
01:01
encouraging students to get into Python. You may have seen Charlie plugging that in a lightning talk. We do a whole lot of open source stuff. We use lots of open source. We make our own stuff open source. Check out our GitHub. And my boss would love it if you were to follow us on Twitter. He's even put it on my back. I think it's in case I get lost.
01:21
If you see me roaming around Bilbao, you can tweet my boss and say you saw me. Cool. A bit about the talk. The talk's going to be really example based. There's going to be some theory and definitions and some of my own opinions. I'm not going to go into all the deep workings of mock
01:41
and the full richness of the API. Helen gave a really good talk yesterday, actually, which went deep. And this is a beginner's guide. Hers was intermediate. So if you want more meat, you should watch that on YouTube. It was really good. So I'm going to introduce mocks and give you enough to get started if you're not already using it.
02:03
All my examples are in pytest and Python 3, but that shouldn't be a stumbling block, really. You would hardly notice if you're not already using those technologies. And all my examples are available on my GitHub. So you can get them yourself and run them. Cool. So why are you here? Hopefully you're not like the guy with the beard
02:21
and you are actually writing some tests. But if you're not writing tests, I'm hoping to give you one less excuse for not writing unit tests. If the excuse is that mocking is somehow mysterious, hard, or not for you, then I hope to dispel that myth. If you are writing tests, I hope to give you the tools to make testing complex systems both easy and fun.
02:45
Cool. So I said the talk would all be very example-driven. Here's our first example. I was trying to find some common ground, and I figure after we've all been locked in this conference venue for five days together, the one thing we have in common
03:00
is a deep expertise in coding conferences. So all my examples are based on a mythical system that models coding conferences. So here's the first example. It's conference speaker, and this class does just like what I did, comes out, welcomes his or her audience,
03:21
and introduces their Twitter handle. And with this, we get our first definition, system under test. You may see this in the documentation when you read around the topic. It means exactly what you think it means. It is the code that we're testing. So this class, and specifically this greet function,
03:41
is our system under test. Cool. So how do we test it? It's not very long, and it looks like it should be easy to test. We can just create an instance of a conference delegate. We've probably got a class for that lying around our code base already. Pass it in, see that it does the right kind of thing.
04:01
Easy peasy. Unfortunately, just like in real life, all our conference delegates are Twitter-enabled, tweeting continuously. Just for reference, when I put up a load of code, and you need to look at a bit of it, I'm gonna highlight the bit that's most important in green. So feel free to read the whole lot,
04:21
but if you want to save time, just skip to the green bit. So if we were to use this class in our tests, then one of two things would happen. Either our tests would fail because wherever they're running doesn't have access to the internet,
04:40
maybe it's behind a firewall, or it doesn't have the connection details it needs, the passwords, keys, to access the API it's calling, or the test would pass, and we would spam all our loyal followers with test tweets. So either way, we don't want that to happen. But this is Python, right?
05:01
So it's easy. We can just make something that looks like a conference delegate. If it quacks like a conference delegate, it's probably a duck. So we can just make a test delegate class, make an instance of that, pass it into our system under test, make sure the right kind of thing happens, and that totally works, and you can go home and do that,
05:21
and we can stop the talk right now. But the talk's meant to be about mocks, so maybe there's a better way. And it seems like it could be a lot of work if we had to do this every time we wanted to essentially mock out (the clue's in the name) a class or an object in our code base.
05:42
So we can use the mock library. If you're on Python 3.3 or up, you can import it from unittest.mock. Previous to that, you can pip install mock, which is a rolling backport, and import from mock. We're able to create an instance of the mock object,
06:02
pass that in to our greet function, and then we're able to assert the right kind of thing happens. And we'll look at that in a bit more depth in a bit. Cool, so let's have a quick detour
06:20
and actually look at what we've got with these mock objects. The most important thing you need to know about mock objects is that everything on a mock object is another mock object. Every attribute is a mock object. Every method is a mock object. The return value of calling it is a mock object.
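The first example and the "mocks all the way down" point can be sketched roughly as follows. This is a minimal sketch, not the code from the slides: the `ConferenceSpeaker` class body, the `speak_to` method name and the `@example_handle` value are assumptions made for illustration.

```python
from unittest.mock import Mock


class ConferenceSpeaker:
    """Hypothetical stand-in for the speaker class shown on the slide."""

    def __init__(self, name, twitter_handle):
        self.name = name
        self.twitter_handle = twitter_handle

    def greet(self, delegate):
        # System under test: welcome the delegate and share the handle.
        delegate.speak_to(
            f"Hello, my name is {self.name}. Find me at {self.twitter_handle}."
        )


def test_greet_speaks_to_delegate():
    delegate = Mock()  # stands in for a real, tweeting conference delegate
    speaker = ConferenceSpeaker("Andrew", "@example_handle")

    speaker.greet(delegate)

    delegate.speak_to.assert_called_once_with(
        "Hello, my name is Andrew. Find me at @example_handle."
    )


# Everything hanging off a mock is itself a mock.
m = Mock()
attribute_is_mock = isinstance(m.some_attribute, Mock)
return_value_is_mock = isinstance(m.some_method(), Mock)
nested_is_mock = isinstance(m.some_method().another_attribute, Mock)
```

Run under pytest, `test_greet_speaks_to_delegate` should pass without any real delegate class being involved.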
06:41
The return values of the methods are mock objects. The attributes and the methods of those return values are mock objects. It's mocks all the way down until you hit the turtles. But what are we talking about with these mocks? I mean, when I first started doing this, I was quite confused. I worked in a team of people who used a whole load of words that all seemed a bit interchangeable
07:00
but had some subtly different meanings. So before we go any further, I just want to define some terms in case either you're coming to Python from another language or you routinely use more than one language, or like I was, you work in a team of people who just use completely random terms all the time and you're trying to understand what's going on.
07:21
The first term is a test double. Think of this like a stunt double in a film. It's a really general term. It just means any pretend object used in testing. Certainly in Python, we would use mocks, and mocks are definitely test doubles. So there's a first translation.
07:40
Fakes. Mocks aren't fakes. When you fake something, you're using a real implementation, but it's taking some sort of shortcut. So maybe you're using an in-memory database instead of the real thing. So that's not what we're talking about, but you might hear it. Dummy values. In Python, dummy values are sentinels,
08:00
and we'll cover those in a bit. We use these to pad out argument lists, to trace the flow of data through our code, and you'll see that used. Mocks, now that's what we are talking about. We've just seen one in action. It has no real implementation or any pretend implementation, but it records everything that happens to it,
08:21
and you get to assert on that later if you want to. And then closely associated, and certainly in Python, we use the same object: we use mocks for stubs. In the wider world, and I'll talk about this, a stub has more implementation behind it. It makes more of an effort to pretend to the system under test
08:42
that it's the real deal. And we can implement a stub in Python using a mock, and we can do that with a side effect. We'll see that. And for spies, spies are mocks as far as we're concerned. Let's not split hairs.
09:00
Cool, so with all that kind of theory behind us, let's go back to our example. In Python, we use arrange, act, assert mechanics, which is a slightly fancy term to mean that first we arrange our test environment, then we cause our system under test to act on it, and then we assert that the mocks
09:20
saw the behavior we expected. And it's really important to notice that what we're doing here with mocks is we're actually asserting on the behavior. So we're asserting on what got called, not what the state of the final system was. Here, I use one of many asserts that are built into mock. I use assert_called_once_with.
09:40
There's a whole range of them. I'm not gonna go into all of them here. You should check out the docs. The docs are really good. Or Helen's talk yesterday went through a good number of them and provided a good overview. I'm just gonna show you my two favorites, at slightly opposite ends of the spectrum. assert_called_once_with does exactly what it says on the tin. We assert this mock was called once with these arguments.
10:05
And if it's called less than once or more than once, or with different arguments, it fails. At the other end of the spectrum, the kind of superpower of these assertions is mock_calls.
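A small sketch of how `assert_called_once_with` behaves in the three failure cases just described. The `speak_to` name and the `fails` helper are assumptions introduced only for this illustration.

```python
from unittest.mock import Mock


def fails(check):
    """Hypothetical helper: True if the assertion in `check` fails."""
    try:
        check()
    except AssertionError:
        return True
    return False


delegate = Mock()
delegate.speak_to("hello")

# Exactly one call with exactly these arguments: passes silently.
delegate.speak_to.assert_called_once_with("hello")

# Same single call, but different arguments expected: fails.
wrong_arguments = fails(
    lambda: delegate.speak_to.assert_called_once_with("goodbye")
)

# A second call means "once" no longer holds: fails.
delegate.speak_to("hello again")
called_twice = fails(
    lambda: delegate.speak_to.assert_called_once_with("hello")
)

# A mock that was never called: fails.
never_called = fails(lambda: Mock().assert_called_once_with())
```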
10:22
What mock_calls does is it records every call to your mock and to all child mocks, and it records the order of them. So in the example at the bottom, I'm just in an interactive shell. I make a mock. I make various calls to it and its children. And then I get to see what the mock_calls are, and I could assert on that.
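The interactive session described here might look something like the sketch below. The child names `child` and `grandchild` are made up for illustration; `call` is the helper from `unittest.mock` for building expected call records.

```python
from unittest.mock import Mock, call

m = Mock()

# Calls to the mock itself and to its children, in a deliberate order.
m(1)
m.child("a")
m.child.grandchild(key="value")
m(2)

# mock_calls records every one of them, in the order they happened.
expected = [
    call(1),
    call.child("a"),
    call.child.grandchild(key="value"),
    call(2),
]
recorded_in_order = m.mock_calls == expected
```

Because the comparison is against an ordered list, swapping any two entries in `expected` would make it unequal.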
10:41
And you notice that it's found everything I did, and it knows the order they happened in. So that can be really powerful, and there's pretty much nothing you can't test with that. Another flip side, or another downside, I suppose, of it being mocks all the way down,
11:03
is that if you pass a mock object into your system under test, and maybe there's a typo in there, so instead of calling speak to, it's calling spoke to, or it's just calling a method that you've not implemented yet, one that doesn't exist, the test may still pass, because mock would create those methods,
11:20
those attributes on the fly for you. If you want to limit your mocks to only having the same interface as your real code, you can do that by specifying a spec in the constructor of the mock. And this is the interface, essentially, you want for your mock. In the example, at the bottom, I just interactively create a mock that has the same spec as a conference delegate,
11:43
then try to get it to snore loudly in the session, and it fails, because you definitely can't do that. Cool, and let's have a look at a harder example. Before, we saw our conference delegate class, and we highlighted that we couldn't use it
12:00
in our other tests, because it was tweeting. So we just mocked it out completely. What if we want to actually test this class? We've got the same problem: it's gonna tweet, with all the same problems. So how can we do it? Well, we're gonna use a mock, of course, but last time, we were able to just pass our mock objects into the system under test.
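The spec behaviour described a little earlier can be sketched like this. The `ConferenceDelegate` body and the `snore_loudly` method name are assumptions; only the idea of passing `spec` to the `Mock` constructor comes from the talk.

```python
from unittest.mock import Mock


class ConferenceDelegate:
    """Hypothetical real class whose interface the mock should copy."""

    def speak_to(self, message):
        raise RuntimeError("the real thing would tweet")


delegate = Mock(spec=ConferenceDelegate)

# Methods that exist on the real class work as usual.
delegate.speak_to("hello")
delegate.speak_to.assert_called_once_with("hello")

# Anything not on the real interface is rejected instead of auto-created,
# so typos such as spoke_to no longer slip through.
try:
    delegate.snore_loudly()
    rejected = False
except AttributeError:
    rejected = True
```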
12:21
This time, we can't do it. The thing we want to mock, which is almost certainly a simple tweeter, isn't passed in, it's imported. So how are we gonna get our mocks into this code? We do it using patch. Patch is a great tool. What it does is it replaces the specified object
12:41
with a mock, and then at the end of the patch, it puts it back to normal, so it completely covers its tracks. So what we're able to do is, we're able to patch simpletweeter.tweet, call the system under test, and then assert, as before, and that mock gets created and injected into the right part of the code.
13:04
The one gotcha is that you have to get that string, that specification of what you want to patch, correct, and it has to match how the code is used in the system under test. So if, for example, instead of importing simpletweeter,
13:22
we did from simpletweeter import tweet, then not only would the call site change, but so also would the patching. In my experience, working with a number of people, introducing them to mocks, this is a thing that people always struggle
13:41
to get their head around. It's worth taking a moment to think about why it happens. Essentially, when we're doing from simpletweeter import tweet, we're creating a new variable in this module called tweet that is a reference to the code in the simpletweeter module. But then, if we were to patch simpletweeter.tweet,
14:00
we are replacing the code in simpletweeter, but our reference, our tweet variable in this module, still points to the old code, so we're patching the wrong thing, so we have to get it right. You don't need to get it wrong many times before you spot what you're doing and learn, but it's worth looking out for and worth thinking about.
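The "patch where it is used" gotcha can be demonstrated with two in-memory modules. Both module bodies below are assumptions written for this sketch; the module name `delegate` and the function `announce` are hypothetical.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical simpletweeter module with a tweet function.
simpletweeter = types.ModuleType("simpletweeter")
exec(
    "def tweet(message):\n"
    "    return 'real tweet'\n",
    simpletweeter.__dict__,
)
sys.modules["simpletweeter"] = simpletweeter

# Hypothetical module under test that uses `from simpletweeter import tweet`,
# which creates its own name `tweet` pointing at the original function.
delegate = types.ModuleType("delegate")
exec(
    "from simpletweeter import tweet\n"
    "def announce(message):\n"
    "    return tweet(message)\n",
    delegate.__dict__,
)
sys.modules["delegate"] = delegate

# Patching the defining module leaves delegate.tweet pointing at the old code.
with patch("simpletweeter.tweet", return_value="mocked"):
    wrong_target_result = delegate.announce("hi")

# Patching the name in the module that uses it is what takes effect.
with patch("delegate.tweet", return_value="mocked"):
    right_target_result = delegate.announce("hi")
```

In this sketch `wrong_target_result` still comes from the real function, while `right_target_result` comes from the mock.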
14:22
Another thing to notice with this example that I was showing is that I'm using a sentinel. I mentioned before that sentinels are the kind of Python word for dummies, and what we do here is we pass a sentinel into the initializer and we check that it comes out in the call to tweet.
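A minimal sketch of tracing values with sentinels. To stay short it injects the tweet function through the initializer rather than patching it, which is an assumption and differs from the slide; the class body is also hypothetical.

```python
from unittest.mock import Mock, sentinel


class ConferenceDelegate:
    """Hypothetical delegate that forwards its handle to a tweet function."""

    def __init__(self, handle, tweet):
        self.handle = handle
        self._tweet = tweet

    def speak_to(self, message):
        self._tweet(self.handle, message)


tweet = Mock()

# sentinel.<name> creates a unique, named dummy object on first access
# and returns the same object on every later access.
delegate = ConferenceDelegate(sentinel.handle, tweet)
delegate.speak_to(sentinel.message)

# The same objects that went in must be the ones that come out.
tweet.assert_called_once_with(sentinel.handle, sentinel.message)
```

Because each sentinel is only equal to itself, the assertion would fail if the code under test swapped or altered the arguments on the way through.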
14:41
It's a bit like kind of dropping balls in the top of some sort of machine and seeing which hole at the bottom they come out of, and can be really handy. Another use for them is just to pad argument lists to calls when the values aren't important to the code you're testing. I really love sentinels, by the way. They're brilliant, you should use them, they're great. Okay, let's dig a bit further into our fictional application
15:04
and look at the wrapper, simpletweeter, that we've built around the Twitter API. What it's doing is it's adding just a simple retry functionality on top of the underlying API. The underlying API would return false if it failed to send a message, whereas you can see this code is picking up on that
15:23
and trying five times to send a message regardless. We can write some tests for this. I've said before that the return value of every call to a mock is another mock, but you can override that, and you can actually set the return value.
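A rough sketch of the always-failing case using `return_value`. The `tweet_with_retry` function is a hypothetical reconstruction of the retry wrapper, and the underlying send function is passed in as an argument instead of being patched, to keep the sketch in one file.

```python
from unittest.mock import Mock


def tweet_with_retry(send, message, attempts=5):
    """Hypothetical retry wrapper: try up to `attempts` times, then give up."""
    for _ in range(attempts):
        if send(message):
            return True
    return False


def test_gives_up_after_five_failures():
    # Every call to the mock reports failure.
    send = Mock(return_value=False)

    result = tweet_with_retry(send, "hello")

    assert result is False
    assert send.call_count == 5
    send.assert_called_with("hello")
```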
15:41
So here I set the return value to false, and that means every call to tweet will fail, and then I can assert that it tries, but fails and ultimately gives up. Also notice in this example that I've been using patch this time as a decorator
16:01
instead of a context manager as it was before. These two things are completely interchangeable. You can use them, they work exactly the same. One reason you might use patch as a decorator, especially if you're patching more than one thing, is that if you're using it as a context manager, as you had more and more patches,
16:22
your code would slowly disappear off the side of the screen whereas, of course, stacking them up as decorators, you don't have that problem. But they're interchangeable, you can do both, they work the same way. Here's a lot of kittens. Oh, don't you love kittens? Cool. All right, what if we want to test
16:41
this retry functionality? We don't want it to always fail. We want it to succeed sometimes, so we want it to succeed on the third attempt. We can't do that with return value; what we can use is side effect. So I set the side effect to a list, and the first call's gonna get a failure, the second call's gonna get a failure, third call is gonna be true, a success.
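The succeed-on-the-third-attempt case could be sketched as below. As before, `tweet_with_retry` is a hypothetical reconstruction of the retry wrapper with the send function injected.

```python
from unittest.mock import Mock, call


def tweet_with_retry(send, message, attempts=5):
    """Hypothetical retry wrapper: try up to `attempts` times, then give up."""
    for _ in range(attempts):
        if send(message):
            return True
    return False


def test_succeeds_on_third_attempt():
    # A list side_effect hands out one value per call, in order.
    send = Mock(side_effect=[False, False, True])

    result = tweet_with_retry(send, "hello")

    assert result is True
    # It stopped retrying as soon as a call succeeded.
    assert send.mock_calls == [call("hello")] * 3
```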
17:04
Side effects are like return values, but they're basically just more magical. So there's three types of magic supported. There's the sequence we've seen, where subsequent calls to the mock return different values from the list in turn. You can set the side effect to an exception,
17:21
so every call to the mock raises an exception. That can be really handy if you want to test failure scenarios without having to actually manufacture a failure somehow. And the final and most powerful form is setting the side effect to be equal
17:40
to a function, a lambda, some sort of callable. And the return value of that function becomes the return value of the mock. I like the first two a lot, with no caveat. The third one I do like and it is powerful, but I just want to kind of raise a little red flag that if you're doing this a lot, you might want to just look at your code and just make sure that there's not some better way
18:01
to structure it to make it easier to test. If you're doing this in all your tests all the time, maybe there's some way you can construct your code differently so it's easier to test. Having said that, it is useful and powerful, so let's have an example of it in action. Let's give our conference delegates the ability to chat to each other over coffee.
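The other two kinds of side effect described above, an exception and a callable, might look like this. The canned replies and the `stranger_speak_to` function are assumptions written for this sketch, not the code from the slides.

```python
from unittest.mock import Mock

# 1. An exception: every call to the mock raises it.
send = Mock(side_effect=ConnectionError("network down"))
try:
    send("hello")
    raised = False
except ConnectionError:
    raised = True


# 2. A callable: its return value becomes the mock's return value,
#    which turns the mock into a simple stub.
def stranger_speak_to(message):
    """Hypothetical one-sided conversation with canned replies."""
    replies = {
        "Hello": "Hi there",
        "Enjoying the conference?": "Yes, very much",
    }
    return replies.get(message, "Pardon?")


stranger = Mock()
stranger.speak_to.side_effect = stranger_speak_to

first_reply = stranger.speak_to("Hello")
second_reply = stranger.speak_to("Enjoying the conference?")
fallback_reply = stranger.speak_to("Nice weather")
```

The calls are still recorded, so the usual assertions such as `assert_called_with` remain available on `stranger.speak_to`.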
18:22
So here we've got some sort of implementation where our delegates can hold a conversation and do whatever they like. And we can test this by creating essentially a stub. I mentioned stubs before, they're kind of mocks but with more faked up implementation. We create a function, strange speak to,
18:44
which holds one side of the conversation. We set that as a side effect for our stranger mock, although now it's probably a stub, doesn't really matter in Python, we just call them all mocks and we pass that into our test. And we can test that code if we want to. But this can look a bit, I mean,
19:01
it's not the prettiest test. I would argue that you have this quite nice sort of setup, run it, assert, but then in the middle of it you've just got this big chunk of code, and what's that doing there? So a little plug for my own library, mockextras. It's built entirely on top of mock, it doesn't implement its own thing, it just makes it kind of prettier and nicer
19:23
to create side effects on your mocks. So it gives it this kind of fluent interface so you can say when this particular mock is called with these particular values, then return this. And these are all quite simple examples, it supports some quite complex things. If it looks nice to you, check it out,
19:41
it's on our GitHub, it's on Read the Docs. I mean, to me it just makes the test read a bit more like a story and makes the test slightly better documentation for the code we're testing. But, you know, your mileage may vary. So, we've now learned about mocks and patching,
20:03
sentinels, and we're ready to go and save the universe. But before we do, a warning: beware the dark side of overmocking. What do I mean? So I think most people will agree, it's uncontroversial. We will mock third-party APIs,
20:22
we'll mock out calls to mail servers, web services, we'll mock out things that make our code non-deterministic like randomness and time. And we won't mock out stuff that is built into the language. We're not gonna mock out numbers and lists, tuples, strings, and stuff that in your world is as good as built-in.
20:41
So if you're doing lots of NumPy and pandas, don't mock it out, leave it in there. And there's probably, in whatever you do, there might be other types, which are, you know, like the oxygen you breathe and you shouldn't be taking them out. But it still leaves a lot of other code, and what do we do with it? If you read the internet, you'll find people ranting on all sorts of things,
21:01
as you know, but there's two schools of thought on this particular problem. There's like a classical TDD approach, which says that we should only mock or stub or fake objects when we really have to, and we should try to use the real objects whenever we can. And there's a mockist approach, which would value having unit tests, testing a single unit as being the most important thing.
21:21
So they would mock everything. Where do I stand? Kind of pragmatically towards the mockist end without being crazy about it, but with a real eye on not wanting to make my tests overmocked. Overmocked tests are brittle to changes in our code, expensive to maintain, but tempting because they're easy to write and boost coverage statistics.
21:45
A quick example, we're gonna add a feature to our conference delegate, which is the ability to rate talks. Rating talks is good, it's good for speakers, it's good for conference organizers. You should rate this talk, but be kind, it's my first ever one. You should maybe consider using the ISO standard for talk rating, which, as we all know,
22:01
is the number of kitten pictures in the talk multiplied by the sum of how useful it was and the clarity of the presentation. So how do we test this code? We could go crazy and we could write, we could put mocks in instead of our numbers. This is a contrived, horrible example. You would never do this, but to show how bad overmocking could be, let's look at the worst ever case.
22:22
So we pass mocks in and then our asserts end up looking like some kind of weird, twisted, inside out representation of the original code. It's a horror to behold. It'd be much better to test our code, in this case, with meaningful examples and edge cases. And you should do that when mocks don't fit.
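To make the contrast concrete, here is a sketch of both styles against the rating formula from the talk (kittens multiplied by the sum of usefulness and clarity). The function name `rate_talk` and the sample numbers are assumptions.

```python
from unittest.mock import MagicMock


def rate_talk(kittens, usefulness, clarity):
    """Hypothetical rating function following the formula in the talk."""
    return kittens * (usefulness + clarity)


def test_with_meaningful_examples():
    assert rate_talk(3, 4, 5) == 27
    assert rate_talk(0, 10, 10) == 0  # edge case: no kittens, no score


def test_overmocked():
    # Don't do this: the assertions just restate the implementation.
    kittens, usefulness, clarity = MagicMock(), MagicMock(), MagicMock()

    result = rate_talk(kittens, usefulness, clarity)

    usefulness.__add__.assert_called_once_with(clarity)
    kittens.__mul__.assert_called_once_with(usefulness.__add__.return_value)
    assert result is kittens.__mul__.return_value
```

The overmocked version would break if the function were rewritten as `kittens * usefulness + kittens * clarity`, even though the results would be identical for numbers.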
22:42
Okay, to summarize, do write tests, do use mocks, they're easy and fun. Patch is brilliant, love sentinels like I do because they're awesome. Use function side effects as an occasional treat. They are powerful, but don't overdo it. Never overmock, love each other, and thank you very much.
23:13
I believe that we have some time for some questions, so do you have any?
23:22
Oh yes, the disclaimer is, the question was, don't I have a large disclaimer? And the answer is, yes I do. It's a side effect of working in the finance industry, I'm afraid. But basically, to summarize it, it says, if you use this talk as investment advice,
23:43
don't, but do read it, don't break out my summary. I don't want to undermine small print. Okay, any other questions? Okay, maybe it's an easy one,
24:01
but I was just wondering, what's the real difference between magic mock and mock? I've seen a sort of overlock. I kind of skipped over that. So here, I use magic mocks. Magic mocks are just the same as mocks, but they also mock most of the so-called magic or dunder methods, so they're kind of the operator overloading in Python. So because here I wanted to test
24:22
that things were multiplied by each other and added to each other, and that's the dunder mul and dunder add methods, I need to use a magic mock. As a quick aside, patch always puts magic mocks in, but really, you don't ever need to worry about the difference. They just kind of work.
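A short sketch of the difference described in this answer. It uses `os.getcwd` purely as a convenient, existing patch target; that choice is an assumption for illustration.

```python
import os
from unittest.mock import MagicMock, Mock, patch

# A plain Mock does not set up the magic (dunder) methods,
# so operators such as * are not supported.
try:
    Mock() * 2
    plain_mock_multiplies = True
except TypeError:
    plain_mock_multiplies = False

# A MagicMock does, and the call is recorded like any other.
magic = MagicMock()
product = magic * 2
magic.__mul__.assert_called_once_with(2)

# By default, patch replaces the target with a MagicMock.
with patch("os.getcwd") as mock_getcwd:
    patch_uses_magic_mock = isinstance(mock_getcwd, MagicMock)
```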
24:42
Okay, any other questions? Okay, if you don't have any questions, this talk will be followed by the lightning talks, in 15 minutes. So let's thank Andrew once more. Thank you. Thank you.