
A Magic Implementation of NotImplemented


Formal Metadata

Title
A Magic Implementation of NotImplemented
Title of Series
Number of Parts
141
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
[Dirty Equals] is a new Python library by Samuel Colvin, the creator of Pydantic. It will transform how you write tests, especially for APIs. I made some contributions to it, which forever changed how I thought about `NotImplemented`. I thought it was a placeholder for unfinished work and unexpected use cases. I thought the language quirks it created in equality comparison were annoying. But in **DirtyEquals**, it’s a magic way to transform Python’s built-in equality operator... And that changed how I think about language quirks, full stop.
Transcript: English (auto-generated)
Thank you. I'm very encouraged to see this many people care about testing. I was worried not everyone would show up. Thank you for coming. Yeah, I'm Alexander. I'm a software engineer. I work at Palo Alto Networks, which is a cybersecurity company.
And this talk is about, oh, that doesn't work. Sorry, give me one sec. There we go. This talk is about my love affair
with Pydantic and ultimate rejection. I really, really like Pydantic, and I definitely feel like I had a crush on the source code when I first found out about it, and so I really wanted to contribute to try and be part of the project.
But when I had a look at the project, I realized it wasn't really the right time to contribute. They had kind of shut it down. And so I had a look at some other projects that the author had written, and I found one called Dirty Equals. So this talk is going to be about my story of contributing to Dirty Equals and things that I learned from that.
But just to kick things off, in case you don't know what Pydantic does, this is like a really simple, minimal example. And I wanted to focus a little bit on how Samuel Colvin, who wrote Pydantic, thinks about code, because that's kind
of the thing that I learned the most from contributing to his projects. And I think what he does is take things that are internal to Python and change them in a small way in a library, which has very interesting results. So in Pydantic, sorry, I don't know how to change that,
but I'll just keep going. He is very good at, well, yeah, sorry, here he's made Python check type hints at runtime. So you can see here that that B value to my model is, sorry, the A value, I mean, is supposed to be an integer,
but later, when it says my model A equals 100, I've actually passed it a list of integers. So I then get an error, which says, oh yeah, this works, that it should be an integer, but you've actually given it a list of integers.
So this has changed how Python normally works. Type hints are normally just type hints. They're just hints, and they don't get checked at runtime. But if you use Pydantic, your program will actually error if the type hint is wrong. So something does happen at runtime. And this kind of paradigm of making stuff change internally
to Python in a sort of contained way is what I think is really cool. So, but just first of all, I wanted to talk a bit about what happens when you come across a new open source library and how to figure out if it's worth contributing to, since I think people don't always talk about this, but it can save you a lot of time. If you don't do this, I think you can do a lot of work
on a pull request, and then nothing happens, which is quite disappointing. So this is sort of my checklist. The first thing I would check is when was the most recent commit? Hopefully it was a merge commit, so someone has merged in a pull request from someone else, rather than it just being the maintainer of the library adding their own code.
I also like to see what the activity is like on the issues and get a sense of what's happening with the library, and then I'm not going to name any names, but on some repos I've seen really, really frightening responses from maintainers, and I think life's too short. So these are things that I think are worth checking before you make an open source code contribution.
But yeah, on to Dirty Equals, which is the project that I did end up contributing to after running some of those checks on Pydantic.
Dirty Equals is a little bit like Pydantic for testing, so if you want to do cool different things with your tests, this library is good for you. It ultimately helps you make tests that are easier to write, and I also think a bit more fun to write. I personally find writing tests can often be a bit boring and tedious. To me, this is a way to add some syntactic sugar that makes it a bit more fun.
And it fundamentally lets you misuse the equals operator in Python. So just like Pydantic lets you check type hints at runtime, this is the kind of trick pulled in Dirty Equals. It changes the way that equality works in Python, which is what it says here on the homepage of the library
in that box at the bottom. And yeah, it's really useful when you're testing responses back from APIs, which I'll explain in a bit more detail later, but I think probably a lot of Python developers work with APIs, and so that's kind of
one of the main use cases for this. This is just a very simple example of the library syntax. And yeah, here you can hopefully see how equality is getting misused. We can write a really simple test case and do these sort of unusual checks with new objects from Dirty Equals. We can check in the first line there,
does the list have length three, and then does it contain the string A? And you can do this with quite interesting syntax with like an ampersand there, a pipe operator and not equals, and you can chain and combine these. So this is a simple example now, but hopefully you'll see as we go through the talk
how you can kind of build on this and make it all a bit more complex. Then this is probably the kind of pattern you'd be most likely to use in actual production code where you are writing an API test. You here are gonna maybe mock out a client
and then test that some JSON is equal to something like this. And I think hopefully here you can see how versus just kind of testing this is exactly equal to the kind of response you're expecting, which is what at least I used to do before discovering this library. You can write much more dynamic test cases in a way that's quite readable
using these Dirty Equals objects. So like here on that avatar file line, you can check that the string matches a particular regex. So I think that's probably a nicer pattern for testing. It's also gonna make things arguably a bit more modular if you wanna reuse some of these Dirty Equals objects when you're expecting these kinds of responses.
And you get a pretty nice diff in pytest. So this is an intentionally failing test where I gave it the wrong value for the avatar file. And you can see in the diff, you get the Dirty Equals object back. For me, this is way easier to read. Not right now. Normally it's way easier to read than just seeing like a pretty long pytest traceback
with some regex that I'm not really sure where it is. So I think the advantage of making things more declarative and explicit is that when you get around to failing tests and have to fix things, you get these quite nice readable objects. So how does this really work though under the hood? If you think about it, this feels like bad Python.
It's not bad Python. I wouldn't be giving a talk on it. But normally `==` is a strict equality check. So like that second line there should definitely fail. `"hello"` definitely does not equal `True` for a whole bunch of reasons. But if we use this `IsTrueLike` thing
from Dirty Equals it passes because it's a truthy string. It's not an empty string. So something funky has happened in order for this to take place. And that's what I'm gonna kind of move into. For the rest of this talk, I'm gonna get a bit into how equality actually works in Python, which I did not know before contributing to this library.
So it turns out that when you have a statement like `x == y` in Python, the first thing that happens is that X checks, can I compare myself to Y, the thing on the right? And it calls the dunder eq method, `__eq__`,
which is what you can see up here at the top. And the important thing from this slide is that this code never runs, which is why if you run this, Y's `__eq__` doesn't get called. And the reason for that is that this returned `True` here. So it exits here.
This might make more sense if you have a look at this slide. So in Python, equality is designed such that if you return `NotImplemented` from the thing on the left, it then proceeds to check equality on the thing on the right. So here we have `x == y`. The first thing that fires is X's `__eq__` here.
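The two slides described here are not part of this transcript, so the following is only a minimal sketch of the behaviour being explained. The class names `Left`, `LeftGivesUp` and `Right` are hypothetical and chosen for illustration; the fallback to the right-hand operand when `NotImplemented` is returned is standard Python behaviour.

```python
calls = []  # records which __eq__ methods actually ran


class Right:
    def __eq__(self, other):
        calls.append("Right.__eq__")
        return True


class Left:
    """Knows how to answer, so Python never asks the right operand."""

    def __eq__(self, other):
        calls.append("Left.__eq__")
        return True


class LeftGivesUp:
    """Does not know how to compare itself, so it returns NotImplemented."""

    def __eq__(self, other):
        calls.append("LeftGivesUp.__eq__")
        return NotImplemented


# First slide: the left operand answers, the right operand is never consulted.
first = Left() == Right()
calls_first = list(calls)

# Second slide: the left operand gives up, so Python asks the right operand.
calls.clear()
second = LeftGivesUp() == Right()
calls_second = list(calls)

print(first, calls_first)
print(second, calls_second)
```

In the first comparison only `Left.__eq__` is recorded; in the second, both methods are recorded, in left-then-right order.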
You return `NotImplemented` in Python if you don't know how to compare yourself to Y. So X is saying, I don't know how to check if I'm equal to Y. If Y was an integer here, rather than some weird class I wrote, there would be equality logic here to check if it's the same kind of integer,
assuming X was also an integer. But the really important thing here is that Y's `__eq__` gets called. So that basically means if we write a custom object in Python, which is what all the Dirty Equals objects are, things that aren't in the standard library, then whatever we write in this `__eq__` method here
is going to control how the equality logic works. So using this approach, we kind of get to hook in to how to do equality checks in Python. And this is from the CPython source code for how `PurePath` works, just to kind of show you what a sort of good blueprint by someone who knows Python
a lot better than me looks like for this. So if a `PurePath` object is trying to compare itself to another object, `PurePath` is designed so that it doesn't know how to compare itself to anything that isn't a `PurePath` object. So this is basically saying, if it's `PurePath`, then go here and do the `PurePath` kind of comparison
and that is all good. But if the object it's getting compared to is not a `PurePath`, then return `NotImplemented`. So then we go to the object on the right and check if that knows how to compare itself to the object on the left. So just to wrap this up, hopefully this makes sense now.
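As a rough illustration of that blueprint, here is a sketch that follows the same shape. `Celsius` is a hypothetical class invented for this example and is not taken from CPython; the last two lines only observe how `pathlib` itself responds when compared with a plain string.

```python
from pathlib import PurePosixPath


class Celsius:
    """Hypothetical value type that only compares itself to other Celsius objects."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        if not isinstance(other, Celsius):
            # Unknown type: hand the question to the object on the right.
            return NotImplemented
        return self.degrees == other.degrees

    def __hash__(self):
        return hash(self.degrees)


same = Celsius(20) == Celsius(20)
# Neither Celsius nor int knows the other type, so Python falls back to
# identity comparison and the result is False.
different_type = Celsius(20) == 20

# pathlib gives up in the same way when asked about a plain string.
path_direct = PurePosixPath("a/b").__eq__("a/b")
path_vs_str = PurePosixPath("a/b") == "a/b"
```

When both sides return `NotImplemented`, `==` does not raise; it simply evaluates to `False` for two distinct objects.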
This is a kind of flow chart of what happens. If X knows how to compare itself to Y, it just does the comparison straight away. If it doesn't, then we see if Y can compare itself to X. And that's this bit here is what I'm gonna dive into next. Why would we care about this? This, when I was first looking into this, seemed pretty weird, equality works well in Python,
why would you wanna change it? But you can now use this to write your own comparison logic and make things give back that they're equal or not equal based off your own logic. So a really simple example of that is here.
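The slide is not included in the transcript. A minimal sketch of the kind of class described next, assuming a hypothetical name `BiggerThanFive` and an added guard for non-numeric values that the talk does not mention, might look like this:

```python
class BiggerThanFive:
    """Hypothetical example: 'equal' to any number greater than five."""

    def __eq__(self, other):
        # `other` is the object on the left of `==`. An int on the left does
        # not know this class, returns NotImplemented, and Python then calls
        # this method with the int as `other`.
        if isinstance(other, bool) or not isinstance(other, (int, float)):
            return NotImplemented
        return other > 5


x = BiggerThanFive()
print(2 == x)  # False
print(6 == x)  # True
```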
And the Dirty Equals library is a little bit more complicated than this, but these are just some hopefully clearer examples to follow. So this is how you would create a class in Python that you say is equal to the object on the left if the object on the left is bigger than five. So this `other` argument here that gets passed to the `__eq__` method
is the object on the left. So that's why `2 == x` here is false and six is true, because obviously six is bigger than five. And you can also do this for not equals. So the same kind of logic that I've been talking about
also works for not equals. So the idea here is Beyonce is only equal to herself. So hello is not equal to Beyonce, U2 is not equal to Beyonce. But it is true that,
sorry, it is false that Beyonce is not equal to X, because Beyonce is indeed equal to herself. So this is a more realistic example for how you would do something like the is,
I don't know how to say this out loud, is stir, `IsStr` class that is in Dirty Equals. So here when you build the class you give it a regex argument, and then in the check I'm catching the case where it's not a string,
because from the name hopefully it's gonna be obvious to people that it's only for a string. I'm gonna raise a `ValueError`, but if it is a string then I have my own logic here to basically say yes, it's equal if there's a regex match, but otherwise it's false. So that's the kind of core idea of how you would implement this.
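A sketch of that core idea using only the standard library is shown below. `MatchesRegex` is a hypothetical stand-in written for this transcript; it is not the Dirty Equals implementation, and the choice of `fullmatch` and the exact pattern are assumptions.

```python
import re


class MatchesRegex:
    """Hypothetical stand-in for the regex-based string check described in the talk."""

    def __init__(self, regex):
        self.pattern = re.compile(regex)

    def __eq__(self, other):
        if not isinstance(other, str):
            # The talk describes raising a ValueError for non-strings.
            raise ValueError(f"expected a str, got {type(other).__name__}")
        # Equal only if the whole string matches the pattern.
        return self.pattern.fullmatch(other) is not None

    def __repr__(self):
        return f"MatchesRegex({self.pattern.pattern!r})"


alphanumeric = MatchesRegex(r"[A-Za-z0-9]+")
values = ["abc", "abc123", "42", "not alphanumeric!"]
results = [value == alphanumeric for value in values]
print(results)  # [True, True, True, False]
```

Only the last value fails, because it contains a space and an exclamation mark.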
In the next slide, trying to work with these dramatic pauses. This is how we would use it. So this regex here is basically saying alphanumeric only which means that I get false for only this one at the end. All the other ones are alphanumeric strings
so they pass fine. So this is, yeah, we've kind of done it now. We've done our own way of writing an assert statement with equality that does really, really funky stuff. Just in case this is still feeling very confusing, this is the kind of core logic for how this all worked.
And next you might be wondering okay this kind of seems fun, this seems interesting but would I actually wanna use it at work? So I thought I would give you a concrete example of some tests that I've rewritten using this that have made my life easier.
Because I work in cyber I often have to, well, not in this job, but in my old job we had an API people were paying for and we were sending back data like IPs and hashes. And it was pretty important we'd send back the right kind of hash or make sure that an IP was valid, otherwise obviously people are paying for it
but also it would break things. And this I think is a much nicer way to check an API response. So suppose you get an API response back that has JSON that we turn into a dictionary that has a hashes key and an IPs key. Where I worked often we had mixed
SHA-256, MD5 and SHA-1 hashes, but we now were sort of transitioning to only send back SHA-256s, but often there were some bad hashes in the response. So this is really easy to test in Dirty Equals, and by bad hash I mean one that's not the right kind of type. You can just do here: fail the test
if the list contains an MD5 hash, and also check that it contains a SHA-256 hash. And then another thing that I think is pretty cool is the way this library is designed: it's really easy to create your own logic
to run these tests. So suppose you don't like `IsHash` or you don't like `IsIP` for some reason and you want to do your own stuff, you can use this `FunctionCheck` class here, and here you just pass it a callable, so here check IPs that I've defined above, and then you can do whatever logic that you want here.
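To make the callable-based idea concrete without depending on the library, here is a standard-library sketch. `PassesCheck`, `is_valid_ip`, `check_ips` and the sample responses are all hypothetical names and data made up for this illustration; the real Dirty Equals class may behave differently in its details.

```python
import ipaddress


class PassesCheck:
    """Hypothetical stand-in: equal to anything the wrapped callable accepts."""

    def __init__(self, func):
        self.func = func

    def __eq__(self, other):
        return bool(self.func(other))

    def __repr__(self):
        return f"PassesCheck({self.func.__name__})"


def is_valid_ip(value):
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def check_ips(values):
    # Generator expression: every entry in the list must parse as an IP address.
    return all(is_valid_ip(value) for value in values)


good_response = {"ips": ["192.0.2.1", "2001:db8::1"]}
bad_response = {"ips": ["192.0.2.1", "999.0.0.1"]}

# Dict comparison applies `==` to each value, which ends up in PassesCheck.__eq__.
good = good_response == {"ips": PassesCheck(check_ips)}
bad = bad_response == {"ips": PassesCheck(check_ips)}
```

The design choice here is that validation logic lives in an ordinary function, and the wrapper only adapts it to the equality protocol.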
Here I've just done a generator comprehension that basically says make sure everything in the list is an IP so this will fail if I had any kind of invalid IP here. But there's a lot more sort of complex stuff that you could build up here and this is for me where the writing test bit
becomes a bit more fun. This is I think quite a lot more fun than just normal sort of I don't know I can't really be bothered to write IP validation logic in most of my tests personally maybe everyone else is more diligent but that's kind of what it would be like at work and so I'm now just gonna recap on some of the things that I took away from this
and I hope might be useful for you after that I'm gonna move on to a couple of other kind of fun things that I think you could take with some of the stuff I've discussed. So the main thing for me was if you really like someone's code style or their way of thinking about code to try and get involved try and contribute to one of their projects even if one isn't active.
For me the main thing I learned doing this was that it is kind of okay to change Python internals if you do it intelligently and I feel like I learned from this code base how to do that well and I don't think I would have learned that if I hadn't contributed to it and I think there's a lot of other stuff
you could do with that, so I was gonna show you an example next about how pathlib actually does this. I gave a dry run of this talk to some colleagues and they told me pathlib does something similar, which I had never really realized. While this loads: pathlib basically overrides the dunder truediv method, `__truediv__`, in Python
so it kind of, hacking is the wrong word. I'm gonna stick to overloads. Here truediv is what happens when you'd say something is divided by something else. It'll be a bit clearer on the next slide, but in pathlib it basically says if you're dividing an object by another object
just join the path, which is how you get, should I, gonna do this slide first, really cool syntax like this. So this whole path gets put together by doing this call, whereas this operator normally in Python
would mean you have to divide stuff, so I think this is a really nice way to do pretty cool syntax tricks, and this is a really good little example in my opinion in pathlib, but I think this kind of approach is something you could definitely put into other libraries
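A short sketch of the `/` trick follows. The `pathlib` line uses the real standard-library behaviour; `Url` is a hypothetical class, invented here only to show how the same overload could be reused elsewhere, and the paths and URL are made-up sample values.

```python
from pathlib import PurePosixPath


class Url:
    """Hypothetical class that borrows pathlib's `/` trick for URLs."""

    def __init__(self, value):
        self.value = value.rstrip("/")

    def __truediv__(self, segment):
        if not isinstance(segment, str):
            # Let Python try the other operand, or raise TypeError if nobody can help.
            return NotImplemented
        return Url(f"{self.value}/{segment.strip('/')}")

    def __str__(self):
        return self.value


path = PurePosixPath("/home") / "user" / "slides.pdf"
url = Url("https://example.com/") / "api" / "users"

print(path)  # /home/user/slides.pdf
print(url)   # https://example.com/api/users
```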
and I also think it's quite a cool feature of Python that you have these dunder methods that are exposed and you can hook into and change things. One last thing that I took away from the workshop on pytest a bit earlier is using Hypothesis for testing, and I think this would play quite well with Dirty Equals as a combination
so here's just like a very quick example I put together of how that could work. If you're not familiar with Hypothesis, what it's basically doing here is saying given some hypothesis, give me a random bunch of numbers within this range, 100 to 102, 10 to 12, and apply them to these arguments X and Y
so this is kind of the same syntax as parameterizing fixtures in Python, but the difference is that you don't specify the parameters, instead Hypothesis gives you a bunch of random data, so the idea is that you have a more robust test since you might be checking for more edge cases, but I think if you then add in some Dirty Equals logic
so like here I've said, sorry, that `IsApprox` is a Dirty Equals class that lets you check a number within a range, so it's gonna say approximately 110 with like a delta of four, so max 114, min 106
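The slide is not in the transcript, so this is only a sketch of the idea under stated assumptions. It swaps Hypothesis for a plain exhaustive loop over the same ranges so that it runs with the standard library alone; `Approx` and `add` are hypothetical names, and the function under test in the talk is not known.

```python
class Approx:
    """Hypothetical stand-in: equal to any number within `delta` of `value`."""

    def __init__(self, value, delta):
        self.value = value
        self.delta = delta

    def __eq__(self, other):
        if isinstance(other, bool) or not isinstance(other, (int, float)):
            return NotImplemented
        return abs(other - self.value) <= self.delta

    def __repr__(self):
        return f"Approx({self.value}, delta={self.delta})"


def add(x, y):
    # Placeholder for whatever function the real test exercised.
    return x + y


# Exhaustive loop standing in for the random integers Hypothesis would draw:
# x in 100..102 and y in 10..12, as described in the talk.
sums = [add(x, y) for x in range(100, 103) for y in range(10, 13)]
all_close = all(total == Approx(110, delta=4) for total in sums)

# A value outside 106..114 does not compare equal.
too_far = 115 == Approx(110, delta=4)
```

Every sum falls between 110 and 114, so each one is within the allowed window of 106 to 114.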
I think that's quite a nice combination I think that lets you kind of have a bit of breathing room around the random stuff that gets input and I often if I'm on a bigger code base I write a ton of fixtures and parameterize them and it gets really messy and I thought this is quite a lot cleaner so yeah that's pretty much the end of the talk
I'll share the slides afterwards here are a lot of the resources I use for it including some of the source code links if you wanna kind of dig in and really get into it and I think I will finish there thank you very much for listening.
Thank you and I admire how you were calm when the slides were not working I would freak out completely
I also have a question actually, so I use pytest a lot, and if you assert that something is in the list for example and the assertion fails, it will tell you nicely this item here was missing on index three or whatever, but if I always just assert equalities
with custom equality methods, how do I propagate more information to the actual developer who's running the test about why the equality failed, other than returning true or false or not implemented? Yeah, that's a very good question. I think it's a bit of a paradigm shift with this library
in that you don't get that kind of pytest traceback, so you actually get a really unhelpful error unless your Dirty Equals object is very clear, so if you did like list `==` `Contains` X and then your list didn't have X, in the test failure message
it will say it failed because it didn't contain X, so I think that's pretty clear to read, but if you wrote a messier Dirty Equals object, like it should just contain strings, then it's not very useful, and one thing I should say is that if you use Dirty Equals it doesn't stop you from using everything else in pytest, so if it makes more sense to use the pattern you describe
you can just use that for that test. Thank you. Hi I have maybe a bit of a stupid question but considering all these magic methods that you mentioned could we go even more postmodern and do it like dirty subtraction, dirty power
dirty multiplication doesn't have to be equals right? Yes so you could but you probably have to be smarter than me so I mucked around with them after doing this talk and it was kind of harder than I thought but I think the true div example is like a simpler way to do it where you don't have to get into the logic
of the order and not implemented you can just overload a method so I don't know what the dunder method is for multiplication but I mean why would you want to do that out of curiosity? How would you want to change multiplication? Just because it's fun no? Yeah it is fun. So if you want to do that I would recommend having a look at the Python like object operator docs
that's one of the resources in my slide where it has all the dunder methods that you can overload. Also if you find Maxim Danilov he has a very cool project where he's doing something similar to that yeah.
Let's give a big applause to Alexander.