Bulletproof Python – Writing fewer tests with a typed code base

Formal Metadata

Title
Bulletproof Python – Writing fewer tests with a typed code base
Series Title
Number of Parts
141
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You may use, modify, reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Release Year
Language

Content Metadata

Subject Area
Genre
Abstract
A fully typed code base requires less test code to achieve the same level of confidence in its correctness. We'll analyze specific code examples and see how dependent types and exhaustiveness checking make certain classes of tests obsolete.
Transcript: English (auto-generated)
I don't remember exactly when I wrote my first Python code, but it must have been around the time when Python 2.4 came out. Back then, the situation was that there were two different types of strings. There was the string type, which essentially was a sequence of bytes, and there was a
Unicode type, which was a sequence of Unicode characters. The confusing thing was that both of those were used to represent text. That means if you had a function that processed strings and you gave it the correct argument
type, the correct type of string essentially, you didn't have any problem. But if you had a function that processed Unicode strings and you gave it byte strings, then you would either have a crash or you would get garbled output, like in the example here. In order to solve this issue, I started to essentially validate my input arguments.
I checked the type of my input arguments. I said, if the thing I get is not a string, then I'll exit the program with an exception. Back then, I wasn't a professional developer, but in a professional setting, you probably
would want to write a test for this. So you have this validation and you test your validation. This works and it is effective, but it creates a lot of code, a lot of noise in the code, and arguably it's not very Pythonic to do this isinstance check on the function arguments.
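The validate-and-test approach described here could look roughly like the sketch below. The names process_text and test_process_text_rejects_bytes are hypothetical and do not come from the talk; the point is only to show how much code the manual check and its test add.

```python
def process_text(text):
    """Hypothetical text-processing function with a manual runtime type check."""
    if not isinstance(text, str):
        # Reject byte strings and any other non-text input up front.
        raise TypeError(f"expected str, got {type(text).__name__}")
    return text.upper()


def test_process_text_rejects_bytes():
    # The validation itself needs a test, which adds yet more code.
    try:
        process_text(b"caf\xc3\xa9")
    except TypeError:
        return True
    return False
```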
Fast forward a couple of Python versions, and we got type annotations. Type annotations allow you to specify the type you expect for a function argument or even for the function's return value. And the cool thing about type annotations is that they allow you to use something called
a type checker. A type checker will look at your source code and tell you whether it thinks your source code is reasonable and will work or not. It will check that you don't pass any illegal types to a function, for example. So if I have this function, prepend_foo, and it says it accepts a string and it returns
a string, and I call it with a string, the type checker will report no issues. It's fine. But if I have the same function but I invoke it with a byte string, then the type checker will give me a lengthy error message and say, look, I expected a string but you gave
me a byte string. Something may be wrong. Type checkers are interesting because they don't have to run your program. They run before your program runs. So they are a form of static analysis. Whereas the test that we saw before needs to execute the program somehow to find this
error. And type checkers are actually able to eliminate some classes of bugs. So, now that we are all on the same page, what type annotations and type checkers are, let's see what they can do for us. Let's say we have a data class that represents a user.
A user has a name, an email address, and a password, and all of those are represented as strings. I can initialize a user object like this. I can just pass a name, an email address as a string, and my secret password as a string.
I also have a function, I call it send_message, and it sends any message to a specific user. So the function takes a recipient argument, I call it to, and it takes a message. Both of them are strings. I think it would be fair to assume if you just saw the function signature that you could
use the user's email address to send a message to that user. But it would also be fair to assume that you could use the user's name. Maybe it's a unique username, and maybe it can be used to send a message to that user. So both of them are valid calls because the function signature expects a string, and
both the email address and the username are also strings, so anything is valid for the type checker. It's quite unlikely that your function will support both, so let's assume that it requires an email address, so one of the calls will fail, and one of them will succeed.
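A minimal sketch of this situation, assuming a send_message body that splits the recipient at the @ sign (the talk does not show the implementation, so that part is invented for illustration). Both calls satisfy a type checker, but only one of them works at runtime.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    password: str


def send_message(to: str, message: str) -> str:
    # Assumes `to` is an email address; nothing in the signature says so.
    local_part, domain = to.split("@")
    return f"Delivering to {local_part} at {domain}: {message}"


user = User("alice", "alice@example.com", "secret")

send_message(user.email, "hi")   # works
# send_message(user.name, "hi")  # passes the type checker, fails at runtime with ValueError
```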
If we apply the logic from the introductory example and start adding validations, we would probably do something like this. So we would check if the to argument is an email address, and that's fine, but if it's not an email address, then we exit with an appropriate ValueError. And of course, we're professionals, we test our software, so we would add a unit test
that reproduces this error. Now, aside from the fact that it's quite hard to actually check whether a string is a valid email address or not, we again end up with the problem that this is not very Pythonic. But there's a nicer way. The typing module provides something called NewType, and NewType essentially allows
you to make a type more special. So this line here says an email address is a special type of string. We can now change our user to use an email address for the email property, and we can also
change our send_message function signature to require an email address argument instead of a general string. When we initialize a user object, we can no longer pass a regular string; we must use this new type that we defined, because the user data class requires an email address.
So we need to wrap it, more or less. And when we call the send_message function with an email address, everything is fine for the type checker. That's no problem. But when we try the same as before and we accidentally use the username, maybe because
the function is badly documented or poorly named, then the type checker would now report an error and say, I expected you to give me an email address, but I just got a regular string. So NewType makes the type more special.
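The NewType version described here might be sketched as follows. The field names and the body of send_message are assumptions; the talk only describes the signatures. At runtime NewType is a plain pass-through, so the difference only shows up in a type checker.

```python
from dataclasses import dataclass
from typing import NewType

# An EmailAddress is a more specific kind of str, as far as a type checker is concerned.
EmailAddress = NewType("EmailAddress", str)


@dataclass
class User:
    name: str
    email: EmailAddress
    password: str


def send_message(to: EmailAddress, message: str) -> str:
    return f"To {to}: {message}"


# The plain string has to be wrapped explicitly.
user = User("alice", EmailAddress("alice@example.com"), "secret")

send_message(user.email, "hello")   # accepted by a type checker
# send_message(user.name, "hello")  # a type checker should reject this: str is not EmailAddress
```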
That means every email address is a string, so an email address can be used wherever a string is expected, but not the other way around: not every string is an email address. Does that make sense? I brought another example for NewType. One programmer on the team coded the data class for the user and all that, and
the other programmer wants to add persistence, so they write a function called save_to_database. The function just creates a SQL query and stores the user's email, name and password in the users table.
When we execute this function, we have actually created a security incident, because we just stored the user's password in plain text in the database. This can be quite annoying because you might already have backups of your database and you have to clean up your backups as well. Not ideal.
But again, NewType would help us to at least make this kind of error less likely. We can define our own type for hashed passwords. We can say a hashed password is a special type of string, and again, we adjust our user to use a hashed password for the password property, and when we try to initialise the
user object like before with a plain text password, a type checker would complain that it expects a hashed password, but we passed a regular string. So this would force the programmer who initialises the user object to at least
think about hashing the password before storing it somewhere. So I hope you agree that this is handy. But the takeaway is that NewType does not perform any validation.
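A sketch of the hashed-password idea, under the assumption that a single helper (here called hash_password, a hypothetical name) is the only place that produces the new type. The PBKDF2 parameters are illustrative only and not a recommendation from the talk. The last lines show the takeaway: NewType itself checks nothing.

```python
import hashlib
import os
from typing import NewType

HashedPassword = NewType("HashedPassword", str)


def hash_password(plain: str) -> HashedPassword:
    """Hypothetical helper: the one place where a HashedPassword is created."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", plain.encode("utf-8"), salt, 100_000)
    # Store salt and digest together as hex, separated by "$".
    return HashedPassword(f"{salt.hex()}${digest.hex()}")


stored = hash_password("correct horse battery staple")

# NewType performs no validation: this succeeds at runtime and also
# satisfies a type checker, even though nothing was hashed.
not_really_hashed = HashedPassword("plain text")
```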
It just makes a type more specific. As I said, every email address is a specific type of string, but not every string is an email address. It's like inheritance for types, if you will. And the nice thing is that the more specific your types are, the more you narrow down the use cases.
So if your function requires a specific type instead of a general string type, for example, you have fewer contexts in which the function is used, and as a result, you have to write fewer tests to cover all the use cases of your function. NewType was one of the examples that the typing module provides.
Another thing that I brought today, and that I promised in the talk description, is dependent types. Python doesn't have dependent types, unfortunately, but it has a light version of them, so we'll get to that. A type is basically a set of possible values.
So point 1d is a set of all possible one-dimensional coordinates. Point 2d is a set of all coordinate pairs, and we can, you know, we can go on. But we can also generalize over that and say there's a hypothetical point nd which has
n dimensions, but the specific n depends on a value. It could be 8, it could be 3, whatever. And if the exact type of your class or whatever you have is dependent on a value, it's called a dependent type.
So a dependent type is basically a family of types, and you only know the exact type when you know the value that you need to index that family, sort of. This is not real Python. This is made up pseudocode, but in theory this would be a dependent type, and a function
that returns a dependent type is called a dependent function. So if we generate a random point with an arbitrary number of dimensions and get back a point ND with a specific number of dimensions, then this is a dependent function. So much for the theory, but we can do something similar in Python.
Let's say we have a function that reads a file from a path, and the function requires us to specify the read mode. We can either read the file in the r (read) mode or in the rb (read binary) mode. And the function returns either string or bytes.
We know as programmers that if we open a file in read binary mode, we will always get bytes as a result and never a string. But the function signature as it's written here does not account for that fact.
So we're good programmers, we're professionals, so we write unit tests that say if my mode is r, I return a string, if my mode is rb, I return bytes, and so on. But the typing module provides the overload decorator. It essentially allows you to specify multiple function signatures.
So in order to fix this ambiguity, we can just define the read_file signature multiple times with different types. So we can say when we read a file in read mode, we get a string.
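A sketch of how overloaded read_file signatures could be written, covering both the text mode and the binary mode. The path: str parameter and the use of Literal for the mode are assumptions, since the talk does not spell out the exact signature.

```python
from typing import Literal, Union, overload


@overload
def read_file(path: str, mode: Literal["r"]) -> str: ...
@overload
def read_file(path: str, mode: Literal["rb"]) -> bytes: ...


def read_file(path: str, mode: Literal["r", "rb"]) -> Union[str, bytes]:
    # Only this implementation exists at runtime; the overloads above
    # are signatures for the type checker.
    with open(path, mode) as file:
        return file.read()
```

With these signatures, a type checker can infer str for read_file(path, "r") and bytes for read_file(path, "rb"), so callers no longer need to narrow the union themselves.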
When we read a file in read binary mode, we get bytes. And then we can actually type our function implementation. Even though I said Python doesn't support dependent types, this is sort of a dependent function, because the return type of the function depends on the exact literal
that we pass for the mode. So overload helps us to make ambiguous function signatures more specific. And more importantly, it makes dependencies between function arguments and the function
outputs more visible. So it could help you make your code more readable, essentially. The third and last example that I brought with me is exhaustiveness checking.
Let's say we have an enum which stores different characters from a play. They're called Alice and Bob because the play is about cryptography or something. And we have a function that draws the characters. Maybe it's a cartoon, and we want to draw different characters, and the draw function does different
things based on whether the character is Alice or Bob. It uses a match statement, and the match statement has a catch-all. So in case we pass anything other than Alice or Bob, we would run into this ValueError.
But the error is only discovered if we actually execute the code. Only if we ever call the draw function with some other value would we raise this error. That means if we extend our enum and add another character to the play, we rely on the fact
that we write some test that calls draw(Character.EVE). Otherwise we would never uncover that we forgot to adjust the match statement in our draw function. The program would run fine until we actually needed to draw that character, and we want
to catch errors before we run our program. That's why we write tests. The typing module provides something called assert_never. And assert_never essentially causes the type checker to report an error when the type
checker thinks that this code is reachable. So when the type checker sees that you have this match statement but there's one code path where it's possible that the argument to the match statement is neither
Alice nor Bob, then the type checker would report an error. This is actually quite handy because, as we said in the beginning, we don't have to actually run the code. We can just run the type checker, and it just reads our source code
and tries to find mistakes in there. So remember, assert_never causes the type checker to report an error when it thinks that the assert_never statement is somehow reachable in your code. These are all my examples. I would like to wrap it up. So we said that NewType can make assumptions visible.
NewType can even prevent some security issues or at least make them less likely to happen. And there's also overload, which uncovers dependencies between function arguments and the function's return type.
It also makes the function signatures more specific and removes ambiguity from them. And lastly, assert_never can uncover code branches that you as a developer have not accounted for.
I want to be clear that typing in Python is still optional. And I think this is a good thing. If you include type hints and use type checkers, this is not a replacement for tests. I'm just saying you can save some. I'm not saying that you can get away without writing any tests.
But I wanted to present a couple of things that are included in the standard library that you can use to make your code have fewer lines, fewer lines that you have to maintain, and in some cases make it even more readable and easier to understand. Yeah.
So, my name is Michael. I work as a consulting software engineer and as a trainer. And you can find me on the internet. I also learned that there's a feedback form for talks that you can fill out. So this QR code leads you to the feedback form. You can approach me directly. But if you feel like you want to leave feedback anonymously, please do.
And do that via the feedback form. Thank you. Thank you for your talk, Michael. I also have an Alan Perlis quote in my lightning talk.
So let's wait for that. So I know very little about typing. So I don't even know if my question makes sense. But could we maybe use generics to do something similar to dependent types?
Like a point ND, and inside the square brackets I would have the integer that I care about? And that's actually why I explicitly added this warning that this is Python pseudocode. Because when I tried to write it down, I always ended up using something like generics.
The difference is that in generics, the argument to the generic class is a type, whereas the argument to a dependent type is an actual value. But could you create your own type and then just change the dunder __class_getitem__?
Or did you play with that? Is it something you can tweak? Sorry, I don't get the question. Can you rephrase? If I create my own class and... Yeah, because I think you get to change what happens in the square brackets. Maybe it's in the metaclass that you can implement a dunder that does some things.
But like I said, maybe I don't even know enough to ask the question. Honestly, I never tried to make dependent types work. There's a possibility to do so, but it would require a lot of work to edit the Python language. And it's a nice thing to have from a theoretical standpoint of typing systems and stuff like that.
But the question is if it's really that useful. So the example that you see over and over again is if you have two arrays of a specified length and you kind of concatenate them together, then you can return a type where
you add up the two integers that make up the array types and you know the exact length of the array as a return value. But yeah, the use of dependent types is probably limited, at least in Python. Thanks.
Hello. Thanks for the talk. I'm going to be selfish and ask a question, because I have run into a problem recently related to typing, and it is regarding subclasses. So I have a class, and I have a function that I defined that takes two elements of that class. What I actually mean is two elements that could be subclasses of this class.
And currently, mypy considers those different types. So it tells me it's an error if I call that function. So my question is how do you do this? So you have a function that takes two arguments, and the two arguments should be subtypes of
a specific subtype. Right. And the subtypes, I don't know what they are, meaning I'm building a library, and I want to say that these types can be defined by someone else, you know, and I just want to be sure that they inherit from the class that I defined.
This sounds to me like you should look at TypeVar. Yeah, it's called TypeVar, from the typing module. This is maybe what you need. But we can take it out and hash it out. Perfect.
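One possible reading of this suggestion is a TypeVar bound to the library's base class. The sketch below is an assumption about what the questioner needs, not something shown in the talk, and the Shape, Square and larger names are hypothetical.

```python
from typing import TypeVar


class Shape:
    """Hypothetical library base class that third parties may subclass."""

    def area(self) -> float:
        return 0.0


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


# T stands for Shape or any subclass of it.
T = TypeVar("T", bound=Shape)


def larger(first: T, second: T) -> T:
    """Return whichever argument has the larger area."""
    return first if first.area() >= second.area() else second
```

Because the return type is T rather than Shape, a call such as larger(Square(2), Square(3)) is inferred as Square, so subclass-specific attributes stay available to the caller.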
Hi, Michael, and thank you for this talk, it was amazing. My question is about the overload decorator. I don't know if it's possible to use that decorator in such a way that, I mean, to create
different constructors in a class, you know, like the overloading constructor in Java, something like that? I believe no. Don't take my word for it. I never tried. But I would say no. Okay. Thank you. Hi, Michael.
So, my question has to do with working with legacy repositories where you run the linter and then everything fails, and you kind of want to target just the stuff that you're just starting to type. Do you have any strategies or tips to do that? So you have an existing code base that is untyped and you want to introduce types gradually
in the code base. Yes. There are settings in mypy that allow you to, like, come up with a migration strategy for typing. So you can decide to start typing just one module or just one package.
This is dependent on the actual type checker, I think. So if it's not a question of whether to type or not, then you just have to start somewhere and start extending the types around your starting point. I think this would be a good strategy to introduce types in the code base. Okay. Thanks. Yeah. The problem is I feel like the configuration starts to really expand.
So then if you're doing it explicitly, like, one step at a time, and then, yeah, I was wondering if you... Ah, okay. Because you have to list every module in the configuration. Mm-hmm, mm-hmm. Maybe you can try to, yeah, no, even if you have a big code base and you set your type
checker to be a bit more lenient. So there are some settings; for example, mypy allows you to omit Optional in keyword arguments and something like that. So this could help.
But if you really don't have any type hints in the code base, I think you don't have any way around that. Okay. And if no one else has a question: alternatives to mypy? Have you tried any others? Honestly, no. There's Pyright and Pyre.
But I haven't used them in any production project, no. Okay, thank you. Yeah. So I think I need to ask what everyone else was thinking. Was that your password, and what did that mean? The zmrzlina, yeah, because...
Was that my what? If that was really your password, and what that means, because it felt intentional. Ah, okay, I see, yeah. I always include emojis in my password because it makes it hard for other people to type.
Now I have to choose a different password. I wanted to include something Czech, and forgive me if I pronounce it incorrectly, but I think it's zmrzlina, which is ice cream. You see it around the city. I got a thumbs up, thank you. It's just ice cream, yeah. So are there any more questions?
Thanks Michael for the awesome talk, and yeah. Thank you.