Writing Python like it's Rust - more robust code with type hints
Formal Metadata
Number of Parts | 131
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/69522 (DOI)
EuroPython 2024 (3 / 131)
Transcript: English (auto-generated)
00:04
Thanks. So my talk, "Writing Python like it's Rust", is based on a blog post that I wrote a year ago. It has generated a lot of discussion, and maybe some controversy, online, so I thought that it might be fun to turn it into a conference talk. My name is Jakub Beránek. I'm a computer science PhD student, and I've got about two months left.
00:22
So wish me luck. I teach at a university, I work as a researcher at a supercomputing center, and I also like to contribute to open source. I contribute mostly to the Rust project, where I'm a member of a few teams that work on compiler performance and Rust infrastructure. I have been writing Python for more than 10 years, and I always did it in a very dynamic style.
00:45
So I used to write Python without type hints. I used dictionaries everywhere, my code was "stringly typed", not strongly typed, and I was using monkey patching and all these kinds of hacks. It was fun, but I realized that this code was causing me some very bad symptoms,
01:04
especially for larger programs. These symptoms were that it was very easy to cause bugs, and I had to debug too many runtime failures and crashes, which was very annoying. It was hard for me to understand my code in retrospect, and it was also difficult to change and refactor it. You might say, you know,
01:21
this is a skill issue, if I were a better programmer I could do this, but I didn't actually have a lot of confidence in my code. Kind of independently of that, I started using Rust, and it quickly became my favorite programming language. My experience with Rust was quite different. So yeah, I mean, first I had to compile my code.
01:41
Then I had to wait a little bit longer for it to compile. Then I had to fight the compiler a little bit. But once I did all that, my code was usually just working, without my having to debug a lot of stuff. I had a lot of confidence that my code would be working and doing what it should be doing, and I wanted to kind of port this great experience from Rust to other languages, including Python. So this talk is essentially about
02:04
why I write Python in a way that is a little bit similar to how I write Rust, and how I do it. Essentially, I will be trying to port some notions of static typing and strong type systems to a language that is otherwise very dynamic. A small disclaimer: everything I show here is just my opinion.
02:22
I'm not trying to tell you how to write Python code. I will just be showing you how I write Python code, and maybe you can find some inspiration in that. Also, this is one of those talks where if you blink you'll miss something, so I apologize for that in advance. So step one is using type hints everywhere. I really use them pretty much everywhere.
02:42
Type hints are these small annotations that you can add to your Python programs to say that some value is of some specific type. They were added in Python, I think, 3.5, and they are being extended and improved in every new Python version. Where do I use them? Well, pretty much everywhere, but most importantly at interface boundaries.
03:01
So in function signatures, I want to see what types are going into the function and what type is coming out of the function. I also use them a lot in data classes, in their fields, obviously. I will be talking about that more later. Sometimes I also use them for variables, but only quite rarely, because if there is some complicated expression I can
03:22
annotate it to help the type checker or the IDE, but if it's something very trivial, I think that adding the type hint in this case is just additional noise. So I don't really use them a lot for variables. What can you actually use in type hints? Well, you can annotate primitive types like integers and booleans, or your custom classes like a Person. You can annotate built-in data types
03:45
like a list of integers or a dictionary that maps strings to integers, or you can use some more complicated things, like "this value is either an integer or a string", or "this value is optional: it is either a bool or the value None". You can also do some more crazy things, like saying that this value is the literal "GET" or the literal "POST" and nothing else.
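The kinds of annotations listed here could look roughly like the following. This is only an illustrative sketch, not the code from the slides; the Person class, the variable names and the describe function are made up for the example.

```python
from typing import Literal, Optional, Union


class Person:
    """Hypothetical custom class, used only for illustration."""

    def __init__(self, name: str) -> None:
        self.name = name


# Primitive types and a custom class
count: int = 3
enabled: bool = True
owner: Person = Person("Alice")

# Built-in data types
scores: list[int] = [1, 2, 3]
ages: dict[str, int] = {"Alice": 30}

# "Either an integer or a string", and "a bool or the value None"
identifier: Union[int, str] = "abc"
flag: Optional[bool] = None

# Only the literal strings "GET" and "POST" are accepted by a type checker
HttpMethod = Literal["GET", "POST"]


def describe(method: HttpMethod) -> str:
    return f"method={method}"
```

None of these annotations change what happens at runtime; they only describe the intended types to readers, IDEs and type checkers.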
04:05
So there is a lot of stuff that you can do with the typing module, but I won't actually be talking about this module in much detail. I will be talking about the mindset and the motivation for why we should use type hints in the first place. So why type hints? I think the previous presentation had about one slide about this. I have about 70.
04:25
So it will be more extended. For me, the most important reason is that types help me understand code, right? So if I see a function like this, I want to be able to find out what it is doing just from the signature. But here I have no idea. What is items? What is an item, singular? What is check? Is it a boolean?
04:46
What does the function return? Without type hints, I have no idea. But if we do add type hints, we can see that items is something that can be iterated and that it contains Items. Okay, so what is an Item? Well, I can just click on it in my IDE and I will immediately see the definition
05:02
of the type. What is check? Well, it is not a boolean after all. In this case it is a function that will probably be called for each item in this iterable. I also see that this function returns an Item (again, I can see what it is) and that it is optional, so I need to handle the case where it is missing. So types provide documentation for me, and it is the kind of documentation that never gets out of sync, because it is code after all.
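A signature along the lines described above might look like this. It is a hedged reconstruction, not the slide itself: the function name find_item and the fields of Item are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Item:
    # Hypothetical fields; the talk does not say what an Item contains.
    name: str
    price: int


def find_item(
    items: Iterable[Item], check: Callable[[Item], bool]
) -> Optional[Item]:
    """Return the first item accepted by `check`, or None if there is none."""
    for item in items:
        if check(item):
            return item
    return None
```

The signature alone now says that check is a callable applied to an Item and that the caller has to handle a missing result.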
05:28
In addition to understanding, they also help me remember. So this is code from my bachelor thesis that I wrote about eight years ago, and when I checked it last week, I had absolutely no idea what was going on. Really, I didn't understand what was going on.
05:42
Why? Because I didn't have any type hints at the time. I think they didn't even exist. But if I wrote this code now and added type hints, I would have at least some cursory idea of what the code is doing, even after five years. Types also help me write code faster, which might sound a little bit unintuitive. But take into account, for example, an IDE: if you don't have type hints,
06:05
it can be quite difficult for the IDE to provide you with the correct autocompletion. But if you do annotate your code with type hints, you will essentially help your IDE help you write code faster. And this is not just for autocompletion but also for navigation: when I encounter new code, or a new function, or a function that I wrote a month ago,
06:26
I want to be able to click on the types in the signature of the function and go to their definitions so that I can understand what is going on. Types also help me detect when the code gets a little bit too complex. For example, if I was writing this function and I was trying to annotate it like this,
06:45
I would probably realize that maybe this code isn't really the best code in the world and that I should refactor my function, for example into several other functions, because when something is very hard to type, it can be a symptom of the code being too complex and too hard to understand. But on the other hand, we are still in Python, right? So type hints are still optional.
07:04
So if I have a case where I really want to just stuff anything into a function, I just say that this is Any, and this is perfectly fine. I mean, this is still Python, it's not Rust. I can still say that I don't really care about types in some specific circumstances. So that's also a nice thing.
07:23
Types are also introspectable at runtime, and this is used by several libraries. You can also use it in your own code. So for example, in the FastAPI library, when you annotate a parameter with a data type, not only are you documenting what is going on in your code, but the framework itself will use this information to parse this parameter from the URL in a specific way, and if it's not an
07:46
integer, it will return an error to the HTTP client. Notice that in everything I showed so far, in all the advantages, I didn't even mention any form of type checking. I was only describing the advantages for documenting and understanding your code,
08:03
but of course, if you want to go further, you should actually configure some type checker like Pyright or mypy. If you were here at the previous presentation, you saw some additional ones. These are essentially programs that can type check your whole code base without executing it and show you these nice errors that essentially tell you that you are doing something wrong in your code.
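To make the two ideas above a bit more concrete (annotations being readable at runtime, and type checkers flagging wrong calls without running the code), here is a minimal sketch using only the standard library. It is not how FastAPI is implemented; call_with_parsed_args and get_user are hypothetical names invented for this example.

```python
from typing import Any, Callable, get_type_hints


def call_with_parsed_args(func: Callable[..., Any], raw_args: dict[str, str]) -> Any:
    """Convert raw string arguments using the annotations of `func`, then call it."""
    hints = get_type_hints(func)  # e.g. {"user_id": int, "return": str}
    parsed = {}
    for name, raw in raw_args.items():
        target_type = hints.get(name, str)
        # For an `int` annotation this is int(raw), which raises ValueError
        # when the raw value is not a valid integer.
        parsed[name] = target_type(raw)
    return func(**parsed)


def get_user(user_id: int) -> str:
    return f"user #{user_id}"


result = call_with_parsed_args(get_user, {"user_id": "42"})

# A static type checker such as mypy or Pyright would report the call below
# without executing anything, because a str is passed where an int is expected:
# get_user("42")
```

The conversion step only works for annotations that are callable on a string, such as int, float or str; a real framework handles many more cases.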
08:25
And you should of course configure type checking in continuous integration. Once you do that, you'll get a superpower, because every type hint will become a mini test. And it is a great type of test, because unlike normal tests it doesn't need any maintenance or refactoring. It is always in sync, and
08:45
it also provides you with very low latency feedback. What do I mean by that? Normally, when you have Python code and you make a change to it, it will look something like this: you will run your program or your tests, it will run up to the first error, then it will crash, and you will fix the error. What happens then? Well, you do this
09:03
repeatedly until you fix all the errors, right? But a lot of those errors, at least in my experience, will be very silly, trivial type-level errors, and this is a very slow feedback loop for fixing all of them. But with a type checker, you can just run the type checker over your whole code base, get all the trivial type errors at once, fix all of them at once, and only then continue with the actual annoying, slow
09:28
run-fix, run-fix loop. Again, in my experience the second approach tends to be quite a bit faster than the first one. But again, we're in Python, so you can combine both. You can use whatever you want. If you know that you have type errors after some refactoring
09:42
and you don't want to fix your code, you can just run it. It's perfectly fine, it will work in Python. I did this sort of, I wouldn't even call it a benchmark, but just to demonstrate something: I took the FastAPI code base and ran its test suite, and it takes about 30 seconds on my notebook, while type checking this code base takes just a second.
10:04
And I think this will be true for a very large number of repositories: type checking is much faster than running your whole test suite, so it's much lower latency feedback. And if you want to move your type checking to another level, you can actually also do type checking at runtime.
10:21
So there's, for example, this beartype library that you can use to annotate your code, and it will then type check your code as it is running. For example, if you have very dynamic code and normal type checkers aren't enough, you can also try to type check your code while it's running. So to kind of sum up this type hint discussion,
10:43
I would say that using type hints improves the confidence I have in my code, and that is a really great feeling. It also improves my confidence in other people's code, because when I see that they are using type hints and we have a type checker in CI, I will have much more confidence in their code, and I won't need to study it in such detail. It also helps me when I refactor and change my code. Now,
11:05
I put this word "fearless" in quotes, because in Rust refactoring is really fearless, while in Python, even with type hints, it's not so perfect. But still, I feel like type hints help me a lot when I need to change my Python code. I
11:22
mentioned a lot of advantages, and it wouldn't be fair not to mention some disadvantages of type hints. But to be honest, I really couldn't think of any. To me personally, I couldn't think of any disadvantage of this system. It just seems so obvious to me, but it wouldn't be fair to skip this, so I looked online.
11:40
What do people say about type hints? What don't they like? For some people, it's more characters to type. I agree, but also, like, whatever. My productivity is not bottlenecked by the number of characters I type, so I don't really care about this. The second concern I saw is that using type hints is useless for throwaway code, for a lot of Python scripts.
12:05
I also used to think this, but then I was encountering the following situation very often. I got some idea. I implemented some prototype for this idea. I thought to myself, should I use type hints? No, this code will be long gone by next week. It's just a prototype. It doesn't need type hints.
12:24
It's fine. And then, sometime later: oh. This code that I thought would be just a prototype is actually running in production, and it is crashing, and I have no idea what is going on, not only because of missing type hints, but also because I didn't use them.
12:41
So if this sounds familiar to you, I would really encourage you to try to use type hints as much as you can. For me personally, unless I'm literally in the interactive terminal with Python, I use them everywhere, even if it's just a five-line script. One concern that I also saw is that type hints in Python give you some sort of false sense of security, and
13:05
I think that this is a valid concern. You really need to remember that type hints are not perfect. The type system of Python is not as, I would say, bulletproof as that of some functional languages or Rust or some other languages. So you still need to remember that even if you type hint all your code,
13:23
it will not be bug free. But as long as you keep this in mind, I think it's still a good idea to use type hints. I think they really help me to understand my code. And I also found this on Hacker News, and I
13:40
cannot really say that I disagree, but a pig with lipstick can still be better than just a pig, so I would just close it with that. So then there is step two of how to write Python in a way that resembles Rust a little bit more. The second step is actually very simple, but also very useful,
14:01
and that is that I try to use data classes, again, as much as possible. When I encounter code that looks like this, some function that returns a person, and it returns some tuple, I have exactly zero idea how to work with this return type. What is this string? Is it the first name, the last name? I don't know. What is the integer? Is it the age? Is it the social security number?
14:24
So this doesn't really tell me what is going on, and it doesn't help me. The IDE won't autocomplete and tell me what is going on. Then sometimes people try to improve on this and use dictionaries. So now you have a string for each field, so you know what is going on, but you need to take a look inside the function to actually understand
14:44
what the names of those fields are, and they can get out of sync very quickly. And again, you don't get any autocompletion and you don't get any navigation in your IDE. So what's the solution? Just write a data class. Just say that this returns a specific type. Its name is Person.
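The three variants discussed here (tuple, dictionary, data class) could be sketched as follows. The function names and the concrete fields are assumptions made for illustration; the talk only mentions a string and an integer.

```python
from dataclasses import dataclass


# Variant 1: a bare tuple. Which string is which? What does the int mean?
def get_person_tuple() -> tuple[str, str, int]:
    return ("Jane", "Doe", 42)


# Variant 2: a dictionary. The keys are only visible inside the function body.
def get_person_dict() -> dict[str, object]:
    return {"first_name": "Jane", "last_name": "Doe", "age": 42}


# Variant 3: a data class. The fields and their types are part of the
# signature, so the IDE can autocomplete them and navigate to them.
@dataclass
class Person:
    first_name: str
    last_name: str
    age: int


def get_person() -> Person:
    return Person(first_name="Jane", last_name="Doe", age=42)
```

With the data class, renaming a field that is still used under its old name somewhere shows up as a type check error instead of a KeyError at runtime.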
15:02
This type is a data class, the class Person. It has some fields, and they have some types. Even though this is a little bit more code to write, it gives me a lot of advantages, because again, I have autocompletion in my IDE and I have navigation. If I rename some of those fields, not only can the IDE rename them everywhere in my code,
15:22
but if it forgets to do that, or it doesn't work, I will immediately get a type check error. And I have also introduced a new name into the domain of my program, right? I'm returning a Person. I'm not returning a dictionary of random strings or a tuple. So this was a very easy step: I just try to use data classes as much as possible.
15:45
The third step is actually the most, I would say, interesting, but also the hardest one to do properly in Python, and that is embracing a concept that is called soundness in the land of Rust. So what is this? When code is sound, in my definition, and don't take this
16:03
very precisely, I claim that it is impossible, or at least very difficult, to misuse. This might sound a bit weird, so let's unpack it. What is misuse? If you misuse code, you break its invariants. You do something that the original author of the code didn't anticipate. It is an unintended usage of the API, and
16:27
typically it will lead to runtime failures and bugs and all sorts of annoying stuff. What is impossible? Well, in Rust it would mean that your code wouldn't compile. You would get a compile error if you tried to misuse code which is sound. In Python, sadly,
16:42
it's not that good, let's say. You will get a type check error. Also, the Python type system is not as advanced as the one in Rust, so there are some things that you just cannot do to express the impossibility of misusing some APIs. But there is still some low-hanging fruit that you can try to pick.
17:00
So in the rest of the talk, I will be showing some code examples of how we can make Python code more sound. Let's imagine this very simple function, get car ID. It returns some car ID from a database. We have a similar function for returning a driver ID from a database. And then we have a third function that takes a car ID and a driver ID and returns information about some race.
17:25
So how can we misuse this code? Imagine, for example, that we get some car ID from the database, then we get some driver ID from the database, and then we get some race. This seems like a perfectly normal usage of the API. What is wrong in this code?
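The API under discussion, together with the typing.NewType approach that the talk turns to next, might be sketched like this. The function names, their parameters and the returned values are assumptions for illustration; there is no real database here.

```python
from typing import NewType

# Two distinct ID types, both backed by a plain int at runtime.
CarId = NewType("CarId", int)
DriverId = NewType("DriverId", int)


def get_car_id(brand: str) -> CarId:
    return CarId(1)  # stand-in for a database lookup


def get_driver_id(name: str) -> DriverId:
    return DriverId(2)  # stand-in for a database lookup


def get_race_info(car_id: CarId, driver_id: DriverId) -> str:
    return f"car={car_id}, driver={driver_id}"


car_id = get_car_id("Mazda")
driver_id = get_driver_id("Stig")
info = get_race_info(car_id, driver_id)

# With plain `int` parameters the call below would pass silently. With the
# NewType definitions, a type checker such as mypy or Pyright reports it:
# get_race_info(driver_id, car_id)
```

NewType has no runtime effect (CarId(1) is just the integer 1), so the protection comes entirely from the type checker.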
17:49
Okay, I don't have a lot of time, but I think a lot of you saw it: these IDs are switched, right? By mistake, I have passed the driver's ID as the car ID and vice versa, and this is the kind of bug that can be very
18:02
annoying to debug, because in tests it will just work: we have car ID one, driver ID one. And it can be very hard to spot, because it doesn't even need to cause any runtime failures, right? So what can we do about this? Well, we can just separate the types of the different IDs. For example, using typing.NewType, we can say this is a new type called car ID,
18:26
which is backed by an integer, and this is another type called driver ID, again backed by an integer. Those two types now cannot really be mixed together or used in the same places. Then we modify our code so that we return and
18:41
receive the correct types. Then if we called this function and swapped the parameters, the type checker would immediately yell at us and tell us that we have done something wrong, right? So not only does this improve the soundness of our code, I would also claim that it improves the
19:01
documentation of our code, because now we are telling the user of this API that this first parameter is a car ID. It's not just any random integer. Okay, another example. Let's imagine that you have this client. It could be, like, a TCP/IP client, and it has the following API: connect, send some data, and close, right? The problem with this kind of API, which I see very often in Python,
19:26
is that it has some set of invariants, and these invariants are described only in documentation, right? So if you, for example, send before connect, it's a bug. It's a runtime failure. Connect twice: it's a bug.
19:41
Close and then send: it's a bug. Forgetting to call close might not even lead to a runtime failure, but it's probably, again, a bug, right? So what can we do about this? Well, we can change the API so that these kinds of situations are not even possible. We can create a connected client that only has one method, send, and then we can build a different API for
20:04
connecting and closing the client. It could be done in many ways, but for example, we can use a context manager. We can create a client, return the client to the user of this API, and then, whatever happens, always close the client. And then when we use this API,
20:21
we will just do "with connect as client", send as many messages as we want, and it is now no longer possible to cause some of the previous issues. I cannot send before connecting. It's just not possible with this API. What else can I not do? I cannot connect twice. Again,
20:41
it's just not possible with this API, and the client will always be closed. There are still some bugs that I can cause: I could store this client variable into another variable outside of the with block and then use it after it has been closed. This would be prevented in Rust, but it is hard to prevent in Python. But I would claim that you really need to go out of your way to make this mistake.
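The context manager API described above could be sketched roughly as follows. This is an assumption-laden illustration rather than the code from the slides: no real network connection is opened, and ConnectedClient simply records what was sent.

```python
from contextlib import contextmanager
from typing import Iterator


class ConnectedClient:
    """Intended to be created only via connect(), so it is already connected."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.sent: list[bytes] = []
        self.closed = False

    def send(self, data: bytes) -> None:
        self.sent.append(data)  # stand-in for writing to a socket


@contextmanager
def connect(address: str) -> Iterator[ConnectedClient]:
    client = ConnectedClient(address)  # stand-in for opening the connection
    try:
        yield client
    finally:
        client.closed = True  # stand-in for closing; runs whatever happens


with connect("localhost:8080") as client:
    client.send(b"hello")
    client.send(b"world")
```

There is no connect or close method to call at the wrong time. The remaining loophole is the one mentioned in the talk: nothing stops code from using client after the with block has ended.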
21:04
By contrast, the previous mistakes could just be made, you know, as honest mistakes, without you realizing that you are doing something wrong. This is a, I would say, very well-known idea: make illegal states unrepresentable, right? If you have some states in your data structures that are illegal, that shouldn't be possible,
21:24
you should make sure that you write your code in a way that they cannot even be represented. Another simple example: let's say that we are building a request, and this request needs some authentication. You can use either API tokens or a username and password, and once you authenticate, you can build a request. Again,
21:40
what can be the errors? You can call build without authenticating, you can call both API token and password, or you can call, for example, API token twice. All of these are runtime bugs, and this API allows them. So how to do it differently? For example, we can create separate types for a request with a token and a request with a password, and
22:02
then we can modify the builder so that when you configure the token, it gives you a different type, and when you configure the password, it gives you a different type. So now you can no longer call both of them, and you need to actually call the build method on the created type. You cannot call it on the builder, so you can no longer build your request without authenticating.
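One possible shape for such a builder is sketched below. All class and method names (RequestBuilder, with_token, with_password and so on) are hypothetical, and the auth string is only a placeholder, not a real HTTP header format.

```python
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    auth: str  # placeholder description of the credentials used


@dataclass
class RequestWithToken:
    url: str
    token: str

    def build(self) -> Request:
        return Request(self.url, f"token {self.token}")


@dataclass
class RequestWithPassword:
    url: str
    username: str
    password: str

    def build(self) -> Request:
        return Request(self.url, f"password for {self.username}")


@dataclass
class RequestBuilder:
    url: str

    # Deliberately no build() here: an unauthenticated request cannot be built.
    def with_token(self, token: str) -> RequestWithToken:
        return RequestWithToken(self.url, token)

    def with_password(self, username: str, password: str) -> RequestWithPassword:
        return RequestWithPassword(self.url, username, password)


request = RequestBuilder("https://example.com").with_token("abc").build()
```

Because with_token and with_password return types that have no further authentication methods, configuring both, or the same one twice, is no longer expressible.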
22:25
Now the final example, which is a bit more complex, so I hope I can make it in two minutes. Last year I was implementing an AI exhibit in a museum, where people were kind of waving their hands and selecting stuff on a virtual display. They had to put their hand over a virtual button and keep it there for three seconds,
22:45
and this selected the button, and then when they put their hand away, it was unselected. So how to implement this? It isn't rocket science, right? So I just created a button state that had some attributes, like hover: is the hand on the button?
23:01
Selected: is the button selected? And some timer: how long did the hand stay on the button? But the problem with this simple representation, again, is that it allows me to represent invalid stuff, right? If my hand is not on the button but it is selected, that's invalid, but it can be represented. Or if the button is selected
23:20
but the timer is not yet at three seconds, you can represent it with this state, but it is not valid. So what did I do? I had some issues with this. For example, when the hand left the button, I was resetting this one attribute, but I was always forgetting to reset the other ones, and it was just a complete mess. So I deleted that whole code
23:41
and I rewrote it as a state machine. I created separate types only for the valid states in the program. When the button is inactive, there is no hand over it and you don't need any data. When there is a hand over the button, you only need to remember how long it has been there. And when the button becomes selected, again, you don't need any additional data. Then I can just create a button state type
24:04
that is the union of those three types, right? This concept is called sum types, or it is a sum type. Then I just implemented a function that took the previous state and returned the new state, and I could explicitly enumerate only the cases that should happen in the code.
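The state machine described above could be sketched as a union of three small data classes plus a transition function. This is a reconstruction under assumptions, not the code from the slides: the type names, the next_state signature and the choice to start the hover timer at zero are all made up for the example.

```python
from dataclasses import dataclass
from typing import Union

SELECT_AFTER_SECONDS = 3.0  # hold time mentioned in the talk


@dataclass(frozen=True)
class Inactive:
    """No hand over the button; no extra data needed."""


@dataclass(frozen=True)
class Hovered:
    """A hand is over the button; only the elapsed time is stored."""

    seconds: float


@dataclass(frozen=True)
class Selected:
    """The button has been selected; no extra data needed."""


# The sum type: a button is in exactly one of these three states.
ButtonState = Union[Inactive, Hovered, Selected]


def next_state(state: ButtonState, hand_over_button: bool, dt: float) -> ButtonState:
    """Compute the new state after `dt` seconds have passed."""
    if not hand_over_button:
        return Inactive()  # the hand left: every state resets
    if isinstance(state, Inactive):
        return Hovered(seconds=0.0)  # the hand just arrived
    if isinstance(state, Hovered):
        elapsed = state.seconds + dt
        if elapsed >= SELECT_AFTER_SECONDS:
            return Selected()
        return Hovered(seconds=elapsed)
    return state  # Selected stays selected while the hand remains
```

States such as "selected but no hand on the button" or "selected with a timer below three seconds" simply have no representation here, so there is nothing to forget to reset.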
24:23
I actually had this code on my slides, but I didn't really have time to go through it in detail. The idea is that with this representation, I can explicitly enumerate everything that happens, so it is very explicit in the code, and I have confidence that I didn't forget anything, right? If you want to examine it in detail, you can
24:44
find it in the slides, and you can also find more examples of this soundness code in my blog post. I will put it on Discord and on the conference system, so don't worry if you don't have the time to scan the QR code. And just to wrap this up:
25:01
The idea of soundness is to make it hard to make mistakes with your API, to create a sort of pit of success, so that it is very easy to do the right thing. But we also shouldn't forget that it is not always possible to write code like this, especially in Python, and it is not always worth it. Creating this sound code can lead to an increase in complexity.
25:22
So you should always consider the trade-off, whether it is worth it or not. To sum up: to write code in Python like in Rust, you should use type hints, you should use data classes, you should make it hard to misuse code, and then you can perhaps profit. So thank you for your attention, and that was all.
25:51
Thank you. We still have about two minutes for questions. So if there's one person who wants to run up here to the hallway to quickly ask the speaker a question.
26:02
Just a very simple question. For someone such as yourself, who's obviously adept with Rust: when would you go with the "rustified" Python over just using Rust? Right, so I actually had this as my penultimate slide. I use Rust for, I would say, almost anything.
26:23
But at the same time, Python is still a great language. It has so many awesome libraries. I do a lot of data science and machine learning, and it's not really possible to do everything in Rust these days. So when I just want to use Python, or need to use Python,
26:41
I find that this approach really helps me to write better code. So that's it. But of course, if I'm writing something low-level that needs to be performant, or a distributed system, I will just use Rust, yes. Guilty. I think we have time for one more question, if there's somebody with a quick question to ask.
27:01
Sorry, quick question. Do you notice a performance enhancement using type hints or this approach when you need to enhance the performance of your code, or is it just the same? I'm not sure if I understood the question: do type hints help you improve the performance, combined with other practices, or is it the same?
27:25
So currently type hints are not used for any performance-related things in CPython, as far as I'm aware. There were some proposals to maybe try to prime the just-in-time interpreter by reading those types, but, yeah, I think some of the core developers said that it wouldn't be worth it anyway. So
27:44
yeah, I was thinking that it could be a nice idea to maybe try to use type hints to speed up Python, but I don't think it's really going to happen anytime soon. But there are things like Cython and other approaches. Okay, thank you.