Subclassing, Composition, Python, and You
Formal Metadata
Number of Parts | 141 | |
License | CC Attribution - NonCommercial - ShareAlike 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/68734 (DOI) | |
EuroPython 2023 (39 / 141)
Transcript: English (auto-generated)
00:04
Levels, levels. Is it loud enough? All right.
00:20
I'm Hynek. I really care about classes. I care about them so much that I wrote a PyPI package to make writing classes easier. You might have heard of it or not. It's called attrs. If you haven't heard of it, it's fine. But I also helped create a module that is based on attrs, which is called dataclasses,
00:44
which you probably have heard of. And thanks to attrs and dataclasses, writing classes has become boring, which is good because writing more classes is not a drag anymore and it doesn't affect your design decisions anymore. Like before, sometimes we were like, hmm, this should be a new class.
01:02
But it's too much hassle to implement all the dunder methods. So let's just add another attribute or method to the old class and make it work. And this leaves you more time for the much more interesting topic of relationships between classes. And that's really what subclassing and composition are.
01:21
It's two ways in which two classes can relate to each other. Like a dog is an animal and a dog has a collar. Now, asking two programmers which is better is like asking two Spaniards if it's Sevilla or Sevilla. It can get very loud. I've witnessed it last night. So the more controversial one is, of course, subclassing.
01:43
So a quick primer. We say Sub is a subclass of base class Base, and subclass Sub consists of everything from class Base and potentially Base's own base classes, which we do not know about when we look at Sub. This relationship between two classes is called an is-a relationship.
02:04
So you can say Sub is a Base, which doesn't really make sense, but you can say like a dog is an animal. Now, base classes are also known as superclasses, but that sounds ambiguously close to subclasses, especially with a weird accent like mine, so I'm going to stick to base classes. Now, in Python, there is no protection for base classes
02:25
from the subclasses whatsoever. This is by design. When you subclass a base class, you have access to any attribute on the base class and to any method on the base class, both reading and writing. So sub can do anything it wants to base, and again, this is by design.
02:42
There is no encapsulation between sub and base. So far, so obvious, but it goes both ways. Like the self argument is the exact same object for both methods here. So if you don't control the base class, and the base class decides to use an attribute with the same name as you are using, well, lol.
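The slide code is not part of the transcript, so the following is only a minimal sketch of the name clash described above. The class names `Base` and `Sub` follow the talk; the attribute `x` and the method `base_value` are made up for illustration.

```python
class Base:
    def __init__(self):
        # Base keeps its own state in an attribute called "x".
        self.x = 1

    def base_value(self):
        # Base assumes that self.x is still what it stored.
        return self.x


class Sub(Base):
    def __init__(self):
        super().__init__()
        # Sub happens to pick the same attribute name. Because "self" is
        # the very same object in both classes, this overwrites Base's state.
        self.x = "sub data"


s = Sub()
print(s.base_value())  # Base's method now returns Sub's value
```

Neither class did anything unusual on its own; the breakage comes purely from both sharing one instance namespace.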
03:04
And if you do control the base class, you have to always keep that in mind, that there's this bidirectional relationship, because you can break either by changing the other. And in Python, you can kind of prevent this by using the double underscore prefix,
03:20
but nobody really uses it anymore. We have somehow decided to use the consenting adults principle, which to my taste is a little bit lewd for an engineering principle, but more importantly, you can't consent to something you don't know about. This is like asking someone to promise to not get mad before telling them what it is about. This is not how consent works with anything.
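For reference, this is roughly what the double underscore prefix does. It is a sketch with hypothetical names (`__token`, `base_token`, `sub_token`), not code from the talk: Python rewrites such attribute names per class, so the two classes no longer collide, although the mangled names stay reachable.

```python
class Base:
    def __init__(self):
        self.__token = "base"  # stored on the instance as _Base__token

    def base_token(self):
        return self.__token


class Sub(Base):
    def __init__(self):
        super().__init__()
        self.__token = "sub"  # stored on the instance as _Sub__token

    def sub_token(self):
        return self.__token


s = Sub()
print(s.base_token(), s.sub_token())
print(sorted(vars(s)))  # both mangled names live side by side
```

This avoids accidental clashes, but it is name rewriting rather than real encapsulation: `s._Base__token` is still readable and writable from anywhere.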
03:45
What all this means is that the base class has no control over sub, sub has no control over base, but both can modify each other at run time by trying to modify themselves. So we say that base classes and sub classes have a bidirectional relationship,
04:00
and it sounds harder to reason about, and it actually is, very much so. Admittedly, there are situations where bidirectional relationships can be really useful, it can make code shorter and everything, but I'm going to argue that it shouldn't be the default in your design decisions. Now, on the other hand, composition, if a class A wants to use data or code from class B,
04:21
it has to keep an instance of B as an attribute. So this is a has-a relationship, a dog has a collar. If class A wants to access something on class B, it has to say so explicitly. And I hope it's uncontroversial if I say that reading this is much clearer than the subclass example,
04:43
because the relationship of the classes, A and B, is expressed in code. The location of X is also expressed in code. You can see it, it's right there. Now, the downside, of course, is that if you instantiate an A, you'll also have to instantiate a B.
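A minimal sketch of that has-a relationship, assuming class names `A` and `B` and an attribute `x` as mentioned in the talk; the method `double_x` and the private attribute name `_b` are invented for the example.

```python
class B:
    def __init__(self, x):
        self.x = x


class A:
    def __init__(self, b):
        # A keeps an instance of B as an attribute (has-a).
        self._b = b

    def double_x(self):
        # The access is explicit: x visibly lives on the B instance.
        return self._b.x * 2


a = A(B(21))
print(a.double_x())
```

Because `A` only talks to whatever object it was given, any other object with an `x` attribute could be passed in instead of a `B`.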
05:02
And whenever you access attributes on B, it's two attribute lookups instead of one. But it also means that you can easily replace B with a different class, for example, when you're testing or when you're in development and you don't want to actually send out the emails to your example customers.
05:21
And often, you can also share B among the classes. For example, a thread-safe connection pool. You just need one, and you can pass it in all the other classes. So this leads us to the question, why do we subclass in the first place? Like, there's a few mechanical reasons that I will briefly touch on, but in Python, we mainly subclass to share code.
05:42
And there are various popular techniques and patterns, like both in the standard library and beyond, like template subclassing, dynamic dispatch, and they're so widespread that they feel indispensable at this point. But are they indispensable for writing real-world software nowadays?
06:02
It sure felt like that for a very long time, and it's easy to forget today how incredibly dominant Java used to be for everything. A language that doesn't even have functions. The best they can do are static methods. And you can see that all over the Python standard library, all the modules that were born when Java was reigning,
06:23
most notably the unittest module or the logging module, which both even inherited Java's camel case names, which we usually don't do anymore. But after decades of object-oriented programming being a synonym for class hierarchies, something changed. We've got two extremely popular imperative programming languages
06:43
that eschew subclassing completely. So clearly the answer is no. You do not need subclassing to write great software. Now, Rust and Go prove it, which for me personally is a great relief because I was always low-key suspicious of subclassing because it always felt like it's kind of eluding me a little bit,
07:01
like there's some complexity that I don't completely grok. And I always thought maybe it will come with time, with experience. Maybe if I do it often enough, it won't feel as weird. But it looks like much smarter people than I decided otherwise. So what complexity do I mean?
07:20
So my main problem with subclassing is that it adds another whole dimension when you're trying to reason about program flow, which I like to call the goose chase program flow, trademark pending. Giving you an example of this is a talk by itself, but I'm sure every one of you has been there when you try to trace across class hierarchies what method is actually being called.
07:42
Here's an example, like, quick, which one is called? Honestly, I don't know. I keep forgetting whether the MRO, the method resolution order goes left to right or right to left. And this is easy mode because you see all the classes all at once and it's not like long listings that are distributed across multiple files.
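The hierarchy from the slide is not in the transcript, so here is a hypothetical diamond with made-up classes `A` to `D` that shows how the question can be answered by asking Python for the method resolution order directly.

```python
class A:
    def who(self):
        return "A"


class B(A):
    def who(self):
        return "B"


class C(A):
    def who(self):
        return "C"


class D(B, C):
    pass


# The MRO lists the bases left to right, and a shared base comes last.
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().who())  # B
```

The answer is there, but only if you know to look at `__mro__`; nothing in the body of `D` tells you.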
08:02
So what about now? This is a fun mystery. But this is still easy mode because it's all there. You can see it all. But you have to know the rules to make sense out of it. This code is not self-explanatory. It is mental overhead that you have to employ when you are reading this code.
08:20
And in practice, this is not what you see. You see this. And C can be a class. But C can also be a huge diamond hierarchy hidden somewhere. Like, you don't know. So jumping up and down hierarchies is such a common pattern in Python that it has a name, and it's called template subclassing,
08:40
and I absolutely hate it. And I'll use a listing from my favorite Python book to show you a more realistic example and how to get rid of it. So despite everything that follows, if you're interested in building maintainable applications around databases and web services and whatever, get this book. You can read it for free online, too.
09:00
It is a great gift from Harry and Bob for the Python community. I mention it in almost all my talks, and I still mean it. Like, I wish it existed 10 years ago. Like, my life would have been so much better. And before you start crucifying me for talking shit about other people's work, I was a technical reviewer for this book, and I actually did bring it up.
09:21
Quite vehemently, in fact. But my threats of plush violence went unheeded, so you can see that it was only added as an exercise for the reader. So let's go together and see if we can do better. And what we do is we will implement a special kind of repository.
09:40
A repository, in a nutshell, is an abstraction over storage. So instead of writing SQL or using the ORM directly in your business code, you have a class that offers methods to get, modify, and list objects to and from a database or whatever storage you're using, usually using vanilla classes,
10:01
which are classes that don't even know that they are coming from a database. They are very lightweight. And this is cool because it isolates all database interactions into distinct classes, which, of course, makes even more sense with AsyncIO. And as a side effect, it makes your app a lot more testable. But we don't implement a regular repository.
10:21
We are going to implement a tracking repository that additionally keeps track of the objects it has seen while getting or setting in a set. Why? Don't matter. It's an example in a book. If you want to know, read the book. I recommend it. Now, the interesting part we want is that we want to take a regular repository
10:40
and add tracking to it. So let's see what the book is suggesting. Like, this is classic template subclassing. It starts with an abstract class, which is a class that cannot be instantiated by itself. You must subclass it and fill in missing parts. In this case, it's an underscore add that takes a product and stores it and underscore get that returns a product
11:01
based on a stock keeping unit, also known as SKU. Now, a class that does that, that subclasses and implements those methods, is called a concrete implementation of an abstract class. And this so far is not bad at all. ABCs were Python's original take
11:21
on abstract data types and interfaces, and I know about zope.interface. You don't have to tell me. Now, the interesting thing that makes it template subclassing is that we additionally share methods with the concrete subclasses, methods that use the missing ones.
11:40
So in this case, we've got add product, which is a public method, as you can see, and it calls self underscore add and adds the product to our seen set. And then there's get by SKU, and it tries to get it using underscore get, and if it finds it, it also adds it to the seen set.
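The book listing itself is not reproduced in the transcript. The following sketch reconstructs the shape that is described, with assumed names (`AbstractTrackingRepository`, `add_product`, `get_by_sku`, `seen`) and a stand-in `Product` class; the real listing may differ in detail.

```python
import abc
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Stand-in for the book's product model; frozen so it is hashable.
    sku: str


class AbstractTrackingRepository(abc.ABC):
    def __init__(self):
        self.seen = set()

    # Shared public methods: the "template" that calls the missing parts.
    def add_product(self, product):
        self._add(product)
        self.seen.add(product)

    def get_by_sku(self, sku):
        product = self._get(sku)
        if product:
            self.seen.add(product)
        return product

    # The missing parts that every concrete subclass has to fill in.
    @abc.abstractmethod
    def _add(self, product):
        raise NotImplementedError

    @abc.abstractmethod
    def _get(self, sku):
        raise NotImplementedError
```

Note the direction of the calls: the public methods live in the base class and reach down into private methods that only exist in subclasses.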
12:01
That's it. Now, one thing to notice is that the lower abstraction level, like add and get, live in a subclass. So depending how you look at the hierarchy, it is higher in the class hierarchy, which is weird. Now, let's look at the book's concrete implementation of this,
12:20
which uses sets for storage. Now, since the data storage is transient, we will call it a fake tracking repository, and we subclass the abstract tracking repository. While initializing, we have to call super dunder init, because the abstract base class uses its own dunder init method
12:41
to initialize their own storage. And then we make the class concrete by adding the missing method. This shouldn't be very surprising. I mean, it's odd that they chose to implement it on a set, which I would not, I would use a dictionary, but whatever. And it's also weird to implement private methods in a concrete class.
13:03
If you look at a concrete class, you are not looking at the API that you're going to use later. Now, yeah, you would instantiate this and call the public methods from the base class. And this is my first annoyance, like this.
13:21
It's weird. The second annoyance, even bigger, is that we have to remember calling super dunder init, which introduces bidirectional flow between those two classes. And there's no way to express this requirement in code. You have to use documentation, but people don't read documentation.
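To make the second annoyance concrete, here is a trimmed-down sketch (only the add path, with assumed class and method names) of what happens when a subclass forgets the `super().__init__()` call. `ForgetfulRepository` is a hypothetical class added for the comparison.

```python
import abc


class AbstractTrackingRepository(abc.ABC):
    def __init__(self):
        self.seen = set()

    def add_product(self, product):
        self._add(product)
        self.seen.add(product)

    @abc.abstractmethod
    def _add(self, product):
        raise NotImplementedError


class FakeTrackingRepository(AbstractTrackingRepository):
    def __init__(self, products):
        super().__init__()  # easy to forget; nothing in the code enforces it
        self._products = set(products)

    def _add(self, product):
        self._products.add(product)


class ForgetfulRepository(AbstractTrackingRepository):
    def __init__(self, products):
        # super().__init__() is missing, so self.seen is never created.
        self._products = set(products)

    def _add(self, product):
        self._products.add(product)


FakeTrackingRepository([]).add_product("sku-1")  # works

try:
    ForgetfulRepository([]).add_product("sku-1")
except AttributeError as exc:
    print(f"broken at first use: {exc}")
```

The broken class instantiates without complaint; the failure only shows up later, when the shared base-class method touches the state that was never initialized.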
13:41
And if they read it, they forget about it. Documentation like this cannot be verified. Documentation grows stale. Now, one reason why it gets so weird and ugly in this case is because we use kind of specialization, but it goes the wrong way. It's the lower level of code that we want to be able to replace.
14:00
And so this abstract tracking repository is actually a regular repository plus tracking. And you want to be able to replace the regular repository part on the right side. That is like implementing an animal class by subclassing a dog. Maybe you can make it work, but it's going to be awkward, like in this case.
14:21
So if you'd be adding higher level features to a lower level class, that would be a regular specialization, and I will talk about it later because there is a value in that. But as it stands with this, with this direction of our dependencies, we have to use other tools. So how can we make this clearer? I've already hinted at it.
14:41
We need to invert the dependency, and it's super easy if we switch to composition. So what does this abstract repository actually want to do? It wants to add and get products from another repository, and keep track of those products. So what varies here? It's always the big question. What varies here? And it's a storage, right?
15:02
So this logic lives in a lower level repository, as we've seen. So let's start inverting from the bottom and extract the part that varies into an interface. And I could use an abstract base class, like in the book, and it would look like this. This is the low level abstract part. It's extracted into a separate interface,
15:23
except that the methods now are the public API for this concrete class. So there are no underscores anymore. Now if someone wanted to implement this one, they would just usually subclass it and implement add and get. And this is called nominal subtyping
15:40
because you have to tell Python, hey, this class is a subtype of abstract repository, okay? Now, subclassing this uppercase ABC is optional. It's popular because it's shorter, but you could also use a metaclass for it, which I personally prefer because it's more explicit. And one interesting tidbit about using ABCs and actually type hints
16:04
is that a class that subclasses an abstract class and does not implement all the methods that are required cannot be instantiated. It will explode at runtime. However, the type hints are ignored. They don't matter at all at runtime. On the other hand, if you are using types and type checking,
16:21
like, mypy will catch both. So that makes the runtime overhead a little bit unnecessary that you have for checking those interfaces, but it's the original pre-type hints behavior, and it allows you to define interfaces without using type hints, which I understand are still a little bit controversial in the community.
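A small sketch of that difference, assuming an `AbstractRepository` interface with `add` and `get` and using plain strings as products. `Incomplete` and `WrongTypes` are hypothetical classes made up to show the two failure kinds.

```python
import abc
from typing import Optional


class AbstractRepository(abc.ABC):
    @abc.abstractmethod
    def add(self, product: str) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    def get(self, sku: str) -> Optional[str]:
        raise NotImplementedError


class Incomplete(AbstractRepository):
    # get() is missing, so the ABC machinery refuses to instantiate this.
    def add(self, product: str) -> None:
        pass


class WrongTypes(AbstractRepository):
    # Both methods exist, but the annotations disagree with the interface.
    def add(self, product: int) -> int:
        return product

    def get(self, sku: int) -> int:
        return sku


try:
    Incomplete()
except TypeError as exc:
    print(f"runtime check: {exc}")

WrongTypes()  # no error at runtime; only a static type checker objects
```

The runtime check only counts methods; whether the signatures fit is left to a type checker.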
16:41
Now, without going into type theory, being a subtype and being a subclass do not have to be the same thing, as you will see next, because in the brand-new Python 3.8, we've got something that is better for defining interfaces, and I'm going to use it now.
17:00
It's called a protocol, and Protocol lives in the typing module, so the type hints are not optional, and you define an interface like this. And to the type checker, any class that has an add method and a get method with exactly these signatures is considered a subtype of Repository.
17:22
And if you decorate the class with runtime_checkable, it even works with isinstance checks. Again, you do not subclass this. You just implement the interface. Python will know this is a subtype. And as long as you don't use isinstance checks, there's also absolutely no runtime overhead from this, because it's only used statically when type checking.
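A sketch of such a protocol, assuming the interface is called `Repository` and has `add` and `get` as described. The `Product` stand-in and the `DictRepository` implementation are hypothetical and only there to have something to check against.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class Product:
    sku: str


@runtime_checkable
class Repository(Protocol):
    def add(self, product: Product) -> None: ...

    def get(self, sku: str) -> Optional[Product]: ...


class DictRepository:
    # No base class: having matching methods is enough.
    def __init__(self) -> None:
        self._products = {}

    def add(self, product: Product) -> None:
        self._products[product.sku] = product

    def get(self, sku: str) -> Optional[Product]:
        return self._products.get(sku)


print(isinstance(DictRepository(), Repository))  # True
print(isinstance(object(), Repository))  # False
```

One caveat worth knowing: the `isinstance` check on a runtime-checkable protocol only verifies that the methods exist, not that their signatures match. Checking signatures remains the type checker's job.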
17:42
Now, this is called structural subtyping, and it's been popularized by Go, and in my opinion, this is a much more natural way to deal with types in dynamic languages like Python. And my favorite feature is that it allows me to define interfaces that the classes that implement them don't even know about. I can define a little sliver of an API that I need from a third-party class,
18:05
and then the type checker tells me whether I'm using it correctly, like if the class actually implements it, and whether I'm using only the parts that I've been promised without running the code. So I would say this is a straightforward interface. So let's implement it using a set like the example in the book.
18:23
And we are off to a good start, because there's no subclassing in the class definition. We just initialize our internal storage. There is no super that we could forget. And now we add the methods. We add an add. It looks exactly like before. Does it fulfill our API protocol?
18:42
Takes a product, returns none, check. Let's get next. It also looks exactly the same. Takes a string, returns a product or none, check too. Now, take a moment to take in the beauty and elegance of this, because this class is not coupled at all to any other class.
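As a sketch of the class being described: a set-backed repository with no base class. The name `SetRepository`, the `Product` fields and the lookup via `next()` are assumptions, since the slide code is not in the transcript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    sku: str
    name: str


class SetRepository:
    def __init__(self) -> None:
        # Only our own storage; there is no super().__init__() to forget.
        self._products = set()

    def add(self, product: Product) -> None:
        self._products.add(product)

    def get(self, sku: str) -> Optional[Product]:
        # Return the first product with a matching SKU, or None.
        return next((p for p in self._products if p.sku == sku), None)


repo = SetRepository()
repo.add(Product("chair-1", "Chair"))
print(repo.get("chair-1"))
print(repo.get("nope"))
```

A linear scan over a set is what makes a dictionary keyed by SKU the more natural choice, as remarked earlier in the talk.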
19:02
All it is coupled to is a set, basically. Right? It has a single purpose, and it is adding and retrieving data from a set. Nothing else. With the abstract base class approach, it would look almost the same. It would look like this, except that now you subclass abstract repository,
19:20
which has the legitimate upside of being explicit about the intent of implementing an abstract repository. That is something to be weighed. Like, this has an upside. Alternatively, you can also use abstract base classes without any subclassing at all, because every abstract base class has a method called register. This is still nominal subtyping, because you still tell Python
19:42
that this repository is an abstract repository, but you don't have to subclass. And since this register method takes a class, you can also use it as a class decorator. So that works too. So it's, again, still nominal subtyping, but it shows that you can use abstract base classes
20:01
completely without any subclassing whatsoever. So even for these mechanics, you don't need it. Now we have a fully functional repository. Let's make it tracking. So this is our tracking repository. It has no base class, which is good. It has two attributes. Our repository, that's private, because it's none of the business of the user,
20:22
and a set called seen as before. Now, both add_product and get_by_sku look almost the same as before, like in the book, except now we can see that add and get live on the repo attribute. Instead of being hidden somewhere in a class hierarchy, you can see right away where they are coming from.
20:42
And I find even in this trivial example, it is radically easier to understand at a glance what is going on and what the relationships are. I mean, this is like the smallest font I've ever used on a slide, and I've removed the type hints to make it more succinct. But I hope you can see that, yes, it is a little bit more verbose. It's more lines of code.
21:00
But we have very clear dependencies here, right? Those two classes are independent from each other and can be tested as such. And each initializes only itself. There is no runtime overhead from using the interface, because we use a protocol, and the static type checker will ensure that we use underscore repo correctly. Like, we don't need to run the code to verify that.
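Put together, the composed version could look roughly like this. It is a sketch under assumed names (`Repository`, `SetRepository`, `TrackingRepository`, `_repo`, `seen`), not the exact slide code.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Product:
    sku: str


class Repository(Protocol):
    def add(self, product: Product) -> None: ...

    def get(self, sku: str) -> Optional[Product]: ...


class SetRepository:
    def __init__(self) -> None:
        self._products = set()

    def add(self, product: Product) -> None:
        self._products.add(product)

    def get(self, sku: str) -> Optional[Product]:
        return next((p for p in self._products if p.sku == sku), None)


class TrackingRepository:
    def __init__(self, repo: Repository) -> None:
        self._repo = repo  # the wrapped storage, passed in from outside
        self.seen = set()

    def add_product(self, product: Product) -> None:
        self._repo.add(product)
        self.seen.add(product)

    def get_by_sku(self, sku: str) -> Optional[Product]:
        product = self._repo.get(sku)
        if product is not None:
            self.seen.add(product)
        return product


tracking = TrackingRepository(SetRepository())
tracking.add_product(Product("lamp-1"))
print(tracking.get_by_sku("lamp-1"), tracking.seen)
```

Every call goes one way, from `TrackingRepository` into `self._repo`, and each class initializes only its own state.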
21:24
And this is like my key point. The structure of the code is expressed in the code, not in some comments or whatever. And it doesn't have to be deduced by looking at hierarchies and trying to remember the MRO. I mean, look how great the control flow is. It's just one way, right? Instead of jumping, like, back and forth through class hierarchies.
21:43
And there's even a fancy word for this, and it's the strategy pattern. The repository is a pluggable strategy. We inject the behavior we want to customize, which makes it easy and safe to replace it by a different implementation, if you are switching the storage or if you want to write tests. It only has to fulfill the API protocol that we have defined.
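As an illustration of swapping the strategy in a test, here is a sketch in which a hypothetical `FailingRepository` simulates a storage outage. The test function and the error type are invented for the example; no patching of modules or classes is involved.

```python
class TrackingRepository:
    def __init__(self, repo):
        self._repo = repo
        self.seen = set()

    def add_product(self, product):
        self._repo.add(product)
        self.seen.add(product)


class FailingRepository:
    # A stand-in strategy that behaves like broken storage.
    def add(self, product):
        raise ConnectionError("storage unavailable")

    def get(self, sku):
        raise ConnectionError("storage unavailable")


def test_failed_add_is_not_tracked():
    tracking = TrackingRepository(FailingRepository())
    try:
        tracking.add_product("sku-1")
    except ConnectionError:
        pass
    # The product never reached storage, so it must not count as seen.
    assert tracking.seen == set()


test_failed_add_is_not_tracked()
```

The error case is set up by passing in a different object, which is all the injection point requires.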
22:02
And this alleviates many reasons for monkey patching in testing. And to bring in a hot take, because this talk is not controversial enough yet, you shouldn't need monkey patching, neither of modules nor of classes, to be able to test the vast majority of your code. Only the lowest level of low-level code
22:21
at the very fringes of your architecture should need monkey patching to simulate errors or something like that. Everything else should be possible to parameterize with composition. And what this means, in other words, is that testability is a direct function of your architecture, of your software, right? There's no magic feature or package
22:41
that will make it easy to effectively test a tangled mess or, like, tightly coupled software. It's always going to be hard. There's nothing that helps you with that. Learning to write testable software is learning to design software that is good for testing, which is, in my opinion, a good design by itself.
23:01
It is not about learning cool testing tools. Like, there are really cool testing tools, like pytest or Hypothesis, but it starts with the architecture, not with the tools or with the tests. Now, with this small aside done, we get to our first takeaway. Subclassing is super powerful. But it requires knowledge and self-discipline from you. Instead of explicitly reading the relationships
23:23
between objects within your code, you have to interpret the hierarchy through a lens of rules, rules like the MRO or how super works. Like, one of the most upvoted questions on Stack Overflow is like, what is even super for? Then there's, like, tons of rules to follow
23:42
to not end up with a chaotic mess, like the avoidance of diamond hierarchies, the Liskov substitution principle, the open-closed principle. There's a lot to keep in mind. And you have to have it on your mind. On the other hand, composition mechanically forces discipline on you, sometimes to the point of clunkiness, but it leaves less room for errors by you.
24:03
And it expresses all relationships in explicit, straightforward code, which means that sometimes you have to type a little more. It means that it gets awkward to have bidirectional relationships, but it's a feature because those relationships are not free in a sense of complexity.
24:23
They have a cost, and they should be used thoughtfully, and they shouldn't happen by accident. Now, these problems around confusing control flows are a special case of a larger problem, which is like the muddling of namespaces. And namespaces are great. There's even a PEP for that. Well, it's not a PEP for that.
24:41
It is in a PEP. It's in PEP 20, where Uncle Timmy bestowed upon us: namespaces are one honking great idea, let's do more of those. And subclassing is literally the opposite of that because you take two or more classes with perfectly good namespaces each, and you unite them into one, breaking encapsulation between them completely.
25:00
As I've said before, there is no protection whatsoever. So what you get is namespace pollution. You lose the control over your classes' API surface, which is a huge deal for public APIs, whether inside your company or on PyPI, because whenever the base class changes, which happens even in the standard library all the time, even in ways that technically do not change the public API,
25:22
it will affect your subclass in ways that you cannot predict because you cannot tell the future, like if it adds or removes attributes or methods. So you will feel it and so will your users, especially if you get, like, name clashes. So this doesn't affect you only with public APIs, though.
25:40
So I want to show you why I've chosen the class decorator approach for attrs instead of subclassing and why it's also been used or taken up by dataclasses. So a quick refresher first. A class decorator works like a function decorator. It's a callable that gets a class and returns a class.
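A minimal sketch of such a class decorator. The decorator name `add_greeting` and the attribute it sets are made up; the point is only that the decorator syntax and an explicit call do the same thing.

```python
def add_greeting(cls):
    # Receives a class, adds exactly one attribute, returns the class.
    cls.greeting = "hello"
    return cls


@add_greeting
class Decorated:
    pass


class Plain:
    pass


# The decorator syntax is shorthand for this explicit call and rebinding.
Plain = add_greeting(Plain)

print(Decorated.greeting, Plain.greeting)
```

The decorated class ends up with `greeting` and nothing else that it did not have before.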
26:00
Then you apply it to a class, and then the class has that attribute. But it's really just syntactic sugar for this. And a class decorator is like precise surgery because the decorated class only gets what you explicitly put there, which are usually fewer things than if you get everything by default. So let's compare and look at an instance API surface
26:22
of a simple Django model, which is straight from the tutorial, that uses a subclassing-based API. And I'm specifically using Django as an example here because it's huge and it's less likely to step on anyone's toes: if everybody wrote it, nobody wrote it. And of a data class that carries the same data, right?
26:49
It has those two fields and it uses a class decorator. Now, data class first. It's exactly two attributes from data classes and the rest is your fields,
27:00
which gives you a grand total of 32 attributes, only two extra, which is the data class's fault. The rest is from Python. Now let's look at the Django ORM model. Yeah, it's 91. So you get 59 extra attributes that you can't do anything about.
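One way to reproduce the data class half of this comparison yourself is sketched below. The field names follow the Django tutorial's `Question` model, which is an assumption about the slide, and the exact counts depend on the Python version, so the sketch prints them rather than promising 32:

```python
from dataclasses import dataclass
from datetime import datetime


class Plain:
    """An empty class: everything on its instances comes from Python itself."""


@dataclass
class Question:
    question_text: str
    pub_date: datetime


plain_names = set(dir(Plain()))
question_names = set(dir(Question("What's new?", datetime(2023, 7, 19))))

# Whatever is on the data class instance but not on a bare instance:
# the fields, the dataclass bookkeeping attributes, and a few
# version-dependent dunder names.
extras = sorted(question_names - plain_names)
print(len(question_names), extras)
```

Running the same `dir()` comparison against an instance of an ORM model is how you would arrive at the larger number.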
27:20
And try finding something in there. Like, this is new Django, by the way. This is a problem that all ORMs have. There's a metaclass somewhere because they have to keep track of changes and synchronization and everything like that. And this is why I'm not a fan of these kinds of heavy objects inside of my business code. Like, I'm not saying you are not supposed to use ORMs at all.
27:41
Like, I don't, but that's not my point. My point is that this is bad to have in business code. And that's the reason why I personally love SQLAlchemy's Core API together with repositories and vanilla attrs classes, or data classes if I have to. This gives me fine-grained control.
28:00
It gives me lightweight objects that are quick and light. It gives me clear I/O boundaries. Like, if you access the wrong attribute on an ORM, you might get a network call from it. It can't happen if you just return a vanilla data class. And it allows me to write business code in terms of regular classes, which I find much more pleasing.
28:21
But it is more typing. Again, more typing. But again, I don't get accidental I/O by accessing the wrong attribute. Anyhow, if you find yourself in a situation like this with your own code and you want to improve the situation but you don't want to, like, restructure your whole project, one way is to just encapsulate everything private, so everything that starts with an underscore,
28:41
into a private namespace attribute that lives in a base class. And this is what attrs and data classes do, too. Like, data classes, you've seen those two attributes. Those are pretty complex things hidden behind that. But they don't pollute the whole class like this.
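A rough sketch of that idea, with hypothetical names throughout (`Record`, `_RecordState`, `dirty_fields` are not from the talk or from any real ORM). The base class keeps all of its bookkeeping behind a single private attribute instead of spreading it over `self`:

```python
from dataclasses import dataclass, field


@dataclass
class _RecordState:
    """All private bookkeeping lives here, not on the public class."""

    dirty_fields: set = field(default_factory=set)
    saved: bool = False


class Record:
    """Hypothetical base class that exposes a single private attribute."""

    def __init__(self):
        # Bypass our own __setattr__ so the state object itself is not tracked.
        object.__setattr__(self, "_state", _RecordState())

    def __setattr__(self, name, value):
        # Track which public fields changed, then store the value normally.
        self._state.dirty_fields.add(name)
        object.__setattr__(self, name, value)


class User(Record):
    def __init__(self, name):
        super().__init__()
        self.name = name


user = User("alice")
public_api = [n for n in dir(user) if not n.startswith("_")]
print(public_api, user._state.dirty_fields)
```

Subclasses still inherit the machinery, but their instance API surface grows by one underscore-prefixed name rather than by dozens.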
29:00
So, of course, you can say a Django model is too performance-sensitive to instantiate an extra class with all its private members every single time. Or it is more convenient to have everything on self. But you have to acknowledge that it makes other things harder, like testing, finding, understanding. And that's a discussion to be had. Not whether subclassing is morally reprehensible
29:20
or what the best practice is. So this leads to the one big takeaway here. Like, it's a tradeoff, of course. On the one side, we have readability. Composition is explicit. Subclassing gives you implicit behaviors. You get less information from just looking at a class definition, like how deep and wide is the hierarchy. Where is this attribute coming from?
29:40
Or looking at a method call or attribute lookups. You have no idea. You can't tell. Then there's control. Subclassing breaks encapsulation. So many consequences, some of which I've mentioned before. Now, on the other side, having everything on self is convenient. Right? Until the number of everything gets really high,
30:00
and then you can't find anything in the everything anymore. Sure. It's still a tradeoff. Also, subclassing gives you brevity, which sometimes can make the code more readable. It's like you can see it all on one page. That can be good sometimes. And then, of course, there's performance. Typically, with composition, you instantiate more objects.
30:21
You have more attribute lookups, which, in a real hot loop, can be a problem. Usually it isn't, but can be. So, yeah, but this is not new. Readability versus performance is as old as programming. If statements are faster than polymorphism, tangled, optimized code is faster than clean layers. Function calls slow code down, especially in Python.
30:42
But we still write functions to have more understandable code. And this is what software engineering is. Weighing tradeoffs against each other. But for myself, readability and thus maintainability weigh the most. And I don't think that I'm alone with this. Software engineering is programming integrated over time.
31:03
Which means that you will keep coming back to the code you wrote to fix bugs, to add features. And readability is a core requirement for long-term maintainability. And that makes maintainability an extremely important property for everything except, like, one-off scripts. But there is no such thing as a one-off script.
31:23
Like, they all become load-bearing eventually. Of course, the importance of maintainability only grows over time. The longer you didn't look at the code, the harder it is for you to understand the code. And especially in a brave new world of generative AIs, the value of writing code will go down.
31:40
There's no question about that. People are hitting the tab key a lot nowadays. While the value of understanding your code will go up, AI will make writing code a lot easier. It will probably get better while they're serving us. But AI is not going to debug your spaghetti code in the foreseeable future.
32:01
If anything, you'll need clearer architectures so you can judge their very confident suggestions better for correctness. And, of course, you can legitimately disagree with the way I weigh my trade-offs. I'll be the first to admit that my perspective is very skewed by the fact that I work for a small company where I have a lot of responsibility and sometimes don't touch projects for years.
32:21
And that I maintain too many open source projects that I also don't have time to look at the whole time, right? But you have to always acknowledge the consequences of your decision. So my goal for today is not to stop you from ever subclassing again.
32:41
My suggestion, based on my experience, is to err on the side of composition. But I'm not your dad. And I probably won't have to read your code. So my goal for today is to make you more cognizant of this trade-off so you can draw your own conclusions, whatever they are. But you have to be able to argue for them. And my ultimate hope is that we'll get more nuanced discussions in the future.
33:03
And just to be very, very clear, functions that call other functions with some arguments and return a value and then look at that value and call other functions are absolutely the best in regards to clarity. It is like going from 3D chess to checkers. And I've noticed that over time the number of classes that I write went up
33:20
thanks to attrs and data classes. But the number of methods I'm writing went down. And I increasingly resort to functions, especially when more than one class is involved, which helps me to break dependencies between classes. And that's usually very beneficial if you want to test them. So I'm going to give you an example. Like, imagine you have two classes, A and B.
33:42
Now, you need a method that calls methods from both. Do you put the method on A or do you put the method on B? This, by the way, shows that even with composition you can end up with a tangled mess. I've already given it away. It's neither. This isn't Java. Let's write a function.
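A sketch of what "let's write a function" can look like. The method names (`load`, `publish`), the function `sync`, and the fake are invented for illustration; the talk only names the classes A and B:

```python
class A:
    def load(self):
        return [3, 1, 2]


class B:
    def __init__(self):
        self.published = []

    def publish(self, items):
        self.published.append(items)


def sync(a, b):
    """Business logic as a plain function: neither A nor B knows the other."""
    items = sorted(a.load())
    b.publish(items)
    return len(items)


class FakeB:
    """Test double: records calls instead of doing real work."""

    def __init__(self):
        self.published = []

    def publish(self, items):
        self.published.append(items)


fake = FakeB()
count = sync(A(), fake)
print(count, fake.published)
```

Because both collaborators are passed in, a test can hand the function a fake or a mock without patching anything.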
34:00
Not only does this decouple A and B, it also allows for easy testing because you're literally passing A and B into it. So you can pass it a fake or a mock or whatever you want. So nowadays for me, business logic is almost always a function that's coordinating a bunch of classes. And, of course, if you really need a class here, which happens,
34:22
sometimes you need to pass things around, you just write a third one and use the magic of composition. It is not as simple as a function, but it has the same upsides. A and B are decoupled. And making this an easy decision is why I wrote attrs and helped make data classes in the first place.
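If a class is needed instead of a function, the third class might look like the sketch below. `Syncer`, `load`, and `publish` are hypothetical names; a data class is used so that writing the extra class costs only a few lines:

```python
from dataclasses import dataclass


class A:
    def load(self):
        return [3, 1, 2]


class B:
    def __init__(self):
        self.published = []

    def publish(self, items):
        self.published.append(items)


@dataclass
class Syncer:
    """Third class that holds A and B; A and B still do not know each other."""

    a: A
    b: B

    def run(self):
        items = sorted(self.a.load())
        self.b.publish(items)
        return len(items)


b = B()
syncer = Syncer(A(), b)
result = syncer.run()
print(result, b.published)
```

The instance can be passed around like any other object, and its collaborators can still be swapped for fakes in tests.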
34:40
Otherwise, if it's too much work, you would have just chosen A or B. Now, having said all this, let's tackle the elephant in the room. Why anyway? We've enumerated many reasons for the famous composition over inheritance principle. And we've seen you don't need subclassing. However, Python is not Go or Rust.
35:00
And assuming we at EuroPython want to write Python that looks like Python, and we don't want to fight the language at every corner, you're going to have to subclass sometimes, even when you are not chasing the last 1% of performance or are too lazy to type a little bit more. Now, if you follow some rules, it's going to be okay, though.
35:21
So one reason is, of course, mechanics. You need more control over your classes? Well, you probably need a metaclass, like with the ORM that I showed you. So there you go. It's going to be hard to use metaclasses without subclassing. It is what it is. If you need that. I personally do not use metaclasses at all. Like in none of my projects, I use a metaclass.
35:42
So you can live without it. Another example, of course, are exceptions, where even the position in the hierarchy is actually an important and useful feature. And it's the only Pythonic error handling we have. Neither Rust nor Go has exceptions, and some have tried to carry over their error handling to Python, but the results are unlikely to go mainstream from what I've seen.
36:02
And it's even okay with multiple inheritance, because if it makes sense for ontological reasons, why not? As long as you don't start building diamonds, it's fine. There's usually little to no code or data sharing, and yeah, there's very few headaches by default. And now with multi-exceptions, we have some really great tools.
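To make the point about hierarchy position concrete, here is a small sketch with an invented application hierarchy (`AppError`, `ConfigError`, `MissingSettingError` are illustrative names only). The multiply-inheriting exception shares no code or data with its parents; it only states what it is:

```python
class AppError(Exception):
    """Root of a hypothetical application's exception hierarchy."""


class ConfigError(AppError):
    pass


class MissingSettingError(ConfigError, KeyError):
    """Both a configuration problem and a missing key; no code is shared."""


def read_setting(settings, name):
    try:
        return settings[name]
    except KeyError:
        raise MissingSettingError(name) from None


def describe(settings, name):
    # Catching the root class catches everything below it in the hierarchy.
    try:
        return read_setting(settings, name)
    except AppError as exc:
        return f"{type(exc).__name__}: {exc.args[0]}"


print(describe({}, "debug"))
```

Callers that only know about `KeyError` keep working too, which is the ontological argument for the second base class.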
36:24
So thumbs up. Don't be afraid. Now let's go to the one thing that gets closest to what one understands under subclassing, and that's okay under Python's constraints, and it's specialization. And I'm going to channel a wistful Sandi Metz here, who, despite being a Rubyist, is one of the best object-oriented programming teachers I know.
36:45
The OG subclassing in Smalltalk back in the day was only about specialization, and then we lost our way and tried to do other things with it. Specialization is about taking a general class and making it more specific. And the more specific part can be sometimes a little bit unintuitive,
37:03
because it means that you have to take the whole thing with all its contracts and add more contracts that don't contradict the original one. This is the Liskov substitution principle. And the classic example, of course, is what is more specific for programming, a square or a rectangle? Look it up. The answer will surprise you.
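One common way to illustrate why the question is trickier than it sounds is sketched here. It is not the speaker's example, and it assumes mutable shapes: the `Square` subclass adds a contract (equal sides) that contradicts one callers of `Rectangle` rely on (width and height change independently):

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)

    def __setattr__(self, name, value):
        # Keep both sides equal, which is what makes it a square.
        if name in ("width", "height"):
            object.__setattr__(self, "width", value)
            object.__setattr__(self, "height", value)
        else:
            object.__setattr__(self, name, value)


def doubling_width_doubles_area(rect):
    """Relies on Rectangle's contract: changing width leaves height alone."""
    before = rect.area()
    rect.width = rect.width * 2
    return rect.area() == 2 * before


print(doubling_width_doubles_area(Rectangle(2, 3)))
print(doubling_width_doubles_area(Square(2)))
```

Code written against `Rectangle` silently misbehaves when handed this `Square`, so the subclass is not a valid specialization in this design.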
37:23
So, but this kind of subclassing can be worthwhile in Python, and I would like to demonstrate to you using a simple example that is actually from one of my code bases that made me think about this more. We want to model email addresses, and there's two types. There's a mailbox that has a database ID, the address, and a password hash,
37:41
and a forwarder that shares ID and address with the mailbox, but it has a list of forwarding targets, too, instead of a password hash, because it's a forwarder. Now, these examples are not supposed to be executable code. I'm eliding all the data class or attrs decorators, so don't be surprised. And this is a very simple case, and you should absolutely leave this as it is,
38:04
because it's not worth deduplicating anything. You can even create a type that covers both of them, and the type checker will know that an email address only has the overlap of the two classes, which is an ID and an address.
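Since the slides are not part of the transcript, here is a hedged reconstruction of the two classes and the covering type. The field names (`id`, `addr`, `pwd_hash`, `targets`) and the alias name `EmailAddr` are guesses based on the description, and the decorators the speaker elided are written out so the sketch runs:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Mailbox:
    id: int
    addr: str
    pwd_hash: str


@dataclass
class Forwarder:
    id: int
    addr: str
    targets: list[str]


# A type that covers both; a type checker only allows the shared attributes
# (id and addr) on a value of this type without further narrowing.
EmailAddr = Union[Mailbox, Forwarder]


def label(email: EmailAddr) -> str:
    return f"{email.id}: {email.addr}"


print(label(Mailbox(1, "alice@example.com", "hash")))
print(label(Forwarder(2, "info@example.com", ["alice@example.com"])))
```

The duplication is two lines per class, which is the reason given for leaving it alone.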
38:26
This is true for more complex classes, too, though, because if you start deduplicating too early, you may end up with bad abstractions. You don't want that. You want to see the data first.
38:42
You write it all out, see it, feel it, get a feeling for the shape of the data, and only then, when you're done, when you see it all in front of you, then start trying to save some lines of code. This is especially important when you are thinking about data whose source you do not control, like database tables.
39:03
Maybe they are coming from somewhere else. And the first instinct is to try to replicate the data into your classes, but it doesn't always make sense. So first, write it out, look at it, feel it, because if you do it too fast, you end up with abstractions that are very hard to get rid of from a code base.
39:22
Once they are there, it's a problem, and copying a little bit of code back and forth is not that big of a deal. So don't commit haphazardly, start verbose. But for the sake of science, we are going to try to unite them anyway. One way is by adding a type field, and this used to be a pretty popular solution,
39:42
like we have an email address, it has a type, which is either mailbox or forwarder, there's the shared thing, and then we add both. And they are either none or they have a value. Now, this is a very bad solution, and you can tell that because it uses comments to explain contracts of those fields. And static typing also will get very obnoxious
40:02
because all the fields can be none, so the type checker will yell at you. Again, these designs used to be really popular, especially in the times when writing more classes was a drag. We just added more and more to the one class we had, ending up with so-called god objects. All right.
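The type-field variant might look roughly like this. Names such as `AddrType` and `target_count` are assumptions made for the sketch; note how the contracts live in comments and how every consumer has to re-check for `None`:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AddrType(Enum):
    MAILBOX = "mailbox"
    FORWARDER = "forwarder"


@dataclass
class EmailAddr:
    type: AddrType
    id: int
    addr: str
    # Only set if type is MAILBOX.
    pwd_hash: Optional[str] = None
    # Only set if type is FORWARDER.
    targets: Optional[list[str]] = None


def target_count(email: EmailAddr) -> int:
    # The class cannot guarantee the contract, so it is checked at runtime.
    if email.targets is None:
        raise ValueError(f"{email.addr} has no forwarding targets")
    return len(email.targets)


forwarder = EmailAddr(AddrType.FORWARDER, 2, "info@example.com", targets=["a@example.com"])
print(target_count(forwarder))
```

Nothing prevents constructing a mailbox with targets or a forwarder with a password hash, which is the weakness the comments are papering over.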
40:21
Now, thank god those times are past, so let's go and do the composition thing. So we define a class with the basics that are shared, and then we add it to the other two classes. Now, the naming is awkward, right? And that's a signal that something is off. Usually when you cannot give a thing a name,
40:42
it means that you did not fully understand the thing. It's like a general rule. Also, like, a mailbox has an email address, this is like super weird, but it works. It's composition, must be good. Valid solution. But still, it's clunky, right?
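A sketch of the composition variant, again with guessed field names and with the decorators written out. The attribute name `email` is exactly the awkward naming being described:

```python
from dataclasses import dataclass


@dataclass
class EmailAddr:
    id: int
    addr: str


@dataclass
class Mailbox:
    email: EmailAddr  # a mailbox "has an" email address
    pwd_hash: str


@dataclass
class Forwarder:
    email: EmailAddr
    targets: list[str]


mailbox = Mailbox(EmailAddr(1, "alice@example.com"), "hash")
# Two dots to reach the shared data.
print(mailbox.email.addr)
```

Everything is explicit and the classes stay decoupled, at the price of the extra hop on every access.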
41:03
So mailbox really is an email address, right? So let's do the dirty thing. We still define an email address with basic data again, but now we subclass. So this looks a lot more idiomatic to me.
41:20
As a subclassing hater, this is much better for my eyes. It is, as little as the word means, Pythonic. But the trade-offs are still valid. Python enforces no rules between them, so you have to know them, and you have to follow them, so you don't end up with a mess. But as long as the methods of email address
41:41
only work on email address, and the subclasses only mind their own business, and as long as you don't go to town with the depth or even the width of the hierarchy, and the relationships only go in one direction, you've got a very efficient and readable solution for storing data whose structure actually is hierarchical.
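The subclassing variant, sketched under the same assumptions about field names; the `domain` method is an invented example of a base-class method that only touches base-class data:

```python
from dataclasses import dataclass


@dataclass
class EmailAddr:
    id: int
    addr: str

    def domain(self) -> str:
        # Base-class methods only work on base-class data.
        return self.addr.rpartition("@")[2]


@dataclass
class Mailbox(EmailAddr):
    pwd_hash: str


@dataclass
class Forwarder(EmailAddr):
    targets: list[str]


mailbox = Mailbox(1, "alice@example.com", "hash")
forwarder = Forwarder(2, "info@example.com", ["alice@example.com"])
# Only one dot, and both are usable wherever an EmailAddr is expected.
print(mailbox.addr, forwarder.domain())
```

The hierarchy is one level deep, the subclasses only add fields, and nothing in the base class reaches down into them.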
42:01
And all this brings us back to the cross-eyed rodents, because now everybody who knows Go and wanted to tell me how unfair I am can relax, because Go has no subclassing, but it has embedding, which allows for this and this only. The syntax is that you simply make it part of the structure, and you don't give it a name.
42:20
Now, you have to be explicit when you're constructing your objects, but then you just can access them like they are part of the original structure. Like, look ma, only one dot. But there's no super. There are no hierarchies. There's no going sideways. Just limit subclassing to the part that is not terrible, the part that I just showed to you.
42:42
You take a class and add something to it. You specialize it. It's more of a syntactic sugar to save some typing than anything else. And I'm not a big fan of all the Go decisions, but this is a good pragmatic one. And to me, this has become kind of an epiphany and a heuristic for how I think about subclassing in Python.
43:01
Like, can I do it in Go? Then I will do it. If not, no. So what have you learned today? Despite everything, remember you write Python. Idiomatic Python has a certain shape. There's no shame in that. Using sum types for error handling so you don't have to subclass Exception is not going to improve your code.
43:22
Focus on the shape of your data before you de-duplicate your data. Write it all out, be verbose so you can really see it and feel it. And finally, it's all a tradeoff. There's no right or wrong. Only this aspect is more important to me than this one. For me, subclassing is like an emergency hatch to bend third-party classes to my will
43:42
and a nicer way to model hierarchical data. But your decision can be different, and that's fine. But as with absolutely everything, you have to do the work to make these decisions. If all you know is template subclassing and not the strategy pattern, you're going to use template subclassing and think that you're being pragmatic
44:00
and those composition hipsters are not. At that point, you are not weighing tradeoffs. Your horizon is too narrow. But I hope I've set you on the right path today. And speaking of today, that's all I have for you today. If you want to go deeper, this QR code and this link will take you to my blog post about the same topic. I still had to leave out a lot of things, although I'm almost out of time.
44:21
So follow me on the bird and elephant sites. Get your domains from Variomedia. I should speak German. Sam Hynek. Thank you for the great talk. I couldn't agree more, frankly.
44:41
Unfortunately, our schedule doesn't allow for questions, but I'm sure that Hynek will be around at the venue and you can ask him there. Come and fight me. Come and fight him, yeah, exactly. Please give another round of applause to Hynek.