PEP 557* versus the world
Formal Metadata
Number of Parts | 132
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/44950 (DOI)
EuroPython 2018, 55 / 132
Transcript: English(auto-generated)
00:06
Hello, everyone. So just before we start, I'd like to thank the EuroPython organizers for this awesome event. We've been sponsoring it for about five years and it's always a great pleasure. I also want to thank Eric Smith, the author of this PEP, for all his
00:26
work on Python and also for his unexpected answers to my emails. So thank you, Eric. My name is Guillaume Gelin. If you want to find me on the Internet, I'm ramnes.
00:42
So if you want to add me on LinkedIn, Facebook, anything, ramnes.eu is the place. I'm working as a lead software engineer at Numberly. Numberly is a data marketing company. We have a booth on the Ministry floor, so feel free to come and see us. There are a lot of companies
01:03
around. We are recruiting, so if you're looking for a job, you're welcome. Just a quick word before we really get into it: I started playing with data classes not so long ago, so please don't take my word for everything I say. I did as much research
01:24
as I could to give you the best talk possible, and the more research I did on the subject, the more I wanted to add to this talk, so it's a bit long. I'm not sure we'll have time for questions, but if you have some, feel free to come to the booth and hit
01:44
me up. All right. So it's going to be pretty fast. Are you ready? Yeah? Okay. Let's go. So let's go back one week. I was on vacation in the south of France by the
02:03
sea and that was great. I'm kind of an entrepreneurial guy, so I was thinking: what startup could I launch today? The first thing you look for when you want to build a startup is a domain name, right? So I found this domain name, which
02:21
is amazing.com. I thought it was a good domain name, I bought it, and I thought I could make money out of it, right? My idea is to make a marketplace of sellers and buyers, and to have that marketplace, I need an inventory of all the items I have,
02:45
right? I have a friend who told me, well, Python 3 is not great, it's slower than Python 2, you should use Python 2. It had been a long time since I last used Python, but I remembered this namedtuple thing, right? So the first thing I did was to create
03:04
some items in my inventory with that namedtuple, and it was great, it did the job, but as my codebase grew larger and larger, I ended up doing things that weren't really DRY, repeating
03:22
a lot of stuff, and one thing that really annoyed me is that I couldn't define defaults. So I took a look at Python 3 and I discovered that new version of namedtuple, which you
03:42
don't import from collections but from typing, and which has uppercase letters: NamedTuple. It arrived in Python 3.5, and the annotation support that you can see here arrived in Python 3.6. So it's way better, because rather than calculating every time the total cost of all the items
04:07
I have in my inventory, I can define that method once and then call it whenever I need, and I have some kind of default too: a default quantity, which is zero,
04:21
but the problem is what happens if my default is mutable, like a list, for example. Let's say that for my items, I want to have all the items related to that item, so I want a list of items. If I define it directly in the class, just like I did
04:42
for quantity, it's not going to work very well, because each time I modify the list in one instance, it's going to be modified in all instances, so it's not that great. So I thought maybe I could override the __new__ method so that when I get my related items,
05:07
I check whether they're there or not, and if they're not, I create an empty list. But NamedTuple doesn't allow that: defining __new__ in the class raises an error. It's weird, because Python used to be much more permissive than that, right?
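A minimal sketch of the typing.NamedTuple approach described above. The names Item, unit_price, quantity and related_items are illustrative assumptions loosely modelled on the PEP 557 inventory example, not the speaker's slides. It shows the default value and the method, shows that NamedTuple rejects a __new__ defined in the class body (AttributeError at class creation), and includes the after-the-fact __new__ assignment the speaker describes next.

```python
from typing import List, NamedTuple, Optional


class Item(NamedTuple):
    name: str
    unit_price: float
    quantity: int = 0  # NamedTuple defaults need Python 3.6.1+
    # A list written here as the default would be one object shared by all
    # instances, so use None and build the list per instance instead.
    related_items: Optional[List["Item"]] = None

    def total_cost(self) -> float:
        return self.unit_price * self.quantity


# NamedTuple forbids defining __new__ in the class body.
new_rejected = None
try:
    class Broken(NamedTuple):
        name: str

        def __new__(cls, name):  # never reached: class creation fails first
            return tuple.__new__(cls, (name,))
except AttributeError as exc:
    new_rejected = exc


# Workaround: wrap the generated __new__ and assign it after class creation.
# Note that Item._make() calls tuple.__new__ directly and bypasses this wrapper.
_generated_new = Item.__new__


def _item_new(cls, name, unit_price, quantity=0, related_items=None):
    if related_items is None:
        related_items = []  # a fresh list for every instance
    return _generated_new(cls, name, unit_price, quantity, related_items)


Item.__new__ = _item_new

hammer = Item("hammer", 12.5, quantity=2)
spanner = Item("spanner", 7.5)
hammer.related_items.append(spanner)  # spanner.related_items stays empty
```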
05:27
So we're almost there, but not yet. So what I thought I could do, because I don't like it when
05:44
people tell me no, you can't do that, is to define a real function and assign it to __new__ after the class is created, and it works. Well, it's a bit creepy, right? So yeah, nothing to see here, please move on. This is the idea behind PEP 557: to provide a real data class
06:12
that you could use for that kind of stuff. It arrived in Python 3.7, and it's available for Python 3.6 through a backport called dataclasses, which is on GitHub. And that's great
06:25
because you can take a look at the source code, which is very interesting. It's made by Eric Smith, who is the owner of a small IT business in New York called True Blade Systems. What they do is mostly written in Python. Eric is a core developer of CPython and the author of several major
06:46
contributions. He started using Python a long time ago to escape C, but ended up writing a lot of C to build Python. And he's happy, because data classes are actually a place where he could
07:04
write standard library code in Python and actually use f-strings, which he also wrote himself. So that's great. What Eric wanted is something that looks and feels like namedtuples, but with real defaults and that can be mutable, because namedtuples are immutable, right?
07:26
So he created data classes. I'm pretty sure you've seen this example already. If you compare it to the example with the NamedTuple, it's really close, right? The only difference is that you don't inherit from NamedTuple,
07:44
but instead you have the @dataclass decorator. The reason he chose a decorator is to avoid inheritance, so that you don't have diamond problems. Also, decorators are a great way to inject methods into a class.
08:06
And maybe this is something that we'll see more and more in the standard library. Also, data classes are meant to be used with type annotations. This is Python 3.7 and Python 3.6, so annotations are there to be used. Think about it.
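For comparison, this is essentially the InventoryItem example from PEP 557 itself: the same shape as the NamedTuple version, with the @dataclass decorator in place of a base class. The 12.5 price and the quantity are just sample data.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0  # plain default, as with NamedTuple

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


item = InventoryItem("hammer", 12.5, 2)
print(item.total_cost())  # 25.0
```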
08:26
So what do data classes give you? The decorator injects an __init__ method, a __repr__ and an __eq__. What the __init__ gives you is that
08:40
for each field that you've defined, it expects a value at instantiation time. The representation is pretty good too: it reads like the constructor call, so you can evaluate it to get the object back. Also, __hash__ is explicitly set to None,
09:06
which means you can't hash a data class. And it has lots of metadata, like __dataclass_fields__, which contains all the fields with their names, types, default values and field options. You have another piece of metadata, which is
09:26
__dataclass_params__, containing all the parameters that were used to generate the class, but we'll see that later. So when I speak about real defaults, oh, yes, it's almost properly aligned.
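A short standard-library sketch (Python 3.7+) of what the decorator generates, using the same illustrative InventoryItem. The commented outputs are what CPython prints when the class is defined at module level.

```python
from dataclasses import dataclass, fields


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


item = InventoryItem("hammer", 12.5, 2)

# __repr__ mirrors the constructor call.
print(repr(item))  # InventoryItem(name='hammer', unit_price=12.5, quantity_on_hand=2)

# __eq__ compares the fields, in definition order, of two instances of the same class.
print(item == InventoryItem("hammer", 12.5, 2))  # True

# eq=True (the default) without frozen=True leaves __hash__ set to None.
print(InventoryItem.__hash__)  # None

# fields() reads the __dataclass_fields__ mapping...
field_names = [f.name for f in fields(InventoryItem)]
print(field_names)  # ['name', 'unit_price', 'quantity_on_hand']

# ...and __dataclass_params__ records the decorator arguments.
print(InventoryItem.__dataclass_params__.frozen)  # False
```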
09:41
So here, for the related items example, I've used field, which is something you import from dataclasses, and it gives you an option called default_factory, to which you can pass any function. Here I just want to instantiate a list each time I instantiate an item,
10:08
and it's working just as expected, as we can see. So that's great. There is also the __post_init__ method, which is specific to data classes.
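A sketch of the default_factory fix for the related items list, assuming the same illustrative Item fields as before. It also shows that dataclasses refuse a bare list default outright, raising ValueError when the class is created.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    name: str
    unit_price: float
    quantity: int = 0
    # default_factory is called for every new instance, so each Item gets its own list.
    related_items: List["Item"] = field(default_factory=list)


hammer = Item("hammer", 12.5)
spanner = Item("spanner", 7.5)
hammer.related_items.append(spanner)  # only hammer's list changes

# A bare mutable default is rejected when the class is created.
mutable_default_rejected = False
try:
    @dataclass
    class Broken:
        related_items: list = []
except ValueError:
    mutable_default_rejected = True
```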
10:23
Maybe, I'm not sure. And it's great because you can do things like this: if we don't want the total cost to be a property, for example, we can compute it only once here and then just use it whenever we want. All we have to do is define the field at the top.
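A sketch of the __post_init__ approach: total_cost is declared with field(init=False), so it is not an __init__ parameter, and it is computed once after the generated __init__ has assigned the other fields. The field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    unit_price: float
    quantity: int = 0
    # init=False: not a parameter of __init__; it is filled in by __post_init__.
    total_cost: float = field(init=False)

    def __post_init__(self):
        # Called by the generated __init__ once name, unit_price and quantity are set.
        self.total_cost = self.unit_price * self.quantity


hammer = Item("hammer", 12.5, 2)
print(hammer.total_cost)  # 25.0
```

The trade-off versus a property: the stored value is computed once and is not refreshed if quantity changes later.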
10:46
The init=False says: don't expect that field at instantiation time. I have a very hard time pronouncing this. Just expect it to be created later, for example in __post_init__. You can freeze data
11:05
classes: you can pass frozen=True, and that does what you expect it to do, which means that if you try to modify a field, it will raise an error. But real freezing in Python is really
11:26
hard. There are hacks around this. One hack is to use object.__setattr__, and this is actually
11:40
useful, because if you want a field to be computed at instantiation time but you don't want it to be modifiable afterwards, you can do it that way, and it's actually the way Eric recommends, so feel free to use it. Also, like I said, by default __hash__ is None,
12:05
so by default you can't use data classes as dict keys, for example, but if you really want to, you can use the unsafe_hash option, and you can also, with field(),
12:21
define which fields should be considered in the hash or not. For example, here I'm saying that related items should not be taken into account in the hash, because I think related items is a mutable list that is probably going to change very often, and I don't want my hash to be fucked up,
12:40
so I'm not going to take it into account; I'm just going to consider name and unit price. So if I instantiate three items, two hammers with the same price and one spanner, and then make a set out of them, we see that we effectively have two items and not three.
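A sketch of frozen instances, the object.__setattr__ workaround, and selective hashing, with illustrative class names. In a frozen class even __post_init__ cannot assign normally, so it goes through object.__setattr__; the unsafe_hash Item below hashes only name and unit_price, so the two identical hammers collapse into one set entry.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import List


@dataclass(frozen=True)
class FrozenItem:
    name: str
    unit_price: float
    quantity: int = 0
    total_cost: float = field(init=False)

    def __post_init__(self):
        # Plain assignment would raise FrozenInstanceError here too,
        # so bypass the generated __setattr__.
        object.__setattr__(self, "total_cost", self.unit_price * self.quantity)


frozen = FrozenItem("hammer", 12.5, 2)
modification_blocked = False
try:
    frozen.quantity = 10
except FrozenInstanceError:
    modification_blocked = True


@dataclass(unsafe_hash=True)
class Item:
    name: str
    unit_price: float
    # Excluded from __hash__ because the list is expected to change often.
    related_items: List["Item"] = field(default_factory=list, hash=False)


items = [Item("hammer", 12.5), Item("hammer", 12.5), Item("spanner", 7.5)]
unique = set(items)  # equal hash and equal fields: the hammers merge
print(len(unique))  # 2
```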
13:04
Also, for those who don't know mypy, mypy is an experimental, optional static type checker for Python. It analyses the code without running it and uses annotations to detect when incorrect types are used, and data classes work well with mypy. That being said, the version of
13:29
mypy with data class support is not released yet; it will be released pretty soon, but it works. And if for some reason you want to go back to the namedtuple way of doing things,
13:46
you can: there is a make_dataclass function. The only difference from collections.namedtuple is that you can give type hints, and you can also give defaults and field options, just like I showed before. Otherwise it's basically the same as the namedtuple behaviour.
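A sketch of make_dataclass building roughly the same illustrative Item at runtime, followed by the asdict and astuple helpers the speaker mentions right after. Each field spec is a name, a (name, type) pair, or a (name, type, Field) triple.

```python
from dataclasses import asdict, astuple, field, make_dataclass
from typing import List

Item = make_dataclass(
    "Item",
    [
        ("name", str),
        ("unit_price", float),
        ("quantity", int, field(default=0)),
        ("related_items", List[str], field(default_factory=list)),
    ],
)

hammer = Item("hammer", 12.5, 2)

# Both helpers follow the field definition order.
as_dict = asdict(hammer)    # {'name': 'hammer', 'unit_price': 12.5, 'quantity': 2, 'related_items': []}
as_tuple = astuple(hammer)  # ('hammer', 12.5, 2, [])
```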
14:10
Also, one great thing is that if you want to go back from your data classes to more standard types like dicts or tuples, you have helpers for this: asdict and astuple inside data
14:22
classes allow you to do that. The order is defined by the field definition order, so if you define the name first in your class and the price second, you will have that order in your dicts and tuples. So that's pretty great, right? We have exactly what we
14:43
want. It's fully DRY. We have properties, everything we want. That's really great. But let's go back in time a second time. Maybe what I've shown you reminds you of something. Maybe you've already seen a library called attrs, or I don't know how to pronounce it,
15:03
but attr-something. It's a Python 2.7 and 3.4+ library that you can install with pip, obviously. It's made by Hynek Schlawack. Maybe he's here? Yay! Please give him a big hand.
15:22
Well, he's in Europe, but not in this room. But the attrs library really is the original idea behind data classes, right? So Eric, when he wrote the data classes PEP,
15:42
he went through a lot of emails, issues, et cetera, of attrs to understand their decision making: what choices, what trade-offs they made. And as you can see, it's really, really close to data classes, right? You have attr.s, which is the equivalent of @dataclass.
16:03
You have attr.ib, which allows you to give a default. So that's really, really similar. But it's much more powerful, and it gives you many more methods. As you can see, there are __init__ and __repr__, but you also have __ne__, __lt__, __le__, __gt__, __ge__, et cetera.
16:29
It has validators, which are not present in data classes. For example, here I have a price, and I don't want an item to cost more than $9,000. So I can just validate that
16:45
with a small snippet. It uses a decorator for that, much like a property setter. Pretty smart. You also have built-in attrs validators, like instance_of,
17:04
which allows you to check the type of your field at runtime, and not just with mypy or some static analyser. You also have converters. Converters are useful, for example, if you're
17:25
working with external APIs and you know that the types coming from that external API are fucked up, but you want proper types in your database: then you can use the converter option
17:40
of attr.ib, which, by the way, stands for attribute. It took me an hour to understand that. So right now, you may be wondering why I would use data classes if attrs does basically the same thing and is much more powerful, right? And this is answered in the PEP,
18:02
actually. The idea is that attrs is a third-party library that wants to keep its freedom to move fast and to implement any feature it wants. Also, Python is really oriented towards simplicity, and a lot of the features in attrs are not needed, like,
18:28
100% of the time. I don't know about you, but personally, I tend to value simplicity a lot. So if I don't need the extra features, I'm mostly going to use data classes.
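dataclasses has no validator or converter options. If you only need something simple, __post_init__ can approximate both; this is a standard-library sketch, not the attrs API. The 9,000 cap mirrors the talk's example, and MAX_UNIT_PRICE is a hypothetical name.

```python
from dataclasses import dataclass

MAX_UNIT_PRICE = 9000  # hypothetical constant: the cap from the talk's example


@dataclass
class Item:
    name: str
    unit_price: float

    def __post_init__(self):
        # Converter-like step: coerce values such as "12.5" from a loosely typed API.
        self.unit_price = float(self.unit_price)
        # Validator-like step: refuse prices above the cap.
        if self.unit_price > MAX_UNIT_PRICE:
            raise ValueError(f"unit_price must be <= {MAX_UNIT_PRICE}, got {self.unit_price}")


hammer = Item("hammer", "12.5")  # stored as the float 12.5

too_expensive_rejected = False
try:
    Item("golden hammer", 9001)
except ValueError:
    too_expensive_rejected = True
```

Note that this check only runs inside __init__: assigning hammer.unit_price = 10_000 afterwards bypasses it, and a static checker such as mypy would flag passing the string "12.5" against the float annotation.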
18:44
Also, I prefer the data classes syntax, which I feel is much clearer, because, as we've just seen, the attrs syntax is a bit strange. But hey, I guess it's just a matter of personal preference, right?
19:05
So, let's go back to my wonderful amazing.com website. What if I had a NoSQL database behind it, right? Something like MongoDB. The data class example wouldn't work, because in such a database you don't know the fields in advance, or you can
19:26
know them, but you want the freedom to let someone modify them directly in the database without breaking your whole codebase. So here we can't use data classes, right? For example, with MongoDB, if we connect to the database and get
19:47
documents from a collection, we get dictionaries. The problem with dictionaries is that they lack real object-oriented types: for example, if I have users on one side and
20:02
items on the other side, and both are dicts, how do I know which one is which? Do I just have to keep track of which one is an item? You also miss properties, and the dot notation,
20:23
because the dict notation, with brackets everywhere, is a bit cumbersome. One smart guy could inherit from dict and just add properties so that it works. That's a bit creepy too, so we're not going to dwell on this.
20:45
And there is a thing called SimpleNamespace. It's in a module called types, so there are collections, typing, and types, which are not the same thing. And that SimpleNamespace class
21:01
is really neat. You can see it as a very bare version of data classes. It just gives an __init__ method and a representation to the class that inherits from it. And what's great is that it doesn't ask you to define fields. So in the case of the MongoDB
21:26
example, it's nice, because you can just take whatever dict you have and throw it into your class, and you have the type, you have the dot notation and you have the properties.
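A sketch of the MongoDB scenario with types.SimpleNamespace. The document is a plain dict standing in for what a driver such as PyMongo returns, and the field names are illustrative.

```python
from types import SimpleNamespace


class Item(SimpleNamespace):
    """Accepts any keyword arguments; no fields are declared up front."""

    @property
    def total_cost(self) -> float:
        return self.unit_price * self.quantity


# A document shaped like what a MongoDB driver returns: a plain dict.
document = {"name": "hammer", "unit_price": 12.5, "quantity": 2, "color": "red"}
item = Item(**document)

print(item.total_cost)  # 25.0
print(item.color)       # red, a key nobody declared

# There is no asdict()/astuple() equivalent; vars() only exposes the instance's
# own __dict__ (the live object, so copy it before mutating).
as_dict = dict(vars(item))
```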
21:42
But one thing that's missing here is that we can't go back to normal types: there's no helper to get a dict or a tuple out of a SimpleNamespace, so you end up doing that kind of stuff yourself. Maybe we could have something more powerful. There is a library called
22:02
Box, which is on GitHub and is basically a wrapper around dict. It works a bit like SimpleNamespace, but you have a method like to_dict that allows you to go
22:22
back to a dict. I think, if I'm not wrong, there is also a to-tuple method. And if you have dicts inside dicts, it takes the whole thing and makes boxes out of everything, so you can chain attribute access like item.one.two.three, et cetera.
22:46
But the problem with Box is that it's super slow. I'm not usually much into performance, but here it's really problematic. As you can see,
23:08
here, it's a microbenchmark, which mostly measures the time it takes to create a bunch of objects, like 50,000, with some reads too. You can see that using a dict
23:25
or a SimpleNamespace is super fast. Then you have, I can't read my slide, namedtuples: the collections namedtuple, the untyped one.
23:43
The typed one is also good, and I will talk about it later. Then attrs and data classes are pretty much at the same level of performance. But Box is about three times slower: it takes three times longer to create a Box than a data class.
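The talk's benchmark code isn't shown, so this is a hypothetical, standard-library-only re-creation of the same idea: build 50,000 objects and read a field back from each. attrs and Box are left out to avoid third-party dependencies, and no timings are claimed here, since they depend on the machine and the Python version.

```python
import timeit
from collections import namedtuple
from dataclasses import dataclass
from types import SimpleNamespace
from typing import NamedTuple

N = 50_000  # objects per run, as in the talk's microbenchmark

PlainTuple = namedtuple("PlainTuple", ["name", "unit_price", "quantity"])


class TypedTuple(NamedTuple):
    name: str
    unit_price: float
    quantity: int


@dataclass
class DataItem:
    name: str
    unit_price: float
    quantity: int


def build_with_dict():
    items = [{"name": "hammer", "unit_price": 12.5, "quantity": i} for i in range(N)]
    return sum(item["quantity"] for item in items)  # the "reads" part


def build_with(factory):
    items = [factory(name="hammer", unit_price=12.5, quantity=i) for i in range(N)]
    return sum(item.quantity for item in items)


candidates = {
    "dict": build_with_dict,
    "SimpleNamespace": lambda: build_with(SimpleNamespace),
    "collections.namedtuple": lambda: build_with(PlainTuple),
    "typing.NamedTuple": lambda: build_with(TypedTuple),
    "dataclass": lambda: build_with(DataItem),
}

# Best of 3 single runs per candidate.
timings = {label: min(timeit.repeat(fn, number=1, repeat=3)) for label, fn in candidates.items()}

for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{label:<24} {seconds * 1000:.1f} ms")
```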
24:07
So that's not very nice. So let me introduce you to Thingy. Thingy is a library we have developed at Numberly. It's Python 2.7 and 3.4+ compatible. You can install it with pip. It's
24:25
on GitHub. It looks like SimpleNamespace, but it has a view method which allows you to go back to a dict. And the really nice thing about Thingy is that you can have different
24:43
kinds of views. You can define new views. For example, if I want the result of the property I've defined to be in my dict, I can define a view called with_total, for example, and include the total cost in it. And when I call the with_total view on an item,
25:06
I'll get a dict with the property included. You can also exclude fields, select exactly which fields you want, that kind of stuff. So in the context of a REST API, for example, backed by MongoDB, it's really nice, because
25:22
you can say exactly which fields you want to return for each view of your API. And by the way, we have Mongo-Thingy, which is basically an ODM, an object document mapper, based on Thingy. It's a bit like MongoEngine, but much simpler and much
25:46
closer to the PyMongo query language. So it's Q&A time, but I'll start with a self Q&A. The reason I made this talk in the first place is that I was developing Thingy and
26:01
I saw the data class thing and I thought, shit, maybe what I've been doing for a year is completely useless; maybe they just implemented something in Python that does exactly the same thing. But as we've just seen, I think there is a place for both, because data classes are more about types and well-defined fields, that kind of stuff. Thingy is much more about
26:28
freedom: you can put whatever you want inside. So the answer is no. Do data classes deprecate namedtuples? Honestly, yes, definitely, whichever module they come from.
26:46
Do they deprecate SQLAlchemy? SQLAlchemy is a big, big thing. It's used by a lot of people, so I don't see it being deprecated any time soon, but I could imagine a rebase of SQLAlchemy on data classes. I don't think it's
27:08
going to happen soon, but it could. Do they deprecate Marshmallow? For those who don't know, Marshmallow is a validation library where you define classes with each field being like
27:25
a rule that your field must comply with. Once again, I don't think so, but I think Marshmallow could rebase on data classes, especially since data classes were designed
27:42
with extensibility in mind, so you can really plug third-party libraries on top of them. So maybe that's a good idea. I don't know. And the last question would be: do they deprecate classes? I leave that question to you. Thank you.
28:13
All right. So, we have time for maybe one more question. Okay.
28:21
We still have time for live questions, so please speak clearly. Did you run more benchmarks comparing all the competitors? Because I can imagine that you would use data classes or attrs or namedtuples if you have a lot of instances, like database rows, and then memory consumption may be important too, for example.
28:44
I didn't run more benchmarks. I really just made that benchmark because I wanted to use Box in a real-world situation, and that's really what made me drop Box. So that's why I showed that benchmark. But yeah,
29:03
feel free to run more benchmarks. Any other questions? Last one. I'll be quick, it's a bit of a philosophical question. Don't you think that sometimes it's good to have a clear model? Because you said Mongo is
29:23
great to scale, but what about the moment you get too many fields? So actually, no: in the latest version of Mongo, you can enforce schemas directly inside the database. But that's the database administrator's work, not your
29:44
work as a developer. But yeah, definitely, there are situations where you want clear schemas, right? I'm not saying that's not the case; I'm just saying that sometimes you don't want that. Yeah, it's basically when your system becomes larger, with a lot of parameters,
30:03
a lot of properties, it's easy to make a mistake. That's a trade-off, as always. Thank you. All right. Thank you.