Python Descriptors for Better Data Structures
Formal Metadata
Part Number | 112
Number of Parts | 169
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/21082 (DOI)
Transcript: English (auto-generated)
00:00
organizing teams, and privately I'm a cyclist: I make bikes, I repair bikes, I ride bikes. I do photography, and I'm also an ultimate frisbee player. If you don't know that game, it is something like Python among the programming languages; ultimate frisbee is the same among team
00:22
sports, so check it out, it's a great thing to do. So the talk is going to be about how I discovered descriptors, and I did it when I was working on tree structures. A tree structure is basically this: you
00:46
have the parent node, you have some fields inside. Or maybe the other way: you have many nodes, one can be put inside another, and then we call it a field of the parent node, and you can also put a value inside the field. So the basic tree
01:08
structure is like in Django, you know, the models in Django. You know Django? So we have fields in there; this is a one-level tree. And what I discovered when I
01:21
was watching a presentation by my friend is that many people have seen descriptors, have even heard about where they are used, but they didn't see the obvious way to use them in other places or to override the default behavior. So have you heard anything about descriptors? Can you raise your hand? Yeah. And have you ever overridden the default
01:45
behavior? That's what I thought. So I'm going to start by introducing the project to give you some context. I'm going to tell you about fighting the legacy code and how, when I got the opportunity, I rewrote the tree
02:04
structure that was used over there, and what I learned from that. So there's going to be legacy code ahead. First, the project. The project was a positioning system: we have a mobile phone which sends its GPS position along with what it sees on the radio, that is, which base stations of the cell network
02:27
it sees. We process this, we calculate the positions of the radio stations, we gather this data, and then when another mobile phone, the one down here, sees only the cell network stations, we can give it an
02:44
approximate position. This is used in assisted GPS, because otherwise the phone would take something like 10 minutes to lock onto the satellites, since it doesn't know anything about the satellites and the receivers inside mobile phones are cheap. That was the C++ project, and we had a test
03:02
project alongside that was written by a C++ programmer. This project basically replaced the mobile phones and also had access to the database, so we were sending binary data, we were comparing the responses against what we wanted to see, and we were also checking within the database that the positions
03:22
are updated in the right way. So that's basically it. Now, about the code: there were a few problems with it. We had star imports, because the C++ programmer didn't know another way to deal with modules; he wanted basically everything in the same namespace, so every package was a
03:41
of itself, and it was importing everything from the submodules themselves. That's the thing we didn't deal with, because the project was also used in other areas. And we had a few things that I think stem from not having introspection in C++ to such an extent as in Python. So we have
04:04
repeated names. As you can see here, there are slots that protect us from assigning to the wrong name; we have an initialization where we call the base element and tell it, okay, under these names we want such fields; and we
04:23
also give those names of the fields again to the fields themselves. So that's a bit of a hassle. Then again, in the top-level structure we have all the functions defined in such a way that they are manually looking at the tree below
04:42
and they are doing the printing, the serialization, the comparing to other objects. It doesn't have to be like this; it can be automated, because we can know everything about this structure. But there is also something very peculiar, to which I would like to direct your
05:04
attention. This is the Mortal Kombat guy over here, to show you: this is the __call__ function, and the __call__ function was used to store the value inside a field. So instead of assigning, where you would lose the field and put a simple object inside this attribute instead, we
05:26
are calling the field, and this way we put the value inside the field, and the field is still usable with all these methods. So what happens when you have a structure like this? You have to be very careful with references, and so
05:40
having variables to collect some parts of data and then constructing it is rather risky. So all the test definitions were done in long chains of attribute access, where the lines were copied by something. It's not very usable. We also had other idiosyncrasies from C++, where we used static
06:03
methods within the class, so we are basically making a singleton out of the module, which already is a singleton. So this is another thing that's funny. Here we have some special functions that deal with some of the data, like the set version; we have those calls, as you can see, where the call is the second
06:23
call that actually set the data, and that's about it. So when you think about such a structure, and what your goal is when you are doing such a test project: I thought my goal was actually to provide for the testers, so they are
06:44
the guys that need my work, and what I wanted to do is give them easy-to-use trees so that they can easily define the data. And that's how I imagined it: I would not have this repetition, I would have a clear structure, I would know which element goes where. And I wanted to look at it
07:06
because those tree structures were used everywhere: they were used in server configuration, they were used in difference reporting, they were used in feeding data, they were used in feeding the database. So when you look at it, this definition looks very much like an assignment, and so I thought this is the
07:25
right way to go. I thought, well, I'd like to have something like Django, where you have those fields to which you can assign values but you can still use them and call methods on them. So I tried to make it like that, and the first step was to use keyword arguments. So I added those keyword
07:45
arguments to the base element and I used them everywhere in the hierarchy below. So we have at least the ABC element, which inherits from the base element, and I have keywords there. The second thing I did is I actually
08:03
used the printing to give the differences, so I used the diff library to give the reporting to the tester, and this way you didn't have to walk the tree for difference reporting. But again, this Mortal Kombat guy shows you the place
08:21
where there is something interesting happening. In the __init__ we call getattr on the item and then we call it with the value to set the value, so we have this cooperation between the call in the parent, which is the __init__ one, and the call in the child, which is the __call__ one, which
08:44
intercepts the value and puts it inside. So that was the first thing, and pretty soon another protocol came and I didn't have to deal with the rest, especially because other testers were also using it, not only my
09:01
team. So what I left there was the star imports, I left the repetition when defining the data, I left the serialization, because it was using some C plug-in, and, as I said, because the other projects were using it.
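The cooperation described above, between the parent's __init__ and the child's __call__, might look roughly like the following. This is a minimal sketch and not the project's code: the class names `Field`, `BaseElement` and `Version` and the attribute names are hypothetical stand-ins chosen for illustration.

```python
class Field:
    """Leaf that stores a value when it is called (hypothetical stand-in)."""

    def __init__(self, value=None):
        self.value = value

    def __call__(self, value):
        # Child side: calling the field stores the value; the field object
        # itself stays in place, so its methods remain usable.
        self.value = value
        return self


class BaseElement:
    def __init__(self, **kwargs):
        # Parent side: look each child up by name, then call it with the value.
        for name, value in kwargs.items():
            getattr(self, name)(value)


class Version(BaseElement):
    def __init__(self, **kwargs):
        self.major = Field()
        self.minor = Field()
        super().__init__(**kwargs)


version = Version(major=1, minor=2)
version.minor(3)  # later changes still need the call syntax
```

Keyword arguments remove the long chains of calls at construction time, but any later change still has to go through the call syntax, which is the limitation the next step of the talk addresses.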
09:31
Next time I had a chance, I took the opportunity to actually implement those data structures, because we had a new protocol to implement. And again
09:45
I was looking at Django, and I thought, well, maybe not only the initialization can be dealt with but also assigning values to the attributes. So when I assign, I want the value to get into the field and I want the field to still be usable. And the first straightforward way, as
10:06
I saw it, was to use __setattr__, in which I have overridden the default behavior and redirected the assignment to the assign method. So the __setattr__ is used in the parent when you access the
10:27
attribute, and then this assignment is redirected to the child, and the child deals with putting this attribute in the right place in the field and also putting in the other fields that can be in the subtree. I actually
10:42
had this in two separate classes, but I couldn't fit that in here, so this class is basically both the parent and the node. Each node has this ability to intercept the assignment, and each node has this ability to take care of it when it's a child. So we are taking care of the value and we are
11:01
taking care of the subfields that are in the child. Another thing I took from Django, I have stolen it, was the creation counter, which let me get rid of repeating the names in the
11:20
slots. So we have a creation counter, we copy it from the class into the instance, and this way we can easily sort the stuff, because when you instantiate those fields they each get a creation counter in sequence, like the lines go in the
11:42
file. And so it was almost there. It turned out that I had actually invented something like data descriptors, and data descriptors just go about it in a different way. So I had overridden the __setattr__ magic method and I called
12:06
assign, and what data descriptors do is they actually have this mechanics within __getattribute__, and they use __set__ for setting the value inside the child. So those two, __setattr__ and assign, change
12:24
the functionality: instead of them there is __getattribute__, which would call our __set__, and the __set__ is in place of our assign. So this is the transformation that happened, and basically I'm using data descriptors.
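The transformation described above can be sketched side by side. This is an illustration under stated assumptions, not the speaker's code: `Node`, `Field`, `OldVersion`, `NewVersion` and the helper `_local` are hypothetical names, the per-instance copy of each field is an assumed storage strategy, and `__set_name__` needs Python 3.6 or later. In current Python the default machinery calls `__get__` from `object.__getattribute__` and `__set__` from `object.__setattr__`.

```python
import copy


# First attempt: the parent overrides __setattr__ and hands the value
# to the child's assign() method.
class Node:
    def __init__(self, value=None):
        self.value = value

    def assign(self, value):
        # Child side: keep the node, replace only its payload.
        object.__setattr__(self, "value", value)

    def __setattr__(self, name, value):
        # Parent side: redirect the assignment if the attribute already
        # holds a node and the new value is a plain object.
        current = self.__dict__.get(name)
        if isinstance(current, Node) and not isinstance(value, Node):
            current.assign(value)
        else:
            object.__setattr__(self, name, value)


class OldVersion(Node):
    def __init__(self):
        super().__init__()
        self.minor = Node(0)


# Same idea with the built-in protocol: the field is a data descriptor
# because it defines both __get__ and __set__.
class Field:
    def __init__(self, default=None):
        self.value = default

    def __set_name__(self, owner, name):
        self.name = name

    def _local(self, instance):
        # Assumption: every owner instance keeps its own copy of the field.
        fields = instance.__dict__.setdefault("_fields", {})
        if self.name not in fields:
            fields[self.name] = copy.copy(self)
        return fields[self.name]

    def __get__(self, instance, owner=None):
        return self if instance is None else self._local(instance)

    def __set__(self, instance, value):
        # Takes the place of assign(); Python calls it on attribute assignment.
        self._local(instance).value = value


class NewVersion:
    minor = Field(0)


old, new = OldVersion(), NewVersion()
old.minor = 3  # redirected by Node.__setattr__ to Node.assign
new.minor = 3  # routed by the descriptor protocol to Field.__set__
```

In both variants the attribute still holds a field object after the assignment, so methods on it remain available; the descriptor variant simply relies on behavior Python already provides instead of a hand-written `__setattr__`.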
12:42
And I think this doesn't seem scary. I think people get scared because they see this call tree and think, oh my god, this must be complicated, I don't know why it is here. And it is complicated, because this is actually a few things mixed up together: there is inheritance here, there
13:02
are descriptors and there are data descriptors, the ones in red. And again I have those two matching pairs marked pink, so there is a parent call and there is a child call, and the red ones are the data descriptors. So we
13:21
are basically checking within the class, because we have to do it before we access the field of the instance, and if there is a __get__ and a __set__, we call __set__, and the __set__ is doing our work. Now the green part is basically an
13:40
inheritance chain: we take care to take stuff from the instance, and in case it is not in the instance we take care to take it out of the class, and if we take it from there we check again whether there is a __get__ method, so that we can override the behavior. And at the end, the white one is a fallback
14:02
which you can use, and which is also quite fine for defining something like a dictionary with attributes. When an attribute is missing, you can override what happens when you get no such
14:21
attribute; the default is that you get an AttributeError. And so after you have done all of this, and you make a structure all inheriting from this base element, you can do things like put some fields in
14:43
there to give you the ranges for the validation, put in some fields to give you default values, and all this is safe from being written over when we are defining data. So this is a type definition. I actually had two kinds; the first split was between the structure and the atom. The
15:01
structure was dealing with copying the fields, the atom was dealing with copying the value within the field. So the structure was basically like a subtree, and the atom was like a single element within the structure, that is, a leaf. And then you simply define the data, say something about
15:24
the max value, and when you have a uint, for example, you inherit from int and set the minimum value to zero, and so on and so on. This becomes more readable, but the most important thing is that you actually get this kind of use. Let's look at the lower part: you have this ABC
15:44
element, which has a version, and this has a minor field. We want to change this minor field, and we don't have to call it, so it doesn't look strange, but when you assign to it you can still validate it, you can serialize it, and this way you move all the logic down with the structure.
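A type definition of the kind described above, with ranges on the atoms and plain assignment on the structure, could be sketched as follows. Everything here is an assumption made for illustration: the names `Atom`, `UInt8`, `Struct`, `Version` and `ABCElement`, the 0 to 255 range, the dotted-string serialization (the real project sent binary data) and the per-instance field copies. The creation counter mirrors the idea borrowed from Django to keep fields in definition order.

```python
import copy
import itertools


class Atom:
    """Leaf field: a data descriptor that validates on assignment."""

    _counter = itertools.count()  # shared counter records definition order
    min_value = None
    max_value = None

    def __init__(self, default=None):
        self.value = default
        self.creation_counter = next(Atom._counter)

    def __set_name__(self, owner, name):
        self.name = name

    def _local(self, instance):
        # Assumption: each owner instance keeps its own copy of the atom.
        fields = instance.__dict__.setdefault("_fields", {})
        if self.name not in fields:
            fields[self.name] = copy.copy(self)
        return fields[self.name]

    def __get__(self, instance, owner=None):
        return self if instance is None else self._local(instance)

    def __set__(self, instance, value):
        self.validate(value)
        self._local(instance).value = value

    def validate(self, value):
        if self.min_value is not None and value < self.min_value:
            raise ValueError(f"{self.name}: {value} is below {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            raise ValueError(f"{self.name}: {value} is above {self.max_value}")

    def serialize(self):
        return str(self.value)


class UInt8(Atom):
    min_value = 0
    max_value = 255


class Struct:
    def fields(self):
        # Only fields declared directly on the class are collected here.
        declared = [(name, attr) for name, attr in vars(type(self)).items()
                    if isinstance(attr, Atom)]
        declared.sort(key=lambda item: item[1].creation_counter)
        return [getattr(self, name) for name, _ in declared]

    def serialize(self):
        # The structure only joins what each field produces for itself.
        return ".".join(field.serialize() for field in self.fields())


class Version(Struct):
    major = UInt8(1)
    minor = UInt8(0)


class ABCElement:
    def __init__(self):
        self.version = Version()


element = ABCElement()
element.version.minor = 3  # plain assignment, validated by UInt8
```

With this sketch, `element.version.serialize()` returns `"1.3"`, and assigning `-1` to `element.version.minor` raises `ValueError` while leaving the stored value unchanged.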
16:06
You only care about what this particular field is doing with respect to serializing, and then the field above is calling this field to get the serialized data and then constructs the bigger parts. So this way you can
16:24
spread the functionality, and actually in most cases you don't have to write it again in the fields, in those small items over there. And so descriptors are basically that: they are overriding the access to the
16:42
attributes, and this gives us the possibility to put the thing that we are assigning somewhere else and then still use the magic field that has some properties and methods to be called. And as a bonus in this project, the
17:05
structure started looking so easy that I was actually able to redefine the printing in such a way that I was able to print those structures onto the screen. So from the previous situation on the right, where we had a different way
17:22
of printing and a different way of defining data, we came to the situation that the printing and the defining were basically the same. So we could do regression testing within the project, and the testers got a really big performance boost from that, because they were able to copy the structure from the screen and just put it in place, modify some things, and test
17:43
regressions progressively. And that's about it. So I think the moral of this story is that you can prove anything with a concrete example, and I like how Joel Spolsky put it. I have also seen it in the morning in some
18:06
other talk that there is this Pareto principle, and I wanted to say that when you are doing something you should choose the most bang for the buck and try not to get into rewriting stuff until you get
18:22
to new features, because it doesn't bring business value, and as I did, I did the new solutions on new features. So if it's broken, don't fix it. Do you have any questions? I think I rushed through it. Yeah, I think I still
18:53
have time. So the story is: first I took care of the low-hanging fruit
19:00
and I put the keyword arguments in there to initialize the values, and then I realized that when you look at the data it looks pretty much like an assignment, and I wanted this assignment to be used not only in the initialization but also to be able to assign again to that data structure. So
19:25
when constructing the data structure I could have some definition, but then I could copy the data structure and assign again and modify it for some other tests: have some basic template and modify it later. And this led me to try to override the assignment, which is
19:44
basically attribute access plus the assignment, and I tried to do that with __setattr__ and assign. This works, but then I realized that actually those things are already there. So how I think about it: I think the
20:06
descriptor API, the data descriptor API, is mostly about this kind of thing, where you try to take over at the moment when somebody is assigning to something, then take this data, put it inside the field or a node, and then still have
20:26
this magic node that does all the work for you, but you have the ability to write the code like that, where you assign to something and then call the stuff that you need. And I think it led me to thinking that
20:43
descriptors are not so much the big call graph over here; it is only the red part, which checks if the methods are there and calls them if they are. So this is the default behavior of the __getattribute__ method in Python, and I thought I'd like to share it. No, I
21:19
just called those subtrees, which contained other fields which
21:25
could also be subtrees, struct. So I built a class hierarchy, so I didn't have to reimplement the stuff that would be common, for example, to integers, to numbers, to booleans. So these are the atoms, and for the
21:41
structures, by that I mean lists, trees. And here I'm actually showing something that is a mix of the two, because I'm copying the value, that's what the atom does, and I'm copying fields, because I didn't have space and I didn't want to complicate it. So I wanted to say
22:03
there is a node, you can put a value in it, but you can also have subfields, and those subfields can again be a node: you can put a value there, you can put subfields, and this way you build a tree from that. As I said, I think it
22:26
had a big impact on the performance of the team by means of this one, where they could actually copy-paste the data from the screen, and I think that was the biggest thing. From the other stuff, I also changed the comparator, so where
22:43
you had two structures, normally it was walking down the tree and trying to compare each of them and then said, okay, this is different from that one, but maybe didn't even give the value. So I changed it in such a way that I actually use the printer: I printed both structures and then put a diff on them
23:01
and removed all the non-differing lines from the diff, which I don't show there actually, and this is a diff I copied from the code. So these were, I think, the two biggest improvements. The other improvement was
23:22
about the configuration, but this was a bit of a different tree, so I think it's not related. Well, I no longer work at the company, so right now I'm
23:41
doing web development, other stakes, and this is a much better line of work. I think we have a refined process, we have almost 100% coverage, we have the ends to run the stuff, we have a working Scrum, and I think I'm not going back over there to fix that stuff. And I think, more so, I
24:14
guess, thank you.