Don't rely on discipline

Formal Metadata

Title
Don't rely on discipline
Title of Series
Number of Parts
50
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2019

Content Metadata

Subject Area
Genre
Abstract
In the programming field we often rely on discipline. We expect from ourselves and from others that we will not introduce bugs and cause problems. That we will use the libraries and APIs as they are intended. That we will not cut corners. Sadly, tales from the industry tell us otherwise. This talk explores why we should not rely on discipline as a bouncer against bugs, and what to rely on instead.
Transcript: English (auto-generated)
How many years does a medical doctor need before you would call her an experienced doctor? Three, five, ten? Say you come into your doctor's office and there's a new doctor who tells you, "I have six years of experience." Would you think to yourself, "Yes, that's an experienced doctor"? Or would you think, "Maybe a bit young"? Another question: how many programmers were there in the 1940s?
One? Two? Alan Turing plus a couple of his colleagues, maybe a few more people around the world. Today we have 20 million programmers. How do you get from a couple to 20 million in a span of 60 or 70 years? You have to double every three to four years, which means our industry is stuck in perpetual juniorness. If the population of programmers doubles every four years, and we expect someone to have five to ten years of experience before we call them experienced, then half of the people in our industry are, by definition, inexperienced, or juniors.

So we shouldn't write our code in a way that expects the person working on it after us to be experienced, because there's at least a 50-50 chance that they won't be. We shouldn't write our code in a way that expects the next person to have a ton of skills, because in a lot of cases they simply won't. But really, we shouldn't write our code in a way that expects the next person to be perfectly disciplined, to obey all the rules and do things right, because in a lot of cases that person will be us in the future, possibly in a hurry, under stress, and throwing discipline out of the window.

The concept is nothing new. We learned it in the past decade with unit testing. There are a bunch of posts online from 2008 and the years around it where people ask themselves: does testing make sense? Should we write unit tests? A lot of people said we shouldn't, because tests are not flexible and because having to write them slows you down. After about 10 years we learned, as a community, that this is not true. What we do in our day-to-day jobs is, most of the time, rewrites and refactors, and tests enable refactors: they give you the confidence to touch code that you haven't seen in a couple of months, or maybe a couple of years. With tests you have the confidence to actually move the code base forward without being afraid that you will break things in random places.

So here's a hypothesis: types are the unit tests of the coming decade. Not in the sense that they replace unit tests. I'm still very enthusiastic about tests; I try to keep 100% test coverage in everything I do. But in the sense that people say about types today what they used to say about tests 10 years ago: that they're slow, that they're inflexible, and that they kill productivity. I believe that in a couple of years we'll learn, the same as we did with unit tests, that types actually increase our productivity and send it through the roof.

But enough with the preaching. Let's look at some code; that's why we're here. Let's say we're building an ancestry app, a competitor to 23andMe. We're a startup, so we need to ship.
We're in a hurry. We've defined this model, a class for a person that has a name and some parents. Now we need some good dummy data to use while we're developing, so I need help finding a famous family, possibly fictional, where everybody knows what the family tree looks like. Does anyone have any ideas? Maybe one where it's really important who the father is.

So we modeled the Star Wars family tree with our Person objects. We're disciplined, good developers, good engineers, so we write tests for our code. We make sure that the init method works and that we can get the name, and then that we can get the grandparents. And this is our entire code now. I split it in three. It's a single file.
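A minimal sketch of what such a model and its happy-path tests might look like. This is not the speaker's actual file: the body of `grandparents`, the character data, and the test function names are assumptions made for illustration.

```python
class Person:
    """A person with a name and, optionally, a mother and a father."""

    def __init__(self, name, mother=None, father=None):
        self.name = name
        self.mother = mother
        self.father = father

    def grandparents(self):
        """Collect the parents of both parents (untyped first draft)."""
        return [
            self.mother.mother,
            self.mother.father,
            self.father.mother,
            self.father.father,
        ]


# Hypothetical dummy data: a small slice of the Star Wars family tree.
anakin = Person("Anakin Skywalker")
padme = Person("Padme Amidala")
leia = Person("Leia Organa", mother=padme, father=anakin)
han = Person("Han Solo")
kylo = Person("Kylo Ren", mother=leia, father=han)


def test_init():
    assert kylo.name == "Kylo Ren"
    assert kylo.mother is leia
    assert kylo.father is han


def test_grandparents():
    # Happy path only: a person whose parents are both known.
    assert padme in kylo.grandparents()
    assert anakin in kylo.grandparents()


test_init()
test_grandparents()
print("all tests passed")
```

Every line of the class is executed by these two tests, so a coverage tool would report full coverage even though only one shape of family tree is exercised.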
You can get it on GitHub. I split it in three so it fits on the screen. Let's run the tests. Okay, all passed, 100% test coverage. And then a user comes and we get an error. How could this be? We have 100% test coverage. We're disciplined developers. What's going on? "'NoneType' object has no attribute 'mother'". Fine. This is... oh, cool, you don't see this. Awesome. I have to do this, and this again.

Welcome to the billion-dollar mistake. This is Tony Hoare. He worked on ALGOL W (and Python, by the way, is a descendant of ALGOL). He said that while designing its references, his goal was to ensure that all use of references would be safe, with the compiler checking them, but that he couldn't resist the temptation to put in a null reference, which is None in the Python world, simply because it was so easy to implement. This decision, or lack of a decision, led to innumerable errors, vulnerabilities, and system crashes that have probably cost billions of dollars of damage in the past 50 years.

However, with Python 3.5 there's a new core library called typing, and we can use it to document the types of attributes and parameters. For example, we can say up there that the Person class has two attributes, a mother and a father, and that they have to be instances of the Person class or None. This is what Optional gives you: the value can be whatever is in the brackets, here Person, or None. So this is our new code; you can see the changes I made in yellow. And we can run mypy again. Oh, not again: we run mypy, which is the tool that can read this information, this documentation about types, try to understand what your code does, and then give you hints that there may be something wrong. Sure enough, mypy tells us that in grandparents, where we're appending down here, we do self.mother.mother. We said that a mother can be a Person object or None, so in some cases we're basically calling None.mother.
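The situation described here can be sketched roughly as follows. The code is an illustration, not the speaker's slide: the bare `-> list` return annotation is an assumption added so that a checker such as mypy inspects the method body, and the quoted checker message is approximate.

```python
from typing import Optional


class Person:
    # Type documentation: each parent is either a Person or None.
    mother: Optional["Person"]
    father: Optional["Person"]

    def __init__(
        self,
        name: str,
        mother: Optional["Person"] = None,
        father: Optional["Person"] = None,
    ) -> None:
        self.name = name
        self.mother = mother
        self.father = father

    def grandparents(self) -> list:
        # mypy is expected to complain on these lines with something like:
        #   Item "None" of "Optional[Person]" has no attribute "mother"
        return [
            self.mother.mother,
            self.mother.father,
            self.father.mother,
            self.father.father,
        ]


# Hypothetical data: a person whose father is unknown.
shmi = Person("Shmi Skywalker")
anakin = Person("Anakin Skywalker", mother=shmi)

message = ""
try:
    anakin.grandparents()
except AttributeError as error:
    # At run time the same problem only shows up for this particular input.
    message = str(error)

print(message)
```

The annotations change nothing about how the code runs; they only give the checker enough information to point at the risky attribute access before a user hits it.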
And you cannot do that: None has no such attribute, so that's why our code fails. So mypy uncovered a bug that our tests didn't, even though we have 100% test coverage. Informed by the warning that mypy gave us, we can fix the bug before it gets to production, and we can also make the tests better to make sure this doesn't happen again.

We run it in production again. No, first we run the tests; yes, they still pass. Now we can go to production, and another user comes by and wants to get Kylo Ren's grandparents. I'm not sure whether this bug is worse than the one before. The first bug you see in your error logs; maybe you get a report in Sentry, maybe you get an email, because the thing crashes. But this one can go unnoticed for weeks or months.

Let's see how we can prevent it using typing. It's very simple. Besides documenting attributes and parameters, typing also allows us to document the return values of methods. So we say that the grandparents method should return a list of Persons, because a grandparent is a person and not something else. These are the changes to the code, nothing else. We can run mypy again, and it gives us a new error, or a new warning. It says that we have an incompatible return type: mypy realized that our grandparents method returns a list of Persons and Nones, and we said up here, in yellow, that it should return just a list of Persons. And then we know: oh yes, it's because of self.mother.mother. The second mother can be None, and in that case we're returning a None.
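A hedged sketch of this mismatch, under the assumption that the method gathers each known parent's own parents. The helper `_candidates`, the method name `grandparents_with_gaps`, and the sample data are hypothetical; the second method shows one possible way to keep the promise made by the return annotation.

```python
from typing import List, Optional


class Person:
    def __init__(
        self,
        name: str,
        mother: Optional["Person"] = None,
        father: Optional["Person"] = None,
    ) -> None:
        self.name = name
        self.mother = mother
        self.father = father

    def _candidates(self) -> List[Optional["Person"]]:
        """Parents of each known parent; entries may be None."""
        found: List[Optional["Person"]] = []
        for parent in (self.mother, self.father):
            if parent is not None:
                found.append(parent.mother)
                found.append(parent.father)
        return found

    def grandparents_with_gaps(self) -> List["Person"]:
        # A checker such as mypy is expected to reject this line with an
        # "incompatible return value type" error: the list may hold None.
        return self._candidates()

    def grandparents(self) -> List["Person"]:
        # Dropping the None entries makes the annotation truthful.
        return [person for person in self._candidates() if person is not None]


# Hypothetical data: Han Solo's parents are unknown.
anakin = Person("Anakin Skywalker")
padme = Person("Padme Amidala")
leia = Person("Leia Organa", mother=padme, father=anakin)
han = Person("Han Solo")
kylo = Person("Kylo Ren", mother=leia, father=han)

print(kylo.grandparents_with_gaps())  # contains None entries at run time
print([person.name for person in kylo.grandparents()])
```

Nothing crashes in the first variant, which is why this kind of bug can stay unnoticed at run time while a type checker flags it immediately.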
And that's a bug in our code. So we fix the code by discarding all the Nones before we return, and we also fix the test, because now we know there's another edge case that we can cover. This is a problem with unit tests: a lot of the time you only think about the happy path, how the code should run, and you write tests for that. It's really hard to figure out all the possible edge cases, and you have to be super disciplined to catch them all so that you can write tests for them. mypy helps you discover some of those edge cases. Obviously it's not going to catch all of them, but having a script that runs in a couple of seconds and checks whether you've covered those edge cases is super good.

You run your app in production and you get an email. Super nice: your CTO wants to use it in their REST API framework, and all these objects need to implement a .json attribute so that they can be used there, because the REST API returns JSON. It's very simple: we add a new method called json that returns the name, and we add a test, because we're disciplined developers. All the tests pass. Good.
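A minimal sketch of this step, assuming the `json` method returns a dictionary holding only the name. The dictionary shape and the test function are illustrative guesses rather than the code shown in the talk.

```python
import json


class Person:
    def __init__(self, name, mother=None, father=None):
        self.name = name
        self.mother = mother
        self.father = father

    def json(self):
        """Return a JSON-serializable description of this person."""
        return {"name": self.name}


def test_json():
    luke = Person("Luke Skywalker")
    assert luke.json() == {"name": "Luke Skywalker"}
    # Round-trip through the serializer instead of only comparing dicts.
    assert json.loads(json.dumps(luke.json())) == {"name": "Luke Skywalker"}


test_json()
print(json.dumps(Person("Luke Skywalker").json()))
```

Round-tripping through `json.dumps` in the test is a deliberate choice: it checks that the returned structure can actually be serialized, which a plain dictionary comparison would not reveal.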
And it's in production. We get another email: "This is really good, but I want to optimize the number of API requests I'm sending to the backend, and I want to get the names of the parents without additional requests. Why don't you just include the parents' information in the response itself?" And I think, yes, that makes sense. Let's do it.
So besides the name, we add the mother and the father. We again fix the tests and run them; all the tests still pass, 100% test coverage. We push to production, and a user comes and says they want the JSON for Luke Skywalker. Boom, again. And this one is different. It's not something with None; it's "Object of type Person is not JSON serializable". Let's see if we can fix it with types, or whether mypy could have caught this.

You will notice that we were not really disciplined developers, because we forgot to add the return documentation. When we do, we say that this json method will return a dictionary that has strings for keys and strings or None for values. Nothing else changed, just this type information. We run mypy again, and again we get a new warning. Because we gave mypy more documentation about what our code should do, it can guess more about what our code really does, and it tells us when the two are not equal. In this case mypy tells us that the values in the dictionary our json method returns are None or a Person object, but we said they should be None or a string, and these two don't match.

Then you look at the code and think: of course, it should be self.mother.name, because that's a string, and JSON needs it to be a string. But just before you push to production you run mypy again, and there's another warning. mypy knows that self.mother can also be None, because you said it can be None, and you cannot call .name on None. That would fail if you pushed it to production. So you finally fix it the way you should: only if the mother exists do you call .name. Then you look at the code and wonder:
Is there more documentation missing, in places where I'm not disciplined enough to notice? Can we stop relying on being disciplined? Well, yes, we can. We can call mypy with --strict, and it's going to tell us about all the parameters, arguments, and functions that don't have proper type documentation. We go through them, and this is the end result: our code, fully typed. There aren't a lot of extra characters. It's not as if using types makes your code look massively different; the differences are very slight. Also, these types are ignored at run time, so there's essentially no performance impact. And they're very handy for the next person reading the code, because it's basically the same as writing docstrings, but with a butler making sure that all the docstrings are always up to date. That butler is called mypy. Your IDE or code editor also uses this information to give you more and better suggestions when you write code.

Like I said, having types is like having docstrings that are always up to date and follow the conventions. In plone.api we agreed to document function parameters using the autodoc approach, so the line above generates documentation like this on docs.plone.org. We basically relied on every contributor being a disciplined engineer who will not forget to document the return value. Sure enough, after five minutes of browsing the plone.api code I found a function that is missing the return declaration, and as a consequence it's also missing in the documentation. Nobody among 60 contributors over the span of six years noticed this. Today, bugs like this are completely preventable using mypy, because with the --strict option mypy will alert us that we haven't documented, or haven't properly documented, the return type. Side note: it's been six years since plone.api. Damn, I'm old.

But the mess can be way bigger than just missing documentation. A recent paper shows that one in five genetics
papers contains errors because Excel converted gene names to dates. The amount of scientific hours rendered completely useless because of a typing mistake, come on. And Airbnb said that in a recent post-mortem analysis they found that 38 percent of the bugs that surfaced in their production could have been prevented by using types. You know that the earlier you catch a bug, the cheaper it is to fix. mypy and similar type checkers run way faster than your tests, so you don't have to wait 10 or 15 minutes for your tests to tell you that you have a bug in your code; you can wait 10 or 15 seconds for mypy to tell you the same thing.

There's even a paper showing that typing does discover bugs. What these people did was take public JavaScript repositories, find commits that fixed bugs, revert those commits, introduce types, and then run the type checker. They found that the type checker could find 15 percent of the bugs that went all the way through testing and QA into the production code of public JavaScript repositories. Just using a couple more characters in the code saves you from 15 percent of public bugs. It's insane.

Finally, to give you a bit more food for thought, there's a recent blog post from the creator of mypy showing how Dropbox went to type checking four million lines of Python. They also said that without type checks they would never have been able to migrate from Python 2 to Python 3. By the way, Guido van Rossum is employed by Dropbox and he's working on mypy, so mypy is not some obscure thing on the edges of the Python ecosystem; it's almost as core as it gets. Sure, this all looks fine, but
you might ask yourself where to go from here. Well, start using mypy today. It's very easy to: just add it to your CI, see if there are warnings that make sense, and start learning about it. Additionally, there's all this talk about what Plone is today. Plone today is an API, but there is no document that describes this API, which is strange. One possible way to write it is OpenAPI, which is basically types for your REST APIs. I'm giving a talk about OpenAPI on Friday, so you're welcome to that. And finally, start playing with Elm. Elm is a very delightful front-end language. It's just for the front end, it compiles to JavaScript, and it's strongly typed, so it will not compile unless you get the types right. It will absolutely change the way you write Python code, because you learn to recognize whole new classes of bugs and avoid them. You will also understand mypy way better, and mypy is becoming increasingly prevalent, at least in modern Python development. Thank you.

Does somebody have a question?

Okay, hello. Do you know what the logical theory behind the typing library is? I mean its power as a logical theory: what does it check? Is there some research paper? Do you know more about the internals of that?

I don't know the internals, but the guy who created it came straight from university, and if I remember correctly he knows Haskell. Haskell uses the Hindley-Milner type system, so I think it was inspired by that, but I wouldn't know for sure.

Okay, thank you.
Another question?

Hi Nick. One thing I think we do have in Zope and Plone is interfaces, because interfaces also allow you to describe things. Python itself is going more in the direction of protocols. Is there also a way, or an idea, of how to describe that your class follows a protocol in that way?

So, yes. You could basically say that a protocol is a reimplementation of zope.interface in modern Python, and I don't think they're mutually exclusive. There's a mypy extension for zope.interface. Protocols are a very new thing, and we're not sure whether they're going to stay around in the current implementation, so they can both easily live side by side for a few years.
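To make the idea of a protocol concrete, here is a small sketch using `typing.Protocol`, which is part of the standard library from Python 3.8 onward. The names `JsonSerializable` and `render` are hypothetical and were not mentioned in the discussion.

```python
from typing import Dict, Protocol


class JsonSerializable(Protocol):
    """Structural type: anything with a matching json() method qualifies."""

    def json(self) -> Dict[str, str]:
        ...


class Person:
    # Person does not inherit from JsonSerializable; it only has to
    # provide a json() method with a compatible signature.
    def __init__(self, name: str) -> None:
        self.name = name

    def json(self) -> Dict[str, str]:
        return {"name": self.name}


def render(item: JsonSerializable) -> Dict[str, str]:
    """Accept any object that structurally follows the protocol."""
    return item.json()


print(render(Person("Leia Organa")))
```

A type checker verifies the match by the shape of the class rather than by explicit registration, which is the main practical difference from declaring that a class provides an interface.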