We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Automatic Testing of Python Functions Based on Contracts

00:00

Formal Metadata

Title
Automatic Testing of Python Functions Based on Contracts
Title of Series
Number of Parts
115
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Good programs are correct programs. Testing is an indispensable tool making sure that the software we produce is correct. The bulk of today’s Python programs are tested using unit tests, on a case-by-case basis. While unit tests are important, thorough tests are tedious to write. A lot of edge cases need to be considered and human errors easily slip in. The edge cases are often simply omitted, e.g., for the lack of time. However, this leaves a lot of untested holes in the program. Instead of hunting for edge cases, what if we specified the behavior of a function and let the development tools test it for us? In this talk, we first look into how to specify the behavior of functions using contracts, i.e. the properties of the input and the output of a function. The contracts can reduce the amount of testing code, and allow the developer to focus on more tangible pieces of code, namely defining the behavior. Next, we examine two of our tools for automatic testing based on contracts (crosshair and icontract-hypothesis). We show how they can be used in everyday programming with little additional effort. A corpus of programs (Python-by-contract) is introduced to evaluate where the tools shine and what limitations they hit. Finally, we show how to use the tools in the classroom to help students with debugging, providing hopefully a better educational experience. The talk is aimed at Pythonistas familiar with lambda functions, Boolean logic (“and”, “or”) and quantifiers (“all”, “any”).
Design by contractLaurent seriesPoint (geometry)PurchasingOrder (biology)NumberStatistical hypothesis testingElectronic mailing listDomain nameInstallation artTotal S.A.Software frameworkStrategy gameLogicBitFunctional (mathematics)Parameter (computer programming)outputEndliche ModelltheorieFlow separationCurveProfil (magazine)String (computer science)CodeMappingQuicksortControl flowSoftware bugTerm (mathematics)Level (video gaming)Filter <Stochastik>Binary multiplierSuite (music)Physical systemIntegrated development environmentEntire functionExecution unitDecision theorySocial classCondition numberDemo (music)PreconditionerCASE <Informatik>Pattern languageBoilerplate (text)HeuristicStreaming mediaTheoremStructural loadRandomizationModule (mathematics)RootLink (knot theory)Single-precision floating-point formatRevision controlSymbol tableAbsolute valueInformationApproximationWritingObject (grammar)Different (Kate Ryan album)Presentation of a groupResultantSoftware maintenanceTouchscreenDecision tree learningUnit testingPerfect groupField (computer science)MereologyRight angleDistribution (mathematics)Sign (mathematics)AuthorizationCounterexampleNewton's methodProduct (business)IntegerMaxima and minimaMachine codeElectric generatorPropagatorConnected spaceComputer iconChemical equationStatistical hypothesis testingHypothesisLibrary (computing)Theory of relativityImplementationSlide ruleObject-oriented programmingData dictionaryPairwise comparisonGoodness of fitVariable (mathematics)Sheaf (mathematics)Statement (computer science)Type theoryWeb pageThumbnailLine (geometry)Category of beingProcedural programmingGreatest elementDampingLoop (music)AreaReverse engineeringFuzzy logicPlastikkarteLengthProcess (computing)Pauli exclusion principleIntrusion detection systemProjective planeData managementSquare numberRange (statistics)Limit (category theory)Pulse (signal processing)TwitterComputer configurationCommon Language InfrastructurePower (physics)CodeRepository (publishing)Mathematical analysisOverhead (computing)Formal grammarBenchmarkMusical ensembleMilitary baseMoment (mathematics)Set (mathematics)Computer fileFunction (mathematics)1 (number)Patch (Unix)Multiplication signRun time (program lifecycle phase)Matching (graph theory)Motion captureSemiconductor memoryExpressionLatent heatTask (computing)Coefficient of determinationFigurate numberVideoconferencingComputer programmingINTEGRALPlug-in (computing)Keyboard shortcutCorrespondence (mathematics)FlagInferenceMotherboardMultiplicationError messageInstance (computer science)Inheritance (object-oriented programming)Visualization (computer graphics)Observational studySubstitute goodThread (computing)Meeting/Interview
Presentation of a groupVideoconferencingOrder (biology)Laurent seriesMeeting/Interview
Design by contractHypothesisFunction (mathematics)CurvatureComputerAbsolute valueRadio-frequency identificationStatistical hypothesis testingNumberFunctional (mathematics)Exterior algebraRange (statistics)Latent heatUnit testingResultantMotion captureStrategy gameTheory of relativityCurveParameter (computer programming)Software frameworkType theoryDifferent (Kate Ryan album)ImplementationTerm (mathematics)Level (video gaming)outputPhysical systemFlow separationStatistical hypothesis testingCategory of beingPoint (geometry)Function (mathematics)Software bugCodeCASE <Informatik>HypothesisAbsolute valueIntegrated development environmentLoop (music)Product (business)Electronic mailing listXMLUML
CurvatureAbsolute valueComputerLambda calculusCodeLibrary (computing)Data typeStatistical hypothesis testingoutputAverageInferenceMathematical analysisSingle-precision floating-point formatDesign by contractElectronic mailing listRepository (publishing)Library (computing)CodeInheritance (object-oriented programming)PreconditionerFunctional (mathematics)Latent heatSoftware bugWritingType theoryLevel (video gaming)Execution unitInstance (computer science)Unit testingSubstitute goodParameter (computer programming)Product (business)CASE <Informatik>Statistical hypothesis testingFormal grammarCondition numberMereologyProcess (computing)outputNumberComputer animation
outputAverageHypothesisAbsolute valueStatistical hypothesis testingElectric generatorStatistical hypothesis testingUnicodeCompilerLambda calculusCodeDigital filterStrategy gameDesign by contractHypothesisPresentation of a groupLevel (video gaming)Electric generatorRight angleStatistical hypothesis testingCategory of beingCodeStatistical hypothesis testingProcess (computing)PreconditionerCodePattern languageMatching (graph theory)Maxima and minimaIntegerFilter <Stochastik>INTEGRALBound stateRandomizationoutputWritingRow (database)WebsiteParameter (computer programming)Library (computing)Power (physics)Functional (mathematics)String (computer science)XMLUMLComputer animation
Lambda calculusAbsolute valueGlass floatLoop (music)Invariant (mathematics)CurvatureCodeSquare numberHypothesisDesign by contractDemo (music)Strategy gameCASE <Informatik>Statistical hypothesis testingFunctional (mathematics)NumberStatement (computer science)ResultantControl flowPerfect groupPreconditionerCodeStatistical hypothesis testingCodeLibrary (computing)Boilerplate (text)CalculationApproximationSquare numberSingle-precision floating-point formatLine (geometry)outputParameter (computer programming)Bookmark (World Wide Web)RootComputer animationXML
Intrusion detection systemHypothesisGamma functionGEDCOMStatistical hypothesis testingStatistical hypothesis testingSuite (music)CASE <Informatik>Installation artoutputDirection (geometry)Functional (mathematics)Entire functionFlagStrategy gameDistribution (mathematics)Control flowHypothesisFunction (mathematics)Computer fileCodeCodeDesign by contractLine (geometry)
CodeComputer iconAbsolute valueVideo game consoleLambda calculusGlass floatSquare numberApproximationCurvatureOpen setAsynchronous Transfer ModeMachine visionDifferenz <Mathematik>Group actionComputer fileHypothesisCode refactoringLine (geometry)Loop (music)Integrated development environmentDesign by contractHypothesisPlug-in (computing)Visualization (computer graphics)CodeStatistical hypothesis testingCorrespondence (mathematics)Functional (mathematics)Suite (music)Computer configurationStrategy gameSource codeComputer animation
Interior (topology)TupleDigital filterLambda calculusIntegerInclusion mapFunctional (mathematics)HypothesisStrategy gameTheory of relativityCondition numberMultiplicationDesign by contractMultivariate AnalyseCodePreconditioneroutputData dictionaryComputer animation
HypothesisGEDCOMMaizeMach's principleLambda calculusMaxima and minimaSet (mathematics)MassWorld Wide Web ConsortiumLevel (video gaming)Order (biology)Link (knot theory)PurchasingSymbol tableFunctional (mathematics)Total S.A.Binary multiplieroutputString (computer science)MappingElectronic mailing listLoop (music)ImplementationIntrusion detection systemBitHypothesisLine (geometry)Demo (music)Physical systemWebsitePlug-in (computing)PreconditionerInformationDesign by contractQuicksortCondition numberSoftware maintenanceField (computer science)Error messageObject-oriented programmingReverse engineeringHeuristicTheoremLengthCodeRandomizationSet (mathematics)Object (grammar)Computer animationMeeting/InterviewSource code
Maxima and minimaDew pointMenu (computing)World Wide Web ConsortiumGamma functionApproximationProcess modelingKolmogorov complexityCodeHypothesisImplementationMathematical analysisBenchmarkSample (statistics)Coma BerenicesFuzzy logicCondition numberAreaFunctional (mathematics)Design by contractType theoryPreconditionerQuicksortDifferent (Kate Ryan album)Level (video gaming)RootImplementationSymbol tableStatistical hypothesis testingDecision tree learningGoodness of fitLine (geometry)Pairwise comparisonSquare numberMereologyApproximationSlide ruleHypothesisLimit (category theory)LogicOrder (biology)BitSheaf (mathematics)Physical systemPresentation of a groupComplex (psychology)Endliche ModelltheorieCodeBeta functionComputer programmingPropagatorCounterexampleSet (mathematics)Machine codeStatistical hypothesis testingMultiplication signPower (physics)BenchmarkMathematical analysisProjective planeTerm (mathematics)Computer animation
HypothesisThumbnailOnline chatNumberPerturbation theoryAuthorizationLibrary (computing)TouchscreenProduct (business)CodeGreatest elementComputer animationMeeting/Interview
Connected spaceStructural loadLibrary (computing)Repository (publishing)Design by contractMultiplication signLevel (video gaming)Meeting/InterviewComputer animation
Functional (mathematics)Patch (Unix)Design by contractQuicksort1 (number)Overhead (computing)BenchmarkMarginal distributionBitCodeLine (geometry)Meeting/InterviewComputer animation
ResultantDesign by contractSemiconductor memoryEndliche ModelltheorieDifferent (Kate Ryan album)PreconditionerRange (statistics)RootTheory of relativityMoment (mathematics)Absolute valuePauli exclusion principleTerm (mathematics)Physical systemMilitary baseCodeApproximationType theorySingle-precision floating-point formatCASE <Informatik>Software bugParameter (computer programming)Square numberoutputRepository (publishing)Core dumpProduct (business)BenchmarkComputer animationMeeting/Interview
Physical systemType theoryExpressionTerm (mathematics)QuicksortFunctional (mathematics)MathematicsMultiplication signControl flowBitMeeting/InterviewComputer animation
Transcript: English(auto-generated)
All right, folks, it's Friday. It's the last day of the conference. Don't be sad. We have the full day today. We have two days of sprints tomorrow and lots of fun still to be had.
I want to introduce three speakers for the first talk of Friday in Brian. The talk is titled Automatic Testing of Python Functions Based on Contracts. Very interesting subject. And we have here Marco, Philip, and Laurent.
I hope I pronounce your names not too badly. We will start with a video of Marco's presentation. He's here. He might be able to participate in the Q&A, but there are some technical issues. So we go for a pre-recorded video,
and then we will go to Laurent and then Philip. I believe this is the order for the live rest of the video, the presentation. So folks, you have 45 minutes. It's a very interesting topic.
I'm very excited to hear about it. And I will leave the floor to first the recorded video and then to the two of you. Thank you so much. Thank you, Philip, very much for the introduction. Now let's see what the contracts are. Think for a second what's a good function.
There are many qualities that the good function needs to fulfill. Efficiency, readability, maintainability, evolvability, and so on. But pretty high on the list is correctness.
A good function needs to be a correct one, and it needs to do what it should. If it doesn't do what it should, then it's not a correct, then it's a buggy function, and then probably also not a good function. Now if you look at this example where we have a function that takes two arguments,
we have an implementation that tells you how something works. But then you also have the specification of the behavior of a function, and that tells you what a function is doing. Now you need to specify this what
if you want to check how the implementation is correct. And our tools help you with these checks. Then you can add proper naming. Instead of do something, we call our function. Function approximates as qrt, and we also
name the arguments. There's a number, and there's precision. For many of you, this is probably already pretty clear what this function does. Now we can add the docstring. We can describe in human readable text what the function does. We can also specify the details. Here we say that the number, and the precision,
and the result are related in terms of the absolute difference. And the problem with naming and docstrings is that they are human readable text. They cannot be automatically checked. So this documentation tends to rot in large systems.
So you write a function, you implement it, then you go back to other tasks. Then you come back to the function. You change the implementation, but you often forget to change the documentation. For example, imagine here if you changed the absolute difference to a relative one.
Then the readers would still think that the function is computing the absolute difference, but then your function does something else. And the documentation is also ambiguous. This is one of the bigger problems. Here we did not specify whether the numbers can be negative, can be zero.
What about the precision? You can add type annotations. They already help a lot in Python. You can even add ranges to type up annotations. The problem with type annotations is that they cannot capture relations. So though you could add the ranges,
so for example, you can say numbers are non-negative, you cannot say that the result is related to the number and precision in terms of absolute difference, for example. Now you can write unit tests. You can pick a couple of input points and then assert that certain properties hold in the result.
Now the problem with unit tests is that we often tend to forget about edge cases. We only pick a couple of obvious input points. For example, I often forget or almost always forget to check for not to number.
And in this case, when you compute an approximation, not to number can even result, for example, in an endless loop. So alternative is to use property-based tests. Instead of picking only a couple of input points, you use a framework here. We use hypothesis.
You specify the whole input domain, and then you assert on the properties of the output. Maybe some of you saw also yesterday's talk with Zach Howard, who showed the framework in more detail.
Now the problem with property-based tests, of course, is that there's some learning curve. You need to use a framework, so you need to get familiar with it. And these strategies for defining and generating the inputs can become complex in certain cases.
Now another problem with property-based tests is that they only run in tests. They live in a separate model, separate from your code. So you cannot check properties during staging or production. And in our experience, a lot of bugs actually happen in these environments.
You catch some of the bugs in your tests, but oftentimes, it's actually the real users who will discover the bugs. Now you could write assertions. You write assertions at the beginning of the function,
and then you write assertions just before the end of the functions. These first assertions are called open preconditions, and the assertions just before you return are postconditions. So the assertion is good. They can be checked in production and staging, but they're hard to process by analysis tools.
You need to parse the whole body of the function, and you need to do some inference on the code to check, to figure out what your preconditions and postconditions are. You need to be careful about multiple returns. You need to repeat the assertions whenever
you return from the function. And when you have inheritance and when you deal with instance methods, that's when the assertions become really tedious because you have to be careful not to break the list of substitution principle. Now we present here our solution.
It's based on contracts. We use a contract library called iContract here, which introduces decorators for the function. So we write contracts here. We actually rewrite the assertions. We turn them contracts. There are preconditions. Here they're listed with a required decorator.
And we specify the postconditions here with ensure decorator. The preconditions need to hold before the function executes. These are the contracts that the caller of the function
needs to fulfill. And the postconditions need to hold after the function finishes. They need to be fulfilled by the function. So the nice thing about the contracts is that they live close to the code.
So when you change the implementation, you can immediately also change the contracts. They can be processed by analysis tools. Now we just need to parse the conditions. It's much easier than parsing the whole body. They can also run in staging and production.
They give you much better documentation because you can use a Sphinx plug-in, for example. And then all these conditions are also listed in your documentation. It's very practical when you, for example, write a library. They're formal.
So now if the reader does not need to think hard, a positive number, does it include the 0 or not? The contract is really clear. Number greater equals 0. So we used iContract.
The tools are based on iContract library. You can find the repository on GitHub. It provides a rich ecosystem. There's a plug-in for Sphinx. There's a plug-in also if you use fast APIs, you can include contracts in your REST endpoints. We collected a list of recipes to help you start it.
There's more to this ecosystem. But please note that there are also other libraries like dl and db contracts and others. So iContract is not a single library in Python. Please note at the end that you should
combine all these approaches. They're all complementary. Name your functions and your arguments properly. Use type annotations, write docstrings, write unit tests, write property-based tests. Also use contracts.
Use whatever you can to make your functions correct. And note that there is no solution that fits them all. So for example, oftentimes, a complete specification is not possible. So you can, for example, write a property-based test to cover some part of the input,
but some part of the input also cannot be formally expressed. You can write maybe only some contracts, so you don't have the 100% specification. But already these few contracts will reveal probably a lot of bugs in staging and production. So for example, even if you write an input must be positive,
whenever you pass in a negative input, you will reveal a bug. And oftentimes, there are bugs in the caller code as well. So do make sure that you also figure out something in the staging and production. And then to increase the coverage,
write unit tests for all those cases where you cannot formally express them and specify them. Thank you very much. And now Laureen will present iContract Hypothesis, a tool that infers the generative hypothesis strategies based on the contracts.
Laureen, the stage is all yours. All right. Thank you very much, Marco. So as Marco said, I'm Leven, and I'm going to present to you the first of two tools called iContract Hypothesis. So iContract Hypothesis is an integration
of iContract with Hypothesis, basically combining design by contract ID from iContract with property-based testing in Hypothesis. So for those who don't know Hypothesis, here is Hypothesis in a nutshell, or all you need to know for this presentation. Hypothesis is a property-based testing library in Python.
You can write parameterized tests and define data generators for it, all strategies in Hypothesis, which gives you an easy way of testing your functions against a lot of random inputs. So that's all you need to know for now.
There's, of course, much more on Hypothesis. But then you should attend one of Zach's talks or just go to the website. Why should you use something like iContract Hypothesis? It's because property-based testing is hard and tedious. Hypothesis already does a great job
of making it way more easier to write test records. And iContract Hypothesis will combine the power of Hypothesis with the power of iContract, so with integrating contracts in Hypothesis. So you can use iContract Hypothesis to generate property-based tests efficiently
and automatically. And although it does already a great job of assuring you of some properties in your codes, it does not cover all situations. So mind that you still need extra tests for your codes. What does iContract Hypothesis exactly do?
So you have your preconditions in iContract. You see them here on your left. And iContract Hypothesis will match common code patterns through Hypothesis strategies. For example, if you have bounds on an integer, then it matches those to an integer strategy with a minimum and maximum value. Or it can also describe patterns on strings, argument strings.
So we have some common code patterns that we can match. But there are still, of course, preconditions that cannot be matched. And in this case, we add them as filters. So here you can see, for example, a filter where you filter out all integers
modulo 2 have to be 0, so even numbers. So you have to pay attention if you have a lot of filters in your strategies or you have restrictive filters, because they may reject too much data. And at the end, you will end up with a test that does not work much because you don't have any data to test against.
So now we'll give a short demo of what you can do with High Contract Hypothesis. There are multiple ways in which you can use High Contract Hypothesis. And the first way is by using it as a library into your Python code. So we start here from the function
that Matt has shown previously, a person's favorite with your preconditions and your postconditions. And we will start with just writing a simple test in Hypothesis without taking into account any of the preconditions. So it takes the same arguments as our original function,
number and precision. And now we can just call our function approximate square roots because we have our postcondition and also an assert statement that will tell us if our test fails. Now, yes, the given decorator here, we specify the strategies in Hypothesis. So yeah, we just want a bunch of floats
both for number and precision. And with these three of lines, we have our test case. And we can just run the test case. And we immediately see the results if anything breaks our codes. Now, of course, we haven't taken into account preconditions. And in one of the preconditions,
we said we want a precision that's strictly greater than 0 because, of course, a perfect approximation cannot be calculated. So next, we're trying to make sure that all the inputs does satisfy our preconditions. The first way is a simple way is
by making sure that it just rejects all the inputs that doesn't satisfy our preconditions. So it's called assume preconditions. It's basically a function that you can generate with our contract hypothesis. And what it will do, it will just reject all the inputs that do not satisfy our preconditions.
And then we get already better results. As you can see, our function apparently doesn't handle big numbers very well. But although it's already an all right solution, it's still very inefficient. And we still need some boilerplate codes
to write the test case. But our contract hypothesis does allow us to make it way more easier by just having a single line of codes where it will use a more efficient way of testing it by inferring a strategy. So tested inferred strategy will infer the strategy
for approximate square roots and then test the function against the inputs. So as you can see here, we already have a different output, a falsifying example. And that's because, of course, under the root, it's different from the previous solution.
So now we have tested our codes in first strategy. But of course, most of the time, we want to know what was the strategy that was used. So we can also print the strategy that was used to test the function. So here you can see the two arguments, number of positions
in the test strategy. So this is the first way you can use it as a library in Python. The next way is a comment line tool. So you can just install py contract hypothesis.
And the first way you can use this tool is to test your codes directly. So you have the test function. And then you specify the file you want to test. Here is zero Python.py, which includes the same function as before. And as you can see, we get the same output as before as well.
So it's just going to test your code against the inferred strategies. And then you can get an overview of what input breaks your codes. Also, if you had a comment line, you can inspect which strategy was used. Here, it tells you the strategy. And you can also see where it was used for which function.
The next functionality of a comment hypothesis in the comment line is the Ghost Writer. It's very similar to Hypothesis Ghost Writer. And what it does is it writes entire test suits for you. So it's basically the easiest way to write tests.
It manages the imports. And then you get a class, a new test case, with here the test for your function. So if you have multiple functions in your file, you would also get multiple test cases. There are some flags you can pass to the Ghost Writer.
The first one is if you want to see which strategies that were used. So you have the explicit flag. And now you get a more robust overview and a more robust test case. And if you want, you can modify it to your needs.
If you just want a test case without all the stuff around it, then you have the bare flag. So that was the comment line, the second way. And the third and final way, how you can use a contract hypothesis is in your IDE. So there already exist plugins for the PyCharm IDE,
for Visual Studio Code, and for VIM. And it's very easy to use. Just I will show you how I use it in PyCharm. So you can just install it in PyCharm, go to the plugins, look for a contract hypothesis PyCharm. And what you get then is,
so we have the same function as before. Just right-click on your function, and here you get an overview of what's possible. So it's the same functionality as before, but now very accessible in PyCharm. So the first way is just test your function. When you're developing, you can quickly,
just run your test and see what's going on. The next way is to see what strategy corresponds to your function. Third way is to use a ghost tracker. So you have a test case, a test suite in your prompt. And then the last option is to directly write
your test suite to a file, and then you have an external test suite that you can modify, that you can use. So this is one of the easiest ways to quickly test your code to a contract hypothesis. So that's how you can use a contract hypothesis.
To end my part, just a look into the future of a contract hypothesis, or what we envision as a future of a contract hypothesis. So right now, if you have a function where you have multiple variables, and define relations between them, then you get a piece of code that's not very readable
and not efficient. So yeah, you need to use fixed dictionaries that you filter, and just put the whole condition, a whole precondition into a filter. And this will often lead to very inefficient strategies that are not very readable.
So what we would like to end up with are composite strategies. This is a functionality in hypothesis, where you can define your strategies procedurally. So it makes it possible to define relations between multivariable conditions.
It's more efficient because you often don't need a filter, and it also makes it more readable. Here you can quickly see, you have the two input arguments, and you can just read it, and you know immediately what is going on. So that was eye contract hypothesis.
And then I will now give the stage to Philip, who will introduce his tool, Crosshair. Thank you very much. Great, thank you, Lauren, and Marco. So I actually cover three topics. The first is Crosshair, which is a tool
that's quite similar to eye contract hypothesis. I will also talk a little bit about a corpus that we've been developing for contracts, and I'll sort of field Q&A. So firstly, Crosshair, I am the primary maintainer for Crosshair, and Crosshair is a tool to check contracts
but it works a little bit differently than eye contract hypothesis. It works with symbolic execution. So hypothesis is a tool that will apply sort of random heuristic inputs, but Crosshair attempts to verify your contract in a more formal way. So we use a theorem prover to reason
about all possible values when we run your code. To be a little bit more specific, we find one concrete path for your code and reason about all values along that path, and then we'll try other paths. So it's concrete paths but with symbolic values. I won't go into too much about how symbolic execution works if you don't already know much about it, that's fine.
If you'd like to learn more, there's some links and information about how Crosshair works in more detail at the website. But today, I am going to mostly just get into the demo. I'm going to attempt a completely live demo for us today.
And so I'm gonna describe a potential problem. And so in this problem, we are implementing a online shopping system. And so the objective is to compute a total price for your online shopping order. And the inputs to such a function
are a set of items that the user is buying. So this is a list of line items. The line items each have an item ID for the item that you're purchasing and quantity. In order to compute a total, we'll also need prices for all of those items. So this is a dictionary that maps a string
which is intended to be the item ID of the line item into a float price. Of course, don't use floats for monetary units but in this example, we'll do this. And then we will return a total. So the implementation for this function is fairly straightforward. We start with an empty total, loop over the line items.
We add the price for each item and multiply by the quantity and return the term. Now, one thing we would like to ensure that this, that happens for our online system is that nobody should be able to check out
with a total of zero. Now, you can already imagine many ways in which you can pass inputs that would not meet this. And we could start implementing preconditions via the required decorator to ensure these but one of the nice things about tooling
and like crosshair and hypothesis is that we can use these tools to just sort of tell us what those preconditions are. So there is a, I'm using PyCharm. There's a PyCharm plugin for crosshair. And so we can sort of continuously run crosshair
towards this watch command. And so this will sit here and think about the function and try to find input that will invalidate the post condition. So we already have one here. And so we're gonna go through these one by one and start adding preconditions to see if we can make crosshair happy.
So one thing it's noticed is that there's a key error. So here are item IDs, empty strings, a little weird MPI string, but, oops. Okay, this does update in live. So we maybe have a, maybe we'll deal with this one first. One is that prices dictionary has a zero price
for this item ID. And so one thing we should ensure that is that prices always has a, always has positive prices. If you're like me, write your list comprehensions in reverse.
So for all of the prices and values, we want to ensure that the price is greater than zero. Oops, great, so we got that one. But here we see items could be empty.
So we should also require that the items, the length of those things should be greater than zero. Let's see if that makes us happy. Not quite. Okay, here we don't have a false, oops.
Okay, here's a false. And in this case, we have positive price. We have item IDs that match, but the quantities are zero. So quantities should be greater than zero. Again, we need an i for i in items.
What do we need to ensure? Well, i.item ID should be in the prices section. Oops, I forgot to add prices to my inputs to the precondition, and maybe we did it.
Nope, not quite. Let's see, item ID, quantity, price is zero. Ah, price is five. Oh, oh, the quantities are zero. So we need positive quantities.
I think now we've gotten it.
So one thing about this example is we didn't actually find any problem with compute total. So we did a whole lot of work, but we didn't find any bugs. And this is sort of a different way of thinking about contracts and useful in a different way as well.
So one of the things this does for you is it makes a lot of your assumptions sort of explicit. And by adding all these contracts to this function, all the functions that call it will have to meet these requirements. And that in turn may require more preconditions on those functions. And so these conditions spread throughout your system
in much the same way that, say when you change the type of an argument, you have to change it in a bunch of other places. So that may seem like work, but it's actually fairly important in making a lot of important things explicit. So an example here, like just to go through these one by one, we would like to ensure that when you change the quantity of a line item to zero,
that we remove it from the cart. We would like to ensure that you can't add something to your cart for which we don't have a cart. We should also make sure that you cannot get to the checkout part of the system if you don't have any items in your cart. And finally, if you're parsing
some third party price feed, you should validate that all of the prices are greater than zero. And so all of these conditions sort of propagate these requirements throughout your system. This is really sort of a beautiful thing about using contracts generally. Okay, so that is this example.
I'm gonna stop crosshair. And I'm briefly gonna show also the approximate square root function that we've been taught using elsewhere. And so we can watch this guy instead. I just pulled Newton's method for approximating square roots. And you can see here also,
we find a counter example and that's because the function I pulled uses less than or equal to, whereas our condition wanted a less than position. So we can just correct that in our implementation and crosshair gets happier.
Okay, so let's stop that. And let me flip back to the presentation. I do wanna talk a little bit about the limitations of crosshair. So the first is that crosshair is probably in a beta quality situation. It does not symbolically model everything into Python.
Basically what's required in order to make crosshair work is that we have to logic, sort of implement it in a formal logic way, everything that's implemented in C. And so we have a lot of it, but it's not complete. And so there's a lot of parts of Python where crosshair is not that effective simply
because we haven't implemented it. Also because of the way it works with the solver, as code complexity increases, it may increasingly have trouble finding or solving the problems that it needs to solve. And then finally, there are some limitations
that are built into the solvers that we use. We use a thing called SMT solvers and there's certain problems that it doesn't handle very well. So nonlinear arithmetic in general is undefinable. And so to make that concrete, perhaps an easy example that crosshair cannot find a counter example for it,
this one where we say powers of two, powers of two should all be small, but of course we all know many powers of two are very big. It's very easy to find a big power of two. You can put even about any value in here for X and non-trivial positive value and you get a very big result, but crosshair doesn't tell you.
I contract hypothesis would, for example, because you just have to try some values and it's easy to find them. So those are some of the limitations of the system. I wanna talk about some related tools. So right down the center column here, we have some other tools that work very similar to crosshair.
Many of these are based on research papers and such. I believe crosshair is the most feature complete in terms of implementing as much of Python as possible, but you may also be interested in these tools. They're more toolkit like and crosshair is a little bit more product like.
So depending on what you're looking for, some of those other projects may be the right thing to look for. On the left side, we have some other tools that just other ways of analyzing Python programs that don't use symbolic execution hypothesis. Fuzz testing is a huge active area right now where mostly security-based stuff, but a lot of fuzz testing can,
those tactics can be used to demonstrate correctness as well. On the right side, we have some symbolic execution tools, which are not analyzing Python. Anger and QUI are examples of this. It can be a little confusing because anger is itself implemented in Python, but it analyzes binary. So both of these tools analyze machine code
or analyzes code that's at the machine code level rather than at the Python code level. So crosshair is looking at your Python code and understands Python. So that's some of the difference with this tool. Okay, so that's crosshair. I'm gonna spend a little time talking about
our Python by contract purpose now. So the Python by contract corpus is a set of problem solutions and contracts. And right now we have the 2020 Advent of Code problems as well as the fall 2019 exercises from introductory programming at ETH 3rd.
Both contain good and buggy implementations. So why would we do this? Well, corpus can be used to benchmark and compare contract analysis tools such as eye contract hypothesis and crosshair. And so that's why we've built it. There are also good examples for education. So if you're interested in contracting
and want to see some examples, you can use the support. We also strongly welcome contributions because more code means that we can do more testing of our tools and improve them. So those are welcome, and you can find instructions at this link.
By the way, the slide deck is linked in the top notes in the online system. Okay, and so that's it for my sections. And now we'll move on to some Q&A. I think perhaps all three of us are here online. All right, well, thank you very much.
It was a fantastic, fantastic presentation. I really enjoyed it. It's a hot topic and it's also reflected in the chat. I see lots of questions flying by. So I'll do my best
and ask you the most upvoted question. So the one against the highest number of thumbs up, little hearts, that kind of stuff. And we'll start with one immediately. I'll put it here at the bottom of the screen. I'll try to summarize it. So people are asking, you know, this is fantastic,
but can we have an, or is there an easy switch, right? When you put your code into production to maybe disable some of this, some of these checks or all of them entirely.
So Marco is actually the author of the iContact library. There is a way to disable it. I don't know if he wants to talk in more detail about that or whether your connection is good enough Marco. Yeah, let's try. If it breaks down, you can talk to the end. So yeah, there is a switch in the decorator.
You can, at the load time of the module, you can turn off some of the contracts or all of them or whatever way you configure it. So it can be really fine-grained. You can control that at a very fine-grained level. Sometimes at load time, you can also use implications like an or, not and, or,
if you want to have that at runtime as well. So there are ways. They're also listed in recipes on iContact repository. Cool, excellent. Another question related to maybe some of this,
this very topic, I guess, somehow, and I will put it here, let's see. So a lot of us are using a lot of other libraries that we don't write ourselves and obviously don't have contracts. So, you know, what about those? Can you easily, I don't know,
maybe add contracts outside of these libraries? You know, do you have any comment on that? I don't know whether Marco has any thoughts about this, but I don't know any way to do this.
So you can, of course, wrap the functions that you need with your own functions and apply contracts to those, but I'm not aware of a way in iContract to just apply, well, Tatsu, you could do some fancy Python things
where you just sort of monkey patch the functions with ones that are annotated. So at least that might be possible. I don't know if we'd recommend it. Yeah, yeah, yeah, I was thinking more or less along the same lines. Anybody else wants to mention anything on this topic?
Cool, another question that flew by is what about performance implications? I mean, I imagine all these checks have some performance hit. Do you have a little bit of an idea of how much we can expect?
So as Marco said, there's- Is this on iContract or anything? Go ahead, Marco.
Sorry, guys. So on the repository, we also present benchmarks. So there are basically no, there's no efficiency penalty if you turn the contracts off. Of course, you need to pay for what you're using. So if you turn the contracts on, they will run as fast as they run in Python code.
And then there's some margin overhead due to decoration. So you have this, but it's also due to how the Python works. So assertions are the fastest way if you just want to use assertions, but then you pay some marginal effort for the decoration and that's it.
We present benchmarks on the repository so people can check them out. In most practical settings, like in all the production systems I worked with, for example, we didn't really observe any significant overheads so that we had to turn the contracts off.
We usually turn off the post conditions, the ensure decorators, but we leave the preconditions on to catch the bugs in the core code. Fantastic. Anybody else wants to comment on this? If not, we have lots of questions. We don't have a huge amount of time,
but some of the questions are really, actually all of the questions are fantastic. I'll show another one here with a little sad face. What about Python 2? Some of us have large code bases still written in Python 2
and the porting, as you know, takes time and effort. Any laughs for Python 2 or should we really go to Python 3? I think all tools need at least Python 3.6 even.
Or I think iContract might work with 3.5, but yeah, I think it's 3.6 at the moment for all the tools. Yeah, makes sense. Makes sense. Cool. Couple more questions.
Yes, this one is quite interesting. Do you folks have any thought about this new PEP, relatively new PEP 647 about type guards for Python 3.10? I don't know if you're familiar with that. Yes, so the type guards,
if I remember correctly, it's that you can add value ranges on the variables, for example, on input arguments, right? I hope I'm not mixing it up now. They are okay for single arguments, but they cannot model the relations.
So for example, if you take the approximate square root example, there you cannot say that the result is the absolute difference from the inputs, right? The input squared is the result up to the precision in absolute terms and not in relative, et cetera. So I think they are serving like different use cases.
The type guards are more looking at the single arguments, whereas contracts are really good when you have relations in the arguments or even relations to the external code. So for example, if you have a singleton somewhere, et cetera, I would say that the contracts are just more powerful. So there was also discussion in Python ideas,
may list introduced contracts in Python natively, but not fruitful yet. Yeah, I had to refresh my memory on this PEP, but I am fairly certain that this one, it helps you narrow types. And so the expressiveness
of the typing system isn't changing. It helps you sort of like, say when certain functions are producing things of certain types, you can do that more effectively, but it's not changing the expressivity of the type system itself. Whereas in terms of contracts, we're saying,
you can say arbitrary things about your data, which I believe is not true for that. Excellent, fantastic, thank you. We are just at the break. So there were a couple of more questions. I will copy them into the breakout room for Brian.
So if you folks are hanging out a little bit longer, you can keep the discussion going into breakout Brian. And thank you again for the talk. It was a fantastic talk, very, very interesting. And lots of people enjoyed it.
Lots of discussions, very cool stuff. Thank you for taking the time.