Test-driven code search and reuse coming to Python with pytest-nodev
Formal Metadata
Part Number | 164 | |
Number of Parts | 169 | |
Author | ||
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/21084 (DOI) | |
Transcript: English(auto-generated)
00:02
Okay, welcome to this talk. The speaker is Alessandro Amici, a good developer, I think, yeah, he's a friend. So he will explain some stuff about pytest, and so welcome.
00:23
So this talk is about test-driven code search, and this is a rather new technique, not so new because someone already tried it a few years ago with Java, but it's the first time that I see it applied to Python.
00:43
The idea is pretty simple. What we produced was a very basic search engine. It's pytest-nodev, a pytest plugin that enables you to search for code inside
01:04
your machine, in the packages that you have installed on your local machine. The special thing about test-driven search is that you use a test as part of the search query.
01:21
So you may use some metadata and try to refine your search, but at the core, what you are looking for is what you describe within a test. We call it a specification test.
01:41
It is something that tries to specify a behavior or a feature without going too much into the details of how it is implemented. Once you run your search engine, you will get some search results. This is a list of functions or classes, or whatever object actually, that pass the
02:06
specification test. The core of the tool is the pytest-nodev plugin, and there you have the main documentation, but there are a couple of other tools that I will show
02:26
you during the talk. Now, since this is something new, at the beginning I organized this talk to be somewhat theoretical, but then I completely rewrote it yesterday, because I think really good examples make
02:46
people understand much faster. How does it work? Do people here know pytest and pytest fixtures? Who does?
03:01
Okay. Now, the basic implementation detail is that the plugin provides a special fixture that's called candidate, and you need to use this fixture when you write a test that you want to use to search for code.
03:23
What will happen is that the fixture will effectively parametrize your test by passing it all the objects that it manages to find in your environment. So if you install 10 packages in the virtual environment together with pytest, it will
03:48
collect all the objects, all the live objects, in your standard library and in all the packages that you installed. Then obviously, since this will be a parametrized test, the test will be run thousands
04:07
of times, most probably, once for every object, and a reference to the object will be passed into the candidate variable. So you basically will use this candidate as if it was the function that you are looking
04:23
for, and then the search engine will just tell you which functions, classes, or objects in general actually appear to behave exactly as you intended.
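The mechanism described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the plugin's code: `collect_candidates`, `search` and `spec_adds_numbers` are hypothetical names, and the search is limited to the pure `operator` module so that calling every candidate is harmless.

```python
import importlib
import inspect


def collect_candidates(module_names):
    """Yield (qualified_name, object) for the public callables of each module."""
    for module_name in module_names:
        module = importlib.import_module(module_name)
        for name, obj in inspect.getmembers(module, callable):
            if not name.startswith('_'):
                yield module_name + '.' + name, obj


def search(spec, module_names):
    """Return the names of the candidates for which spec(candidate) does not raise."""
    hits = []
    for qualified_name, candidate in collect_candidates(module_names):
        try:
            spec(candidate)
        except Exception:
            continue  # the expected outcome: most objects fail the specification
        hits.append(qualified_name)
    return hits


def spec_adds_numbers(candidate):
    # Hypothetical specification: the candidate behaves like addition.
    assert candidate(2, 3) == 5


hits = search(spec_adds_numbers, ['operator'])
print(hits)
```

The plugin does the equivalent through a parametrized fixture, so every candidate shows up as one test in the pytest report.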
04:42
So let's do our first search. You want to search for some kind of a function that has a feature, for example, let's search for a function that, given the name of an executable, returns the path to it.
05:00
This is not just a nice example, this is actually the first real case that we had. We had exactly this need, and we started searching for it on the web. We didn't like the results, and we said, okay, this is the perfect test case,
05:20
because it's something easy, it's easy to write a test for it, and maybe there is something somewhere in my environment already that does it. Obviously, you could just write something like a subprocess call to which, and then
05:44
parse the result, etc. That would be hacky, and it would not work on Windows, so it's not the best. So, what is a specification test? I write a standard test function for pytest, I use the candidate fixture, then I, just
06:06
as a tool to make the test more readable, basically rename the candidate to which, which is more or less the idea that I'm looking for something that works like the standard which command.
06:21
So, then I assert the behavior that I expect. If I pass it sh, the shell, the function I'm looking for should return /bin/sh, and if I pass it the string env, it should return /usr/bin/env.
06:44
These two are very common Unix commands, and they are among the most stable, because a lot of commands can be in /usr/bin, or in /bin, or in /sbin, but for these two these are the most common locations.
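Written out, the specification test described above would look roughly like this. With the plugin installed, the function would be a regular pytest test (named `test_which`) receiving the `candidate` fixture; here it is called `spec_which` and exercised by hand, and the `passes` helper and the lookup-table stand-in are assumptions added only so the sketch runs without the plugin.

```python
def spec_which(candidate):
    # Rename the candidate only to make the assertions read naturally.
    which = candidate
    assert which('sh') == '/bin/sh'
    assert which('env') == '/usr/bin/env'


def passes(spec, candidate):
    """Return True when the candidate satisfies the specification."""
    try:
        spec(candidate)
    except Exception:
        return False
    return True


# Hypothetical stand-in that satisfies the specification by construction.
lookup_table = {'sh': '/bin/sh', 'env': '/usr/bin/env'}

print(passes(spec_which, lookup_table.get))  # the stand-in matches
print(passes(spec_which, str.upper))         # an unrelated function does not
```

Note that the hard-coded paths make the specification system dependent: on a machine where `sh` is found first in `/usr/bin`, a perfectly good implementation would fail it.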
07:02
So, once I have written this test, I write it to a file, and then I just run it with pytest as usual; I just need to add --candidates-from-all. This means that the candidate fixture will be parametrized by
07:25
everything I find in my environment. So, this starts a standard test session, and I usually get something like 5,000 or 6,000
07:42
objects. This depends very much on how many packages you have installed. This is not many. It's easy to go into the 30,000 or 50,000. And then it just runs for a while, we will see in a minute, and since the test is expected
08:03
to fail, pytest will print a small x when the test fails. I mean, you are throwing basically random functions at the test, so you expect it to fail most of the time. But then you have capital Xs, which mean that the test was not expected to pass,
08:24
but it passed. At the end of the run, you have many, many Xs. What you expect is to have a result, and in this case, we found three functions, three objects that passed the test, and this is the report.
08:42
So, for my test_which file, we found cases in which the test_which function passed, and this is the executable. Now, I have the test function here as well, so let's see how it works, how much time
09:09
it takes. So, right now, I'm not using pure pytest, I'm using a kind of boxed run of pytest inside
09:27
a Docker container, because when you throw random arguments at random functions, anything can happen. If you try to do it on your machine, you will find backup files with crazy names, or probably connections to random hosts, or whatever, so you prefer
09:47
to do it in Docker, and at the end of the run, you throw away your Docker environment. So, what happens is that right now, it's collecting all the objects, now I have a
10:04
little bit fewer objects than when I wrote the test, because I add objects to the blacklist all the time, because they might crash your environment, or, I don't know, open up a browser, et cetera, and this is what happens.
10:21
Now, the test is running with all the functions. When we see a small x, it means that we didn't find a match, but here we have one capital X, so we found at least one function that actually worked.
10:44
This takes approximately 60 seconds, if everything goes okay, and now we should also have some garbage on the screen, because since you are using functions and classes in unexpected ways, you're always throwing random stuff at them, you end up discovering a lot
11:06
of bugs in the packages that you have. Most of the printout is exceptions in __del__ methods, which are ignored but printed to the standard error.
11:26
Well, it's finished. So now, what happens once you get the result? You say, okay, I have objects that passed that very easy, very basic test, and what do I do?
11:43
Well, since I have a manageable number of results, I can just have a look at them, and decide if this is really what I want. This is the, sorry, distutils.spawn.find_executable; the name looks like
12:06
what we are interested in, and this is inside the standard library, so it's very useful. Maybe I don't need to write any code for my find executable, for my which function,
12:23
because I may just use this one. You see, that's more or less what I thought: it gets the path, then it splits it in an OS-independent way, then it does some win32 checks that I didn't even think I needed, because I don't usually use Windows, but yes, might be useful, and then it just tries
12:48
to see if the file exists inside the path. It's not really the best. I mean, it uses isfile, so it doesn't check if the file is an executable.
13:02
So it's not really perfect, but at least I have a template if I want to improve on that. Then I have pexpect.utils.which. I don't care too much about that, because I already have a function in the standard library, so I don't need to add a dependency to my project.
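The improvement hinted at above, checking that the file is actually executable rather than merely present, could be sketched as follows. This is a simplified illustration, not the code of any of the library functions mentioned; `find_executable_sketch` is a hypothetical name and the Windows-specific handling is left out.

```python
import os


def find_executable_sketch(name, path=None):
    """Return the first executable file called `name` found on `path`, else None."""
    if path is None:
        path = os.environ.get('PATH', os.defpath)
    # os.pathsep is ':' on POSIX and ';' on Windows, so the split is OS independent.
    for directory in path.split(os.pathsep):
        full_path = os.path.join(directory, name)
        # isfile alone would also accept files that cannot be executed.
        if os.path.isfile(full_path) and os.access(full_path, os.X_OK):
            return full_path
    return None


print(find_executable_sketch('sh'))
```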
13:23
But then I have shutil.which. Whoa, this is even more standard in the standard library, and this is the code. If you have a look at the code, it's much, much more complex, and it has a real access check, which means it checks that you
13:43
can read the file and you can execute it, and it has several details that I would not have thought of; it would have taken me one year of production to get them right. So very nice.
14:00
Unfortunately, if you go into the documentation, you learn that this is Python 3 only, actually Python 3.3 and later. So if your use case needs to work in Python 2 and 3, you can fall back to the very nice find_executable; it's still in the standard library, it's not as nice,
14:20
but okay, or maybe you can just take it as a template and make it better; or, if you're Python 3 only, you have the luxury of using which. Well, how many of you already knew the which function or how to solve this problem?
14:42
Okay, a few, right? I mean, it's in the standard library, but I mean, I didn't know it, and it was faster this way than to look for it. Okay, let's go back. Now, this is a very simple example, but it also shows how things work.
15:05
Now, one of the points is that in this case, input and output of the function were really easy. When you have something where the reasonable implementation is really easy, it's easy
15:21
to write tests, but as soon as you look for more complex stuff, writing a test that is somehow implementation agnostic, that doesn't make too many assumptions about the implementation, is more complicated. But actually Python is really great for writing stuff that is not too tied to the details of the implementation, because it
15:44
has a dynamic nature. For example, using duck typing, you are not forced to guess the right data type. The in operator is extremely powerful, and a lot of classes even work nicely with the
16:04
in operator. That is, instead of checking whether the result of your function is a list, and the first element of the list is what you were looking for, you just use the in operator to see if somewhere inside your result there is what you expected it
16:22
to be. And then you may write specific helpers; in particular, we wrote nodev.specs, which leverages the inspect module to go even deeper in searching whether
16:41
your result actually contains what you expected it to contain, even if in crazy ways. So, let's see how you would write a specification test in a way that tries to be
17:02
more independent from the implementation. Here, I want to parse an RFC 3986 URI. This is also a real test, a real case, and so I use the candidate fixture, I just
17:23
rename it so it reads nicer, I use the test URI, and then all the functions that I get will be passed this URI, and I expect them to return some kind of tokens, and then here I check whether the scheme and the path that I put
17:51
in my URI are correctly parsed. Now, since there are a lot of false positives that are just strings, I mean, just functions
18:07
that return the same string as the input, I check that the return value of my function is not a string. I really want the string to be divided into tokens, so I don't want one string, I want some kind of list of strings.
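The URI specification described above might be written like this. The URI actually used in the talk is not in the transcript, so `TEST_URI` is an assumption, as is the name `spec_parse_uri` (under pytest-nodev it would be a test function taking the `candidate` fixture). The standard library's `urlparse` is used as a known-good candidate to show that the `in` checks work on its result.

```python
from urllib.parse import urlparse

# Hypothetical test URI; the one shown on the slides is not in the transcript.
TEST_URI = 'postgresql://user@example.com:5432/path/subpath?q=a#fragment'


def spec_parse_uri(candidate):
    parse_uri = candidate
    tokens = parse_uri(TEST_URI)
    # Rule out functions that merely hand the input string back.
    assert not isinstance(tokens, str)
    # `in` works for lists, tuples and any class defining __contains__.
    assert 'postgresql' in tokens
    assert '/path/subpath' in tokens


spec_parse_uri(urlparse)  # raises nothing: urlparse satisfies the specification
```

The specification never names a return type, so a function returning a list, a tuple or a custom class can all match.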
18:26
So, let's see how it goes. This is the naive implementation in the sense that I didn't use any special trick except Python's standard in operator, which can be overloaded, etc.
18:45
Now, this is going to run, come on, and usually I have different command-line options that can be passed, and those are mostly needed to restrict the search space.
19:03
If you already know that some packages are not useful, you want to restrict the search space so it gets faster. But this one, --candidates-from-all, is the most powerful: it just searches for everything and anything in your environment. So, this is where, obviously, I tested just before the, let's see, I have a second
19:52
run. Now, let's see, since it takes a little bit of time to run, I also tried the second
20:06
example, which is the same parsing function, but the test is written using some advanced functionality. This FlatContainer, from nodev.specs.generic, gets an object and it's a proxy
20:28
object that, when you use the in operator, tries really hard to see if the item that you're looking for is somewhere in the object. So, for example, it looks into the attributes, into the properties, and if it's an iterable,
20:46
it looks inside every item inside the iterable, so it's extremely thorough. Now, let's see if we manage to not get killed.
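To make the idea of such a proxy concrete, here is a minimal sketch of an object whose `in` test looks through items and public attributes recursively. It is not the nodev.specs implementation; `DeepContainer`, its helpers and the depth and item limits are all assumptions chosen for illustration.

```python
from itertools import islice


class DeepContainer:
    """Hypothetical proxy: `item in DeepContainer(obj)` searches obj in depth."""

    def __init__(self, obj, max_depth=3):
        self.obj = obj
        self.max_depth = max_depth

    def __contains__(self, item):
        return _deep_contains(self.obj, item, self.max_depth)


def _children(obj):
    """Collect the items of iterables plus the public non-callable attributes."""
    children = []
    if isinstance(obj, dict):
        children.extend(obj.keys())
        children.extend(obj.values())
    elif not isinstance(obj, (str, bytes)):
        try:
            # Cap the number of items so endless iterators cannot hang the search.
            children.extend(islice(iter(obj), 1000))
        except Exception:
            pass  # not iterable, or iteration failed
        for name in dir(obj):
            if name.startswith('_'):
                continue
            try:
                value = getattr(obj, name)
            except Exception:
                continue  # properties may raise
            if not callable(value):
                children.append(value)
    return children


def _deep_contains(obj, item, depth):
    try:
        if obj == item:
            return True
    except Exception:
        pass  # exotic __eq__ implementations may raise
    if depth <= 0:
        return False
    return any(_deep_contains(child, item, depth - 1) for child in _children(obj))
```

With a proxy like this, a specification can assert `'http' in DeepContainer(result)` and match a class that stores the scheme in an attribute but defines no `__contains__` of its own.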
21:06
So, okay, apparently they're both running. So, on this screen, I have the naive test, the one that crashed before.
21:21
It was some kind of race condition because it's going okay now. And now let's see what the results are. Okay, I got several results. Now, the first three results, in collections, don't look very good, because KeysView, ChainMap,
21:43
UserString really look like false positives; that is, they're not trying to do anything with RFC or URL parsing, they're just somehow wrapping the thing that you are giving them. But then you have this rfc3986.api.uri_reference that looks very nice, but also urlparse,
22:08
urlparse, which means you hit functions that are able to do this both in a package and also in the standard library.
22:21
Now, what is interesting is that in both cases, both the urlparse inside the rfc3986 package and the one in the standard library, they don't return lists, they return classes. So, how exactly did this work with a class?
22:45
The point is that a lot of people are quite smart and they give you some way to access stuff or to test stuff in an implementation-independent way. That is, the two implementations actually provide a __contains__ method
23:08
that tests exactly whether they manage to find the pieces, so the result can be tested exactly like a, sorry, a tuple or a list. So, it's not a simple type, but it's a class that behaves like one.
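As a small illustration of a class that "behaves like a type" under the `in` operator, the sketch below defines `__contains__` on a hypothetical `UriParts` class. It is not taken from either of the libraries mentioned; it only shows why a membership assertion can succeed on something that is not a list.

```python
class UriParts:
    """Hypothetical parse result exposing its pieces through the `in` operator."""

    def __init__(self, scheme, path):
        self.scheme = scheme
        self.path = path

    def __contains__(self, item):
        # Membership is defined over the parsed pieces, whatever their storage.
        return item in (self.scheme, self.path)


parts = UriParts('http', '/index.html')
print('http' in parts)
print('ftp' in parts)
```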
23:26
Very nice. So, you can happily use this one for most of your needs, but if you need more features, you may explore the code and you see that this special package has more features.
23:48
For example, it's able to recognize the username, which the standard library function doesn't. Now, there is something even more interesting. The other test, the one that uses a dedicated proxy object to do the containment test,
24:10
has found one more object that matches. And this is a class in the pip package that actually does the right thing,
24:26
but doesn't provide the nice containment helper functionality. So, we managed to get it as well because the helper tried very hard to find whether the scheme and the path were inside the class.
24:47
So, this is a way to test results in an implementation-independent way.
25:05
Then, I also want to pass arguments in an implementation-independent way. So, in this case, what helps me is the parametrize marker of pytest. For example, in this case, I'm looking for a function that just removes comments from a stream.
25:32
And the main point is how do I represent the stream? Because this is my text, this is the readout of my configuration file, for example,
25:44
and I want to strip these comments here. So, how do I do it? I use a parametrized argument so that I can provide different functions that will turn this text into different shapes.
26:04
I can pass it as it is, I can pass it as a list of individual lines, or I can pass it as a list of individual lines with numbers. This is how my application actually was doing this part. Or I can pass it as a file.
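The four input shapes listed above can be sketched as a list of converter functions. With pytest this list would be fed to the `pytest.mark.parametrize` marker; the plain loop below stands in for it so the sketch runs on its own. `TEXT`, `spec_strip_comments`, `passing_shapes` and the candidate `strip_numbered_comments` are hypothetical, not code from pip or from the talk.

```python
import io

TEXT = "first = 1  # trailing comment\n# full-line comment\nsecond = 2\n"

# One converter per input shape: the raw text, a list of lines,
# a list of (line number, line) pairs, and a file-like object.
TEXT_TO_STREAM = [
    lambda text: text,
    lambda text: text.splitlines(),
    lambda text: list(enumerate(text.splitlines(), 1)),
    io.StringIO,
]


def spec_strip_comments(candidate, text_to_stream):
    result = candidate(text_to_stream(TEXT))
    # Flatten through repr() so the check does not depend on the result's shape.
    flat = repr(result if isinstance(result, str) else list(result))
    assert 'first = 1' in flat
    assert 'second = 2' in flat
    assert 'comment' not in flat


def passing_shapes(candidate):
    """Return the indexes of the input shapes for which the candidate passes."""
    passed = []
    for index, text_to_stream in enumerate(TEXT_TO_STREAM):
        try:
            spec_strip_comments(candidate, text_to_stream)
        except Exception:
            continue
        passed.append(index)
    return passed


def strip_numbered_comments(numbered_lines):
    """Hypothetical candidate working on (line number, line) pairs."""
    for number, line in numbered_lines:
        line = line.split('#', 1)[0].strip()
        if line:
            yield number, line


print(passing_shapes(strip_numbered_comments))
```

The index of the passing shape tells you which representation the matching function expects, which is the information the report gives in the talk.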
26:23
Now, in this case, since I have several parameters, the test will run not just 5,000 times, but 20,000 times.
26:42
So, I prefer to restrict my search by including only functions whose name matches this regular expression. So, I want something that has to do with comments.
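Restricting candidates by name amounts to a regular-expression filter over qualified names, roughly as below. The function `include_by_name`, the pattern and the list of names are illustrative assumptions; the exact option and pattern used in the talk are not in the transcript.

```python
import re


def include_by_name(qualified_names, pattern):
    """Keep only the names that match the regular expression somewhere."""
    regex = re.compile(pattern)
    return [name for name in qualified_names if regex.search(name)]


# Illustrative names only, not an actual listing of any environment.
names = [
    'shutil.which',
    'textwrap.dedent',
    'example_package.parsing.ignore_comments',
    'example_package.parsing.strip_comment_lines',
]

print(include_by_name(names, r'comment'))
```

Since each surviving candidate is tried once per parameter combination, cutting the candidate list is the cheapest way to shorten a parametrized search.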
27:00
This makes everything much, much faster. And here it is. So, I find an ignore_comments function in pip. Very good, because pip is something that I might assume is a light dependency. And this tells me that the text-to-stream that passed is the third one.
27:24
So, I go back here. It's 0, 1, 2. So, this is the one, which was exactly the way I preferred. I could have worked with all the other three,
27:40
but this means that I don't even need to change my application to use that function. So, by the way, extremely fast. This is ignore_comments. It's very simple, and it also has the feature that it skips the line if it's empty, and this is the reason it also returns the line number,
28:01
because it doesn't return all the lines that you pass. And look at the other function just below. This function takes options, which is a special class, and this class must have the skip_requirements_regex attribute, otherwise it crashes. Oh God, even if I needed this,
28:22
I would never ever manage to pass the correct parameters to it, because these parameters are extremely tied to the implementation. So, I write tests that are quite loosely coupled with the implementation. I try to be as implementation agnostic as possible,
28:42
so I only find functions, callables, or classes that are good code, that don't mix in implementation details uselessly. You could just have this skip_requirements_regex,
29:02
as just a keyword argument with the same default, and the function would be as useful, and I would have been able to search for it, or to use it in general. So, when you search, you may get all the relevant results,
29:22
which means your query is just perfect, or you have to refine your query. So, if you don't get any results at all, which happens quite often, it means that your test is too strict, and you probably need to remove test cases, edge cases, or probably just use a lower number of normal cases.
29:42
If you find a lot of results, but they are not relevant, it means that your test is too weak, it's not strict enough, so you need to add more cases, describe your feature better, and probably add more corner cases. If you appear to go from no result at all to no relevant result and back,
30:03
it means that you won't find anything; you most probably are looking for a function that is not in your environment. Now, this is the basis of test-driven reuse, which is something that has been studied a little bit in the Java community,
30:29
and the idea of test-driven reuse is just that you start like in test-driven development, you start with your test, maybe you try to write it in a more independent way than you would do
30:44
if you already knew the implementation that you are going to write, and then you search to see if you find a function that already works. If any function passes your test, then you have three options.
31:01
So if you don't find any function, it's test-driven development, you have to develop it, so fine. Otherwise you may just import it, that means you get the dependency, or you may fork it, that is, you take exactly the same code and tests, you check the license and copy it into your project, or
31:21
you may just have a look at it to see how many details you didn't think of. Another trick is that you may use test-driven code search as a tool by itself for unit test validation. If you wrote a test, you think it's a good test, then you make a search with it,
31:43
and you find a couple of totally unrelated functions, it means that your test is too weak, it finds false hits. So, limitations and future work. The main point right now is performance,
32:02
and then you may do a lot of things like extending the search space and making more tools, but then you get even more work to do, and so performance, performance, performance, and parallelization, etc.
32:24
It would be very nice if this was not done on your machine, but on the web. So what we are trying to do is to make kind of a search engine on the web. If you want to know when things are starting to roll,
32:45
write an email to the email here, and we are looking for people who are willing to test. Conclusions. If you start using it, you will recognize much better what are good tests
33:01
and what is good code, and you will tend, at least this is what we noticed, to write your code so that all the implementation details are hidden away as far as possible, and the interface is as simple and as intuitive as possible. Thank you for your attention.
33:27
Okay, do you have any questions? So, do you filter somehow already on, for example,
33:42
the number of arguments that can be passed and similar things? Because if a function doesn't take any argument and you need something that takes one, then it's not a valid candidate, for example. I didn't understand. When you look for candidates of things that solve your problem, do you filter already on the things that you've done?
34:02
Right now, no. This is one of the reasons a web search, I mean a curated index of objects, would be nice, but it's very difficult to do it on your machine. With Python, you may tell how many arguments a function takes, but not much more, because due to duck typing,
34:25
you might not actually want to be too strict. So, the idea behind having a web search engine is that you have a curated index of what kind of function may fit a particular test or not.
34:47
Is there anything for timing out functions that could take a long time? This is already taken into account. Every test has a timeout of one second, so a lot of the things that are tried end in timeouts. Anything that uses a prompt, raw_input, etc.
35:02
doesn't give any problem. The real problem is when you call C extensions and they just crash the interpreter. I have a long blacklist for this kind of thing. Another question? Yeah, no.
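The transcript does not say how the one-second timeout is implemented, so the following is only a sketch of one possible approach: run the call in a daemon thread and stop waiting after the limit. `run_with_timeout` is a hypothetical helper. A thread cannot be killed, so the abandoned call keeps running in the background, and a crash inside a C extension would still take the interpreter down, which matches the limitation mentioned above.

```python
import threading
import time


def run_with_timeout(func, args=(), timeout=1.0):
    """Call func(*args) and report 'ok', 'error' or 'timeout'."""
    outcome = {}

    def target():
        try:
            outcome['value'] = func(*args)
        except Exception as error:
            outcome['error'] = error

    # A daemon thread does not keep the interpreter alive once we stop waiting.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return 'timeout'
    return 'error' if 'error' in outcome else 'ok'


print(run_with_timeout(time.sleep, (2,), timeout=0.05))
print(run_with_timeout(len, ('abc',)))
```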
35:24
So, how do you deal with multi-argument functions where I don't know what the order of the arguments is going to be, and what's the time complexity of that? Okay, so this is what he, in the first row, tried to do at the beginning. There was an automatic permutation of arguments.
35:43
I refused that because right now getting it correct is more important than enlarging the search space, but as I was writing this talk I noticed that you can easily use the parametrize marker,
36:02
you can easily parametrize your function just switching arguments. So, it can be done right now by just passing a parametrize with the switching. And the complexity is very high. If you have two arguments it's two, but in general it's n factorial.
36:22
So, with four arguments you are already very, very heavy. It's very easy to go into the hundreds of thousands of tests. Now it was really a small environment and for educational purposes, etc.
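The growth mentioned in this answer is easy to see with `itertools.permutations`, which is one way to generate the argument orderings that would be passed to a parametrized test. `call_with_permutations` is a hypothetical helper written for this illustration, not part of the plugin.

```python
from itertools import permutations
from math import factorial


def call_with_permutations(candidate, args):
    """Yield (ordering, result) for every argument ordering the candidate accepts."""
    for ordering in permutations(args):
        try:
            yield ordering, candidate(*ordering)
        except Exception:
            continue  # this ordering does not fit the candidate's signature


# str.join only works with the separator first and the list second.
matches = list(call_with_permutations(str.join, (['a', 'b'], '-')))
print(matches)

# Number of orderings to try for 2, 3 and 4 arguments.
print([factorial(n) for n in (2, 3, 4)])
```

Every ordering multiplies the number of tests per candidate, so with thousands of candidates a four-argument search already means tens or hundreds of thousands of runs.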