
Fixture factories for faster end-to-end tests


Formal Metadata

Title: Fixture factories for faster end-to-end tests
Number of Parts: 160
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Fixture factories for faster end-to-end tests [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 2] [Rimini, Italy] When developing and maintaining many different services, unit testing is not enough to make sure your code works in production. By now, many teams doing SOA (service-oriented architecture) have a set of end-to-end tests that cover critical workflows. For these tests, all of the services involved need the proper test fixture data in their datastores. This often leads to developers having to deal with raw datastore data (like JSON or SQL), making the authoring of those tests slow, tedious, and error-prone. This talk discusses several approaches we tried at Yelp to generate this fixture data in a quicker, more developer-friendly, and more correct way. The main part of the talk is a deep dive into what fixture factories are, how to implement them, and how to integrate them with pytest, the leading Python testing framework. I'll show you several other benefits this approach has over writing raw fixture data, and how it leads to more maintainable, easier-to-adapt code. We'll also explore how you can then run your tests in parallel, cutting down runtime drastically.
Transcript: English (auto-generated)
Thank you, hello, thanks for having me. Before I start, a little bit about my employer, who thankfully allows me to be here with you. Yelp has a website and an app that help you find great local businesses; we have over 120 million user reviews and are active in over 32 countries. The goal is to make sure you find the right business — one that does what you want it to do — and that you have a good experience.

So let's dive right into it. What will I be talking about? First of all, I'll specify what I mean by end-to-end tests, then discuss the problem we have with setting them up, and see how fixture or data factories help solve that problem. And just to clarify, because some people ask me what I mean by fixtures: fixtures are the tasks you need to do to set up the environment so your test can run. In our case that's mostly inserting data into datastores, but it can also be all sorts of other tasks. Lastly, I'll cover how all of that helps us make the tests faster.

So, end-to-end tests. Some people call them system tests or acceptance tests. I've seen people say "but those are integration tests" — well, integration tests are something where you might just test one external component, whereas end-to-end tests, as I see them, are an environment where you try to replicate as much of your production environment as possible: you spin up as many internal services as you can, get as close to production as possible, and then run your tests on that test infrastructure. This means these tests are typically the slowest to run, and in that sense also the most expensive, so we want to make sure they are effective. If you have a distributed architecture, these tests are crucial for making sure everything works in production and works together. Obviously unit tests are still the foundation of everything, but they don't tell you that everything fits together nicely.

So just a quick shout-out to the
technologies we use in our stack that are relevant to this talk, since I'll be mentioning them: we use Pyramid, the web framework; we use Swagger/OpenAPI to communicate between services; and in the examples here I'm going to be using MySQL with SQLAlchemy. We also use a lot of other software — for datastores most notably Cassandra, Elasticsearch, and several others — but that doesn't matter for this talk.

So what do I mean by a distributed architecture? I'm sure many of you are familiar with this. You have services — in this image they're called microservices, but they don't have to be. The point is that these services are not necessarily running on the same host; they are separated by a network layer, and in our case the codebases are also separated. We don't have a big monorepo like some companies: these services live in separate Git repositories, and just like in production, in testing they are spun up separately, in separate Docker containers, and communicate over the network.

And so the problem — the hard part about end-to-end tests — is that you have to create the correct state for your tests, and not only for your own service and its datastore, but also for all dependent (downstream) services that you are calling. How did we do that? Well, I'm not really happy to say it, but we basically wrote a bunch of raw SQL for MySQL. We had these files, and when the Docker container for a downstream service was created, we would just run the SQL, insert some data, and then use that data in the tests. This means you have very tight coupling to your downstream services; it's hard to write, because it's raw SQL; it's hard to get right — you might forget things, or do them differently than your production code would have created that data; and it's hard to maintain.

So what are some possible solutions? If we take a look
at what Django does: they have their own ORM, and you can use it really nicely in tests. You import your model and you have a nice Python API for creating the data you need. You do that in the setup function, and it even cleans up after the test has run. This is pretty nice, but we'd like that for downstream services as well, and without code duplication — without having to do it in n different services that all call one and the same service. We don't want to repeat ourselves.
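The Django pattern described here — create model rows in a setup function, with automatic cleanup afterwards — can be sketched roughly like this. Since the talk's stack doesn't actually use Django, this uses a stand-in in-memory "table" instead of the real ORM; all names are illustrative.

```python
import unittest

# Stand-in for a database table; Django's ORM and test runner would
# normally manage a real database and roll it back between tests.
BUSINESS_TABLE = []


def create_business(name):
    """Insert a row and return it (stand-in for Model.objects.create)."""
    row = {"id": len(BUSINESS_TABLE) + 1, "name": name}
    BUSINESS_TABLE.append(row)
    return row


class BusinessTest(unittest.TestCase):
    def setUp(self):
        # Create the fixture data this test needs.
        self.business = create_business("Test Cafe")

    def tearDown(self):
        # Clean up after the test, as Django's TestCase does automatically.
        BUSINESS_TABLE.clear()

    def test_business_has_id(self):
        self.assertEqual(self.business["id"], 1)
```

The key property is that each test starts from a known state and leaves nothing behind — which is exactly what we want for downstream services too.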
So, since we don't use Django internally — I mentioned that we use Pyramid and SQLAlchemy — what can we do? Well, this is what my talk is about. We have these factory libraries that generate the data you need for your tests, with a nice Python API. They contain the SQLAlchemy models you need, plus creation functions that provide a slightly higher level of abstraction. They take care of things like default values, so you don't have to specify all of that if it doesn't matter for your test, and they make sure your data is logically correct — that if you create a certain object, a certain row in a table, you also create all the other rows that should be created at the same time. This is especially important since we, like other companies that run MySQL at scale, disabled foreign-key checks for performance and scalability reasons, so we can't even rely on the database to enforce those relations. And lastly, once you have these packages, you suddenly document which services are using which other service's datastore for their testing. What you didn't know beforehand, or couldn't figure out reliably, you can now figure out automatically, building a nice dependency graph that tells you which pieces you need to look at if you change something in the data model.

So let's take
a look at an example end-to-end test, how it looked previously. We use pytest for testing — I'll talk a little bit about that later. The important part here is that we do a GET request; it's really not that complicated, I would say. I should mention it's a REST API with JSON — I think that's what many people are using. In the URL we pass two parameters to test this endpoint: a business ID and a question ID. And how do we do that? We pass them as hard-coded ints. There's an actual number in the test, and that number corresponds to the ID we specified in those SQL statements I showed you before. We use them both for the business ID and the question ID, we pass some other parameters that don't really matter in this case, and then we assert on the response.

So how do we want that test to look? How do we want to change it? How does it look now? You'll see it's basically just three lines that change. The first is the top line, where we request pytest fixtures. This is why I'm now saying "data factories" for the libraries we use to create the data: pytest uses the same term, fixtures, for setting up your tests. It does that by providing special functions that you mark as fixtures; if you then use such a function's name as an argument to your test function, pytest executes that fixture function and provides whatever it returns in that argument. We're going to see an example in a second — I hope that makes it clearer. The point is that the question and the business ID are now something that gets created during setup and provided here as arguments we can use. The only two other lines that change are where we pass the business ID — no longer a hard-coded integer, but whatever ID was generated for the business — and the same for the question. Those are the code changes in the test itself. What we additionally get to do now is delete those ugly SQL lines we used to set up the
test previously. But now the important question is: how do we actually generate such a business ID, or the business data? Well, the fixture itself is very short, because it delegates all the work to that factory library I mentioned earlier. We mark it with a decorator so pytest knows it's a fixture, and then it calls the create method on the business factory to create a business with, frankly, all default values — we don't care what specific business it is for our test. So how does that create method look? This is it. We have the option of passing all sorts of values to customize the business if we want, but really all it does is use the business model — an SQLAlchemy model, which isn't important for this specific example — to create an instance of that model in the database and return the ID. This provides reasonable defaults we don't have to specify for each test, and it keeps us DRY — we don't need to repeat ourselves.
But you might now ask: why do we need these functions? Can't we just use the models directly? As I mentioned, we don't have foreign-key checking, and even if we did, not everything can be caught by foreign-key checks. Sometimes your production code only ever generates certain data entries together, and you want to do the same in your tests. In this case it's a business owner — "biz user", as we call it internally. There's a biz_user_private table that should always have an entry for any entry in the biz_user table, and there's another relation, the biz_user_business table, that may have one or more entries connecting a business user to a business on Yelp.

So what do we do here? We want our code to mirror as closely as possible what our production code does, so let's take a look at the biz user fixture. You can see it's quite similar to the business fixture: in the beginning it creates a biz user entry with the SQLAlchemy model, but then it also always creates an entry in the biz_user_private table — you can't not do that, which is exactly what we want. And since most of the time you'll want to associate the business user with a business, as a convenience it also does that, via a second method it calls for you if you provide a business ID — a method you can still use separately in your test in case you need it.
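The biz-user factory described above — always creating the private row alongside the user row, and optionally the association — might be sketched like this. Again, an in-memory stand-in replaces MySQL, and all table and field names are illustrative guesses, not the real schema.

```python
import itertools

# Stand-ins for the three MySQL tables involved.
BIZ_USER = {}
BIZ_USER_PRIVATE = {}
BIZ_USER_BUSINESS = []
_ids = itertools.count(1)


def associate_user_with_business(user_id, business_id):
    # Convenience helper the factory calls for you; also usable on its own.
    BIZ_USER_BUSINESS.append({"user_id": user_id, "business_id": business_id})


def create_biz_user(email="owner@example.com", business_id=None):
    user_id = next(_ids)
    BIZ_USER[user_id] = {"email": email}
    # The private row is *always* created together with the user row,
    # mirroring what production code does -- you can't forget it.
    BIZ_USER_PRIVATE[user_id] = {"password_hash": "not-a-real-hash"}
    if business_id is not None:
        associate_user_with_business(user_id, business_id)
    return user_id
```

Because the linked rows are created inside one function, a test author cannot produce the half-complete state that raw SQL made so easy.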
There's another case. Maybe you say: this is all nice, but I actually have some services that provide me with a data-creation API — can I use that? Yes, that integrates very well with this approach. In this case we have a service called question-and-answer; we use Swagger, so we use a library called bravado to communicate with that service. We make a request to the service, pass some parameters, and get a question object as the result. This is also a pytest fixture that we can then use — as in the example before, we just specify `question`, this piece of code gets executed, and it returns the data. Now, while this is possible — and this is indeed code I've taken out of an internal repository — you should be aware that, in contrast to using models and those factory methods (or functions, rather), this has the disadvantage that it assumes you can actually communicate with that service and that the specific endpoint will work and return data. If it doesn't, for whatever reason, then instead of a nice test failure that really points you to the issue, you get an error at fixture-function execution time, which is a little harder to debug. So just be aware of that.
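One way to soften that caveat — this is my suggestion, not something from the talk — is to wrap the service call so that a dead endpoint produces a descriptive setup error instead of an opaque traceback. A sketch, with a fake client standing in for the bravado-generated one; the real client, endpoint, and parameters would all differ.

```python
class FixtureSetupError(Exception):
    """Raised when a data-creation service call fails during test setup."""


def create_question_via_service(client, business_id, text="Is it open late?"):
    # `client` is assumed to expose create_question(); the bravado client in
    # the talk would instead be called through its swagger spec.
    try:
        return client.create_question(business_id=business_id, text=text)
    except Exception as exc:
        raise FixtureSetupError(
            f"could not create question fixture for business {business_id}; "
            "is the question-and-answer service up?"
        ) from exc


class FakeQAClient:
    """Minimal stand-in used only to demonstrate both paths."""

    def __init__(self, healthy=True):
        self.healthy = healthy

    def create_question(self, business_id, text):
        if not self.healthy:
            raise ConnectionError("service unreachable")
        return {"question_id": 7, "business_id": business_id, "text": text}
```

The wrapped version at least names the failing dependency, so a broken downstream service reads as a setup problem rather than a mysterious fixture error.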
Now let's discuss the pros and cons — why I like this approach. I think it's a natural fit for pytest fixtures; it just integrates very well. It also provides much easier data creation: you could use pytest fixture functions to execute raw SQL and get some of the same benefits, but you would still have to write SQL and you'd still have that extremely tight coupling — this gives you a nice Python API instead. It also makes sure people create separate data entries for each test automatically: once those pytest fixtures are written, most of the time all people need to do is use the arguments in their test function, and new entries are created for each test automatically, which has several advantages I'll get into in a second. And you don't have to convert all of your tests immediately — you can, for example, write new tests with this pattern and not touch your existing tests, or migrate them slowly, and it will all continue to work. One thing you do need to do is maintain those libraries with the factories. It's a bit of overhead — in our experience it's actually not that much work, but it is another repository with more code, so be aware of that. And it can potentially be slower, especially if you used to share fixture data between tests, because you now create new entries for each test, every time you run it — which is not great, since I used the word "faster" in my title. So let's take a look at what we can do to make the tests faster. Well, this is actually very easy: use all of the
cores your machine (or your test executor) has — run more tests in parallel. There are two issues with end-to-end tests in particular that can make this a problem, since it's about writing data. First, many tests are not repeatable: you can't run them more than once without resetting your datastores. That's not a problem for running them once and then resetting, but keep it in mind for later. Second, if you share data between tests, you suddenly depend on execution order — something we encountered with our own tests, as you'll see in a second. You write these tests, they use the same business, one test changes something about the business, the second test runs, and everything is fine because it knows that data was changed. But if you parallelize the tests, you can't guarantee execution order, so depending on which test runs first, one of them — or even both — might start to fail.

But how do we actually run tests in parallel? For pytest it's really easy: you install pytest-xdist, pass the parameter -n with the number of workers or processes you want pytest to use, and that's it. Well, we did that before converting any of our tests, just to see if it would work — and it turns out it didn't. Here is just a sample list of the issues we had; you can see we got a bunch of test failures. So this is not something you can simply switch on, and this is why fixture factories, or data factories, help: once you isolate your tests and have separate data entries for each test, you fix this issue, because at that point one test no longer cares what another test does. Actually, one thing I did want to mention: even if you just randomize test execution order, you will get most of these failures. Parallelizing tests is just something to make them faster, but in my experience — especially with something like pytest, which has a deterministic test execution order by default — if you randomize test execution, which is basically what parallelizing does, you get these failures.
You can see here that the last issue in the list was "let's enable parallel test execution for all of our acceptance tests" — we call them acceptance tests internally. So we should be fine, right? Turns out we weren't: there were issues we only discovered once more people were running the tests in parallel, simply because even in parallel, pytest doesn't always randomize the tests or run them in a completely different order. The more people use your tests, the more issues you discover — and again, this resulted in us switching more and more of those end-to-end tests over to the data factories.

I used the number four in the example for how many processes to use; I can tell you that, at least for us, going beyond four processes didn't provide a significant further speed-up, but it did cut execution time by roughly 60%. And as you know, end-to-end tests are the slowest tests, so that's a significant speed-up. Plus it gives you a nicer API — it's easier to write those tests — and since we also solved the problem of test repeatability, while developing a test you can actually run it multiple times without having to reset the datastore state across your downstream services. Because you always create new data entries, you never hit problems like: you want to reply to a message, the first time it works, and the second time it says you've already replied to that message. If every run of the test creates a new message and then tests that you can reply to it, your test will just continue to work. So what
are the main takeaways of my talk? First, if you use this kind of pattern, you get faster development — not only faster test execution, but people can write tests faster and with less frustration when authoring these complex end-to-end tests. You also get more correct test data: we've had multiple instances where we discovered that the test setup we did with raw SQL was simply wrong — it didn't represent how the data would look in production. Convert tests for test isolation and repeatability — that's basically what all those JIRA issues I showed you were about. If you do that, you can also make your tests faster, for example by running them in parallel, and you get a much better test setup anyway, since you really don't want to depend on test execution order. And as I mentioned, it also becomes easier to iterate on the tests — and if writing end-to-end tests isn't such a pain, developers are more likely to actually write them, which yields fewer bugs and more stable software.

Since I'm now at the end of my talk, I also wanted to call out the talks by my colleagues that happened yesterday and the day before. Please check them out — they were really good, in my opinion, and that's no surprise: they are smarter than me. They will be online, I assume. They also showed this slide; one thing I want to call out here is the Yelp engineering blog, where we have, in my opinion, really interesting blog posts — again written by people smarter than me.
And that is it. My slides are online on GitHub — there are a few typos I still need to fix, and they also contain some of the speaker notes (not all of them, but there aren't that many). Go check that out. That is it, thank you.

[Question] Are you running these tests on CI? How are you orchestrating all of these services before running them?

Okay — it's actually really hard to hear from up here; I saw other speakers struggling with that too. So first, the CI environment: while you can execute the tests locally, we use Jenkins for continuous integration, and it spins everything up as well. As for how we set that up: in the past we used docker-compose; we're actually in the process of switching to an internal solution called Yelp compose, which does roughly what docker-compose does but is more tailored to our environment. Basically, you have something that defines your dependencies and their configuration — the datastores they need — and then it builds and spins all of that up.

[Question] Thanks. My question is: instead of
writing all that SQL, why don't you dump your database for tests and use migrations? In the beginning of your presentation you showed that you write raw SQL to generate your data — why don't you dump your database from production?

Yeah, that's a good question. Well, first of all, our production database is way too big, so we couldn't use it in tests. Basically we just want to create the set of data we need for running our tests. And if you depend on production data, you again have the same problem of depending on the data not changing. You can't just have a fixed set of data that you always reuse: even with production data you'd say "there's a business with ID 100 and I'm going to modify it", and then at the very least you lose test repeatability, because once you've modified it you need to reset your data anyway. You'd also need to make sure that, for example, all the tests use different businesses from the production data, which again is not that easy to coordinate among many teams of developers. So it's most of
the time just easier to create the data you need. I hope that answers it.

[Question] How do you avoid data collisions in the tests, especially when running in parallel?

Yes — frankly, in the case of MySQL we just let the database take care of it. We tell it to create an entry, it returns a primary key, and then we use that primary key; the database makes sure that if ten inserts come in at the same time, it all still works. I can expand on that: the problem is really with tests using the same data. As long as you have row-level locking, you never run into any issues as long as every test uses different rows — and as long as you do that, you're fine. This actually solves a lot of these kinds of issues.

[Question] You showed these model packages with the factories and the models in them, and then you said it's also kind of an advantage that you can then see the dependencies on these models throughout your code. Did you mean that you have different microservices using integration databases with the same models, or did I misunderstand?

Yes — I can give you a specific example. We have sessions managed in a specific service, and since that's a very core part of things, we have multiple other services using that session service. So if you have a common package that creates sessions, just as an example, you can reuse it everywhere: first, you don't repeat yourself, and second, you can write a small script and find out which other repositories use that data-creation package. All of a sudden you know who is actually creating test data for your session service, and if you plan a backwards-incompatible data or schema migration, you at least know who you need to watch out for — at least for their test environments.
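The "small script" mentioned here could be as simple as grepping each repository's requirements file for the factory package. A hedged sketch — the package name, directory layout, and requirements.txt convention are all assumptions; a real version might query an internal package index instead.

```python
from pathlib import Path


def repos_using_package(checkout_root, package_name):
    """Return names of repos whose requirements mention the factory package.

    Assumes a directory of checked-out repos, each containing a
    requirements.txt listing its dependencies.
    """
    users = []
    for req_file in Path(checkout_root).glob("*/requirements.txt"):
        if any(package_name in line for line in req_file.read_text().splitlines()):
            users.append(req_file.parent.name)
    return sorted(users)
```

The returned list is exactly the dependency information the talk describes: which services create test data against your datastore, and therefore who to warn before a schema migration.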
[Question] Right — you said every test uses different data because they separately create rows, but isn't there some part of your application logic that, for example, takes a count of something in a table?

Yes, that's a very good question — I was waiting for that one. Obviously this doesn't solve all of the problems; it's true that such a test would be a problem. Typically, for us, you wouldn't take a count of all businesses, but there are some cases. I can give you an example: we do have things that are specific to countries. If one test modifies the state of something for a country, and another test creates businesses in that country, then suddenly one test influences the other — you don't have test isolation anymore and you might get test failures. So yes, there are still cases this won't solve; it still requires you to design your tests manually with that in mind, for example by using a country we don't normally use in other tests, things like that. But I would say it helps in the vast majority of the cases we encounter — just not all of them.

Maybe the comment was that there's a way to actually specify test order — I suggest you continue in the hall, because I don't think anyone can really hear. And yeah, thank you very much.