
Make your tests fail


Formal Metadata

Title: Make your tests fail
Part Number: 47
Number of Parts: 79
Author: Isabel Drost-Fromm
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Abstract
It's easy as pie: before checking in, your test suite should always be green. Or should it? What if your tests are all green but you forgot to check one important edge case? What if your underlying system environment lets you down, but only under rare conditions that you didn't cover in your tests?

This talk introduces randomised testing as used by projects like Apache Lucene and Elasticsearch, based on the Carrotsearch Randomised Testing framework. It has helped uncover (and ultimately fix) a huge number of bugs not only in these projects' source code, but also in the JVM itself, which those projects rely on.

Writing unit and integration tests can be tricky: assumptions about your code may not always be true, as any number of "this should never happen" log entries in production systems show. When implementing a system that will be integrated in all sorts of expected, unexpected, and outright weird ways by downstream users, testing all possible code paths, configurations, and deployment environments gets complicated. With the Carrotsearch Randomised Testing framework, projects like Apache Lucene and Elasticsearch have introduced a new level to their unit and integration tests. Input values are no longer statically pre-defined but are generated based on developer-defined constraints, meaning the test suite is no longer re-run with a static set of input data each time. Instead, every continuous integration run adds to the search space covered. Though generated at random, tests are still reproducible, as all configurations are based on specific test seeds that can be used to re-run the test with the exact same configuration. Add to this randomising the runtime environment by executing tests with various JVM versions and configurations, and you are bound to find cases where your application runs into limitations and bugs in the JVM.
This talk introduces randomised testing as a concept, shows examples of how the Carrotsearch Randomised Testing framework helps with making your test cases more interesting, and provides some insight into how randomising your execution environment can help save downstream users from surprises. All without putting too much strain on your continuous integration resources.

Isabel Drost-Fromm
Transcript: English (auto-generated)
So, after trying to fix the how-to-get-the-slides-on-the-video problem, welcome to my early morning talk on how to make your tests fail. Randomized testing, at the example of Elasticsearch; the library itself was initiated at Lucene.
Who am I to tell you something about testing at Elasticsearch? I happen to be a developer at Elasticsearch GmbH. Apart from that, I'm a member of the Apache Software Foundation and co-founder of Apache Mahout. Put your hands up if you know Mahout. One, two, okay.
Co-founder of Berlin Buzzwords. If you need an... wait, is that better? Mr. Technician, is the microphone working, or is there something we can change? Something works. Okay, so there's nothing else? Okay. So, if you need an excuse to have your employer pay for a trip to Berlin, just go to Berlin Buzzwords to learn more about Elasticsearch and Hadoop and search in general. So, let's start with a few questions.
How many of you write tests? I don't care which kind. Still one who doesn't? Okay, so you write no code. Do you regularly check code coverage? About half, okay. Regularly also means once a week, once a month, once every year.
I don't care. Do you run your tests regularly? Okay, good. So, pretty much everything should be fine, right? So, all tests are green, everything is nice, funny little world.
When randomized testing was introduced in Lucene, someone found something very odd. If you call a function that generates an integer between Integer.MIN_VALUE and Integer.MAX_VALUE, and then let Java compute the absolute value from it,
that should always be positive, right? However, it isn't. Exactly. There's just one exception to the rule, and that's when you get Integer.MIN_VALUE. So, although everything your tests check looks like a fairly green world and
looks like a well-shaped object, this is what our world actually looks like when we look at our tests. We don't check all the corner cases, and this is where typically our customers find bugs.
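That corner case is easy to demonstrate in plain Java: negating Integer.MIN_VALUE overflows, so Math.abs returns a negative number for exactly one input.

```java
public class AbsEdgeCase {
    // Math.abs(x) returns x itself when x is Integer.MIN_VALUE:
    // -(-2^31) = 2^31 does not fit in an int, so the negation
    // overflows straight back to Integer.MIN_VALUE.
    public static boolean absIsNonNegative(int x) {
        return Math.abs(x) >= 0;
    }
}
```

A hand-picked set of test values would stay green forever; a randomized generator drawing from the full int range will eventually hit this one failing input.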
So, how do we go about checking those corner cases? I do have a little child, if you've been here earlier this morning, you have seen her. And I don't have a lot of time to clean our flat every day or every week. So, we looked into automating all of these chores.
One of the automations was to buy one of those Roombas. Every night at 5 AM, it starts its work. Every evening, all I have to do is take the garbage out. And this is what I like to do with testing as well. Let me introduce you to Carrotsearch Randomized Testing. The principle is also known as property-based testing.
There's many other names that it's known under. If you have a hacking background, you may be familiar with the term fuzzing; there are similar ideas at play. But here, we don't try to specifically uncover security holes when confronting the system with unexpected input.
We just try to enlarge the space of expected values that we expose the system to. So, let's take a look at what Carrotsearch supplies us with. It's a framework that generates random numbers and random strings with different properties.
It can be ASCII strings. It can be UTF-8 strings. It can be UTF-8 strings with specific properties. It can generate specific characters from specific character sets only. It can generate different environment settings,
like changing your locales that your system runs under. Like imagine running both your systems under development and on your continuous integration environment with the same locale. But having customers that actually live in China or in Turkey that have a different locale setting and
not finding bugs just because you test under the wrong locale setting. So, you don't run your test just with random input, because then, of course, your tests aren't reproducible anymore. Each test will be based on a reproducible specific seed value, and
the system will tell you the seed value that the test ran with when it failed, so you can reproduce it on your machine. You can repeat one test run multiple times. This makes sense, especially if during the test run, you generate different data sets. So you have the same test setup,
the same logic run with different input values. It supports test timeouts: when your test happens to run too long, it will kill it and tell you. It detects leaking threads, both in your test environment, but also in your production code. When I tried this out on Apache Mahout, we discovered two or
three thread leaks just by running the test suite within the Carrotsearch randomized runner, without changing anything. You can annotate tests. Remember those tests in your test suites that were always very annoying because they run for multiple minutes? Just put the @Nightly annotation on them and
it will run them just, say, on your integration system, or only when you start the test suite with a specific command line switch. Okay, how do I write tests of which I do not know the input? It only makes sense for a certain type of test.
One of those types of tests is when actually checking the result is cheaper than computing it. When is that the case? Imagine a function that sorts integer values. It's very cheap to check that the values that you have are actually sorted, but it's reasonably costly to do the sorting.
So what you do is you implement your sorting algorithm. In your test, you generate a set of numbers. You run it through your sorting function and at the end only checks that your numbers have been sorted. That's easy. Another use case for randomized testing is when you know of an algorithm
that is slower than the one you implemented, for real, for production, but that is simpler to implement. So you don't have all the funky optimizations in there. You put the slow algorithm in your randomized test. You run that over your generated input and
make sure that the output is identical to your production output. Another easy check is if you can think of an algorithm that in a very cheap way computes the upper and lower bound on the result.
Imagine adding two numbers a and b. You can make the assertion that, for positive a and b, the result of a plus b should be larger than either a or b. If it's lower, then obviously something is wrong.
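Both oracle styles just described, comparing against a slower but obviously correct reference implementation and asserting cheap bounds on the result, can be sketched in plain Java (the class and method names here are illustrative, not part of any framework):

```java
import java.util.Arrays;
import java.util.Random;

public class PropertyChecks {
    // Slow but obviously correct reference: a linear scan.
    static boolean containsLinear(int[] data, int key) {
        for (int value : data) {
            if (value == key) return true;
        }
        return false;
    }

    // Differential check: the optimized implementation (binary search
    // on sorted data) must agree with the reference on random input.
    public static boolean binarySearchAgreesWithLinear(Random rnd) {
        int[] data = new int[rnd.nextInt(100) + 1];
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextInt(50);
        }
        Arrays.sort(data);
        int key = rnd.nextInt(50);
        boolean fast = Arrays.binarySearch(data, key) >= 0;
        return fast == containsLinear(data, key);
    }

    // Bounds check: for positive operands, a + b must exceed both a
    // and b; if it does not, the int addition overflowed.
    public static boolean sumExceedsOperands(int a, int b) {
        int sum = a + b;
        return sum > a && sum > b;
    }
}
```

The bounds check deliberately fails for Integer.MAX_VALUE plus 1: the sum wraps around to Integer.MIN_VALUE, which is smaller than either operand.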
So this is especially for those types of tests where all you can define is an upper and lower bound on the result. You also, of course, wanna make sure to have deterministic tests with specific input data to check the output of your function.
This will only give you broader coverage in your input data. So what does it look like? After adding the Carrotsearch Randomized Testing dependency to your build, what it gives you is an annotation, which we see here below the test annotation.
@Repeat gives you a hint that this test should be repeated multiple times. In this case, iterations equals 100 means it should be repeated 100 times, each time with a different input data set.
Why is it a different input data set? That's because over here we generate a random integer value. We can either give it just an upper bound or we give it a lower and upper bound. We also have a function to generate just any random integer without the bounds.
Same here, this is where we generate our data. We fill our list with a random short value. Here we don't have the bounds. We could have them if we wanted to. As explained before, we do the sorting in our sorting algorithm, and in the end,
we just check that the array is sorted. So to summarize: on the unit test level, we generate input with a fixed seed so we can repeat our tests if they fail. In order to increase the search space for bugs in our code, during continuous integration,
we can rerun tests to cover as many data sets as we want to and as we need. We can even specify to just run the iterations during continuous integration and, on the local box, just run it on one data set.
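The walkthrough above can be sketched in plain Java, without the framework: deriving all input from an explicit seed stands in for the runner's reproducible test seeds, and each call plays the role of one @Repeat iteration. (Class and method names are illustrative, not the Carrotsearch API.)

```java
import java.util.Arrays;
import java.util.Random;

public class RandomizedSortSketch {
    // Cheap property check: ascending order.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // One iteration of the randomized test. All "randomness" is derived
    // from the seed, so a failing seed reported by CI reproduces the
    // exact same input on a developer's machine.
    public static boolean oneIteration(long seed) {
        Random rnd = new Random(seed);
        int[] data = new int[rnd.nextInt(1000) + 1]; // random size
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextInt();                 // random values
        }
        Arrays.sort(data);      // the code under test
        return isSorted(data);  // assert the property, not exact values
    }
}
```

A runner like Carrotsearch drives this loop itself via @Repeat(iterations = 100), reports the seed when an iteration fails, and lets you pin that seed to re-run the exact failing configuration.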
Okay, so that's the unit test level. At Elasticsearch, we went one step further. So far, we only looked at unit tests. Can we cover even more? Of course, we can cover even more. If you think about coding in Java, your clients may run with different JVMs, for instance, or they may run with different JVM optimization parameters.
If you have a distributed system, you may even want to vary the number of nodes that your system runs on. You may want to vary the number of processes on each of these nodes. In Elasticsearch, for each integration test run,
we vary the number of data nodes that the test cluster consists of. And of course, we also vary JVM optimization parameters. We vary the JVMs that we test against. What that led to, both in the Lucene world
and in the Elasticsearch world, was uncovering various JVM bugs that lead to index data corruption, that lead to crashes, such that we now can give specific recommendations on which JVM versions, up to the minor version, we recommend users to run Elasticsearch on.
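The locale settings mentioned earlier are the classic small-scale version of this environment randomization: code that is green under an English locale can fail for users in Turkey, because upper-casing "i" there yields a dotted capital İ (U+0130) rather than "I". A minimal sketch:

```java
import java.util.Locale;

public class LocaleSketch {
    // A check that looks obviously true under an English or ROOT
    // locale: upper-casing "i" yields "I". Under the Turkish locale
    // it yields İ (U+0130) instead, so locale-blind string handling
    // breaks only for those users.
    public static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }
}
```

Running the same test under randomly chosen locales, as the framework supports, surfaces this class of bug long before a customer in the affected locale does.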
How does that look for the ELK stack? We have a bunch of machines, both on Amazon, running in the cloud, as well as bare metal boxes,
where we don't have a virtualization layer in between. On each commit there's a smoke test; there's a regular run of the Java unit tests; there's a regular run of tests that go against the REST API. We test not only on different boxes,
but also on different operating systems. If I remember correctly, there's Debian, there's Ubuntu, there's CentOS, there's Windows, probably by now even more. We run on different types of cloud instances, like smaller ones, larger ones.
So we do have decent coverage there. To visualize again: we test against Elasticsearch core at the unit test level, we test against the Java Client API, and we test against the RESTful API with REST tests.
Plus, we do have a backwards compatibility test, where we check out a random, but supposed-to-be backwards compatible, version of Elasticsearch, running both of these in parallel and checking that they are indeed compatible.
Now what impact does that have on development culture? Actually, the impact is pretty large, and we've struggled quite a bit with stabilizing things in the past, because once you do have randomized testing, if you have a deterministic test with deterministic input and deterministic environment,
what you can do is that your developer checks out the code, runs it through, and hopefully, well, except for Java versions being different and some such, if this test run is green, everything should be fine. However, here, if you happen to have like one piece of data set
that your developer didn't check on his local box, suddenly it will fail on continuous integration at some point in time. So you need to have some kind of responsibility there to check what went wrong on the continuous integration system, and to find the developer who can fix it,
have a culture where everyone looks there, and everyone considers it first priority to stabilize the build. Another impact is: sure, you do have a decent test suite right now, but I promise you, as soon as you integrate randomized testing in your build,
your tests suddenly will become very flaky, and that's not due to the nature of randomized testing, but due to the fact that there probably are many bugs in your system right now that you're not aware of. Just to give you two anecdotes: I tried integrating randomized testing in Mahout.
First trouble I ran into: Mahout was built on top of Hadoop, and the integration tests had been written such that they always wrote to the temp directory in a very specific location, so if you had two build runs in parallel, it would crash. Another thing for Mahout was that the developers
relied upon not closing threads in the tests. If you have many tests running in the same JVM, that's not very nice, so each randomized test crashed for the thread leakage, and that's not even introducing random data yet. At a previous employer of mine, I played around with randomized testing,
so I looked for a class that looked reasonably easy. It did a little bit of geo math, computing distances and such, so I introduced random data there. How hard can it be to find bugs in code that's been running for decades? Actually, pretty easy.
It took one night, and we had two or three cases where it failed, one being the dateline, others being bounding-box wraparounds that no one had thought about, because the client apps that were using that particular piece of the code never ran into this issue, but it was just essentially a ticking time bomb
until someone in the client developing teams would have hit the problem down there. Unfortunately, it took weeks to iron out all bugs because if you had fixed one of these issues in that class, it took a week, and then we found another one, another week,
and that's just because there were either inexperienced developers who didn't think of the boundary cases, or it was developers who were happy to have just their computation fixed and they know that it works, so they had a psychological boundary to look for further issues,
and trust me, I've got the psychological boundary as well. I like tests that are green, so at some point in the day, I will stop searching for things that break, so if you introduce randomized testing, what happens is you don't have this notion
of CI breaks where the last committer is the one who broke the build, because it could as well be that some randomly generated dataset broke the test, so you will need to have either someone look at it and triage it, or you will have the team do it all together.
Depending on the team size, this may take quite a bit of time. This particular issue can be helped if you have randomized testing in place for long enough because at some point, you will realize that it's easier to deal with these issues
if you have reproducible errors, like you shouldn't have tests that rely too much on timing. You shouldn't have tests that rely too much on having all of the system up and running. Right now at Elasticsearch, we work heavily towards moving away from having many integration tests
towards having just a lot of unit tests so that the amount of code that could be broken and that could have caused the issue is being reduced. Another thing you should establish is to teach developers how to write tests
that are easy to reproduce, and we've had many developers who found it very easy and very rewarding to do these integration tests, because if you go to the REST layer and you establish an index, it's very easy in our testing framework, and it's by design very easy to do that so that even customers can do it.
However, it also invites developers to just write integration tests. So you need to teach your developers really go down there, do the unit testing first and get that right. Also, it will uncover bugs in your environment.
In our case, it uncovered many edge cases in the Java environment. Each time Oracle releases a new JVM, the pre-release versions are being run against at least Lucene, but also against Elasticsearch, and it's quite often that we run into problems
that at the end of the day, may lead to index corruption issues and that in Elasticsearch world means data loss. So we try to uncover these very early. To summarize, randomized testing is not a silver bullet.
It can cause a lot of problems to you. So first of all, it can cause a lot of problems because if you introduce it to an established code base, it will uncover many bugs that first you will have to fix. On the other hand, it also cannot replace tests
with deterministic data. In our team, we've done some refactoring recently, and we found many cases where just having a test with deterministic data is a lot easier in order to check the result
and may give you deeper coverage even because the checks can be more restricted. However, the randomized nature of the test gives you a broader coverage. So my recommendation would be to write traditional unit tests and to add randomized testing in order to cover the corner cases.
You can add virtualization in order to cover installation setups. That's what we are about to do right now, like test the whole cycle from check-in up to building a release and the packaging, installing the packaging and then running the tests against this installed version
so that really become full cycle. One note, introducing randomized testing needs a lot of discipline for fixing tests. If you look, for instance, at the solar build or at multiple builds that I've seen at companies in-house, you really, really need
to go out there and fix these tests quickly because due to the fact that we run with random data, what you end up with is a continuous integration test system where the tests become like flickering. They go from green to red to green to red
and that's not something that you wanna have. That's something where developers very quickly stop looking at the continuous integration system at all. So you want it to have always green so that you can rely upon, I checked and I didn't break anything. Otherwise, if it's sometimes green and sometimes red,
people will start thinking, oh, maybe it wasn't me. And the other thing you wanna think about is at Elasticsearch right now, I've told you that we run in the cloud on multiple instances, on multiple operating systems, on multiple JVMs, possibly in parallel. So sometimes, you run into the following issue.
You do have a reproducible issue and it fails on every test run. So suddenly, you have that ripple effect that in the morning, you go to the office and you have 10 build failures, all of them caused by the same issue. You really wanna cut that out. Otherwise, people will start filtering these emails, because we really can't deal with 10 to 20 emails
each day just for one build issue. That's not very nice. So if you've looked at your watch, this talk is actually shorter than advertised on the schedule.
I do have a few t-shirts here, and I do have a little bit of what's called Sugru here, which I've been told geeks like very much for fixing things. I would like to ask you, A, if you have any questions, but I would also like to hand out the t-shirts for the best testing stories that I get.
Like if you know my background at FrostCon and at other conferences, what I usually do is take the microphone and hand it to everyone to tell them, to tell me their background. I thought that today, I'd make it a little bit different. Volunteers. Players.
Yeah. Oh, use the microphone, please. One question. I mean, as I see it, this is basically a helper for unit tests. Would you also recommend it for some kinds of integration tests? Because I think it's getting even more painful there, because sometimes the systems are very complex. How would you tackle that? I mean, they could easily break all the time.
So at Elasticsearch, we do it at the integration test level. In our case, it makes sense because customers run the system in different environments. If you deploy to the same environment with the same database in the back all the time, you don't want to randomize there. It doesn't make any sense, unless you want
to change the database at some point in the near future. In our case, it makes sense because we ship that library out there and people are free to run it on whatever cloud instance or real computers, against whatever JVM, with whatever locale settings they have.
So we actually run our integration tests with that. You're right, it does raise the likelihood of breaking. On the other hand, we also offer support for customers. So we'd rather have these breakages in the integration test system than in a call
where we need to answer within two days. Because with these, you don't have that much pressure to fix such an integration test, and you can spend more time analyzing the logs. So in the end, it boils down to two things: A, make your system easier to analyze
when something fails. And B, have logging that actually helps you fix the system. And in our case, that even helps in production because those same logs are then available on the boxes of customers and of downstream users to analyze. Okay, thanks. Those are questions, comments, or stories.
From a technical standpoint, we saw how you call, let's say, test data generators, and we saw the annotation which tells
the environment to repeat the test. But how do you actually build a randomized test? The example was incomplete; I wouldn't know how to do it myself. The example itself actually was complete, because it relies on the Carrotsearch
Randomized Testing framework. And that gives you all the annotations plus functions. And that one also, if I remember correctly, is timestamp based, and that's what generates the seed. And based on that seed, the data is then being generated.
Like, if you take the Java API, it already has a function for generating random integers. And that's essentially what all this is based upon. So it's not random integers in the mathematical, cryptographic sense of the word, but it's really random
as in we generate a bunch of integers. Okay, there's a question or stories. Yeah. I'd like a t-shirt. I developed software in New Zealand,
and I was working there for the Ministry of Social Development, so this is an anecdote. We were building a brand new web application using Google Web Toolkit. And at the same time, one of the managers had this great idea that we should also introduce basically end-to-end testing. And so we were using a brand new system, using WebDriver and Concordion, to do end-to-end UI testing.
And I was a little bit skeptical in the beginning, thinking how can this really be greater than all the integration testing we were already doing, and all the unit testing? And so we went ahead, and it was literally a giant pain in the ass. At first, we had all these random failures, and this went on for maybe four, five, six months.
So we were developing for about a year. And for six months, we were getting these random failures because Selenium was running too fast, and our test verifications wouldn't work, or there was like 100 different problems with it. And another professional automating testing team was put to our side to help us fix all the problems.
And in the end, we did. And after about six months, we basically earned the rewards because in the end of the project, lots of features had to be developed really quickly, and lots of bugs, and basically the end-to-end testing saved our backsides finding all these defects as times got a little bit more stressful in the end.
But my experience, and that's what I wanted to ask you in the end, is it was really only possible because management supported the whole idea. Without management accepting, the developers spend enormous amounts of time trying to debug defects, and also efforts
trying to keep the morale up, because developers work that way: well, it's failed without reason three times, I don't really wanna look at it anymore. It was an enormous investment. And without the backing of management, it wouldn't have been possible. Is that true in your experience as well? Yes, it's true. So in our case, we got the backing mostly
because, like I explained earlier, having bugs in our system actually raises the amount of money we spend on support. So doing this kind of testing upfront saves us a lot of money in the end. But you're right, developers spend an enormous amount of time fixing these issues, finding these issues,
finding the developer who can fix them, and you can't do that without management support. I did work for other companies before where it was a management decision: yes, we know that such and such configuration can generate a bug. Our system won't be down, but the queries won't work. It's our management decision that we won't fix it.
So in such an environment, establishing deep integration testing with randomized input probably doesn't make a whole lot of sense if you don't have the idea that you wanna fix it in the end.
It will only make your build more unstable. Any other stories?
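To make the "randomized input" idea concrete: frameworks like Carrotsearch Randomized Testing derive every random choice from a single seed that is reported on failure, so a red run can be replayed exactly. Here is a minimal, stdlib-only sketch of that principle, using plain java.util.Random as a stand-in for the framework's machinery; the class and method names are illustrative, not part of any real API:

```java
import java.util.Arrays;
import java.util.Random;

public class SeedDemo {

    // All "random" test input is derived from one seed, so any failing run
    // can be replayed exactly by re-running with the seed that was logged.
    static int[] randomInput(long seed, int length, int bound) {
        Random rnd = new Random(seed);
        int[] data = new int[length];
        for (int i = 0; i < length; i++) {
            data[i] = rnd.nextInt(bound);
        }
        return data;
    }

    public static void main(String[] args) {
        long seed = System.nanoTime();       // a fresh seed on every CI run...
        System.out.println("seed=" + seed);  // ...but always log it for replay
        int[] firstRun = randomInput(seed, 5, 100);
        int[] replay = randomInput(seed, 5, 100); // same seed, same input
        System.out.println(Arrays.equals(firstRun, replay)); // true
    }
}
```

In the real framework the seed is printed as part of the test failure and can be pinned via an annotation; the point here is only that logging the seed turns a randomized failure into a reproducible one.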
Hi, thanks. Just a recent case: in our integration tests, we found a strange issue
with creating some entities through Jenkins. It ran red all the time, though on our personal machines it was green. So it showed us that in Jenkins we used
a not-really-integrated environment, because we took HSQL, that is HyperSQL, to create the entities, and the driver was dealing with the creation of a sequence through the sequence annotation,
and it was stricter about the creation of the sequence. In our environment we had an Oracle driver, so the Oracle driver took the annotation
without parameters, and on our machines it created the entities somehow, but HSQL kept complaining about it all the time. So it helped us; it was just an issue
with another environment that showed us that we have to take a look at different drivers, to see that the software can fail because the drivers work differently. Just a recent issue, but we have had these cases commonly,
where the integration tests show us that you really have to look at different kinds of environments.
So, one more story. Yes, in our project we also have some testing pain in different areas. One part, which is the easier part, is that
I'm the only one developing on a Windows machine and everyone else is developing on macOS, so we had a lot of issues with this environmental stuff. The easier one was file separators. One we also had was Git, which was not configured to use proper line endings,
so we were calculating a hash on a file and it worked on a Linux machine, but the comparison failed on Windows because we got a different line separator, a different end of line, on the Windows machine. The more complicated tests are the asynchronous tests
on the actor system we use. They were working with timeouts of some seconds, and on continuous integration these tests started to fail when other tests were running, because the CPU was too loaded. So we also get very flaky tests, and this is quite annoying.
So I think you can't cover that with randomized testing; that's more a problem of test environments and of scheduling tests so that they run in the right order. Actually, we had similar problems as well:
like we had configured our tests to run with a range of Elasticsearch nodes, and some of the cloud instances were actually too small to host all these nodes, so they were running into all sorts of different timeouts.
Another issue we ran into was that some of our build engineers were trying out new environments and those, of course, were generating all sorts of errors because they were not yet done, but the errors went to the development mailing list, so every developer was like, oh my god, something's wrong. No, it's not. Someone's trying stuff out and they didn't tell us upfront.
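The Git line-ending story from a moment ago is easy both to reproduce and to guard against: hash file content after normalizing CRLF to LF, and the digest comes out the same on Windows and Linux checkouts. A small self-contained sketch (the class and method names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LineEndingHash {

    // Hash content after normalizing CRLF to LF, so the digest is identical
    // for Windows and Linux checkouts of the same file.
    static String sha256Hex(String content) {
        String normalized = content.replace("\r\n", "\n");
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        String unix = "first line\nsecond line\n";
        String windows = "first line\r\nsecond line\r\n";
        // The raw bytes differ, but the normalized hashes match:
        System.out.println(sha256Hex(unix).equals(sha256Hex(windows))); // true
    }
}
```

Configuring Git's line-ending handling (core.autocrlf or .gitattributes) fixes the checkout itself; normalizing before hashing additionally makes the comparison robust no matter how the file arrived.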
So I'm out of t-shirts, but I still have Zucrow here, which you can squeeze and squish and it turns into plastic. If you have funny stories, I'm a happy taker.
Otherwise, I've got a few questions. I've talked a lot about the randomized testing framework that we use. How many of you are Java developers? Those who are not: do you know of similar frameworks for your favorite programming language?
Anyone? No one? Really? How many of you are open source developers? So many non-open-source developers at FrOSCon. So I hope you get infected.
So one thing I've seen in the Hadoop community is that they take patches, but each patch that comes in through JIRA is automatically applied to the system, the tests are run against it, and you get an automated report. How many of you do that?
One, two, three. Can you tell me the project? It's the Hadoop project; they have it set up for the core parts of the code, and the tests are run automatically once someone proposes a merge.
Can you tell us more about the implications? Does it have a good impact on how many patches you get, or on how fast feedback goes out? Yeah. It's noticeable that the code quality improves, because you detect a lot of issues
when the commits are made. There was someone up there? Using ProCI with a couple of all sorts of storage. Okay. Blackboard that I'm working on. Implications or something you noticed when switching it on?
Yeah. Yeah, okay. Someone over there was also, thank you. Okay.
Okay. So I've heard about it only last week.
Anyone here heard about mutation testing? Yes, what's your experience with it? More on the fuzzing side. I use the Go programming language, and Dmitry has this fuzzer, go-fuzz, and it'll more or less be guaranteed
to find bugs in your program. Have you used it yourself? Yeah, yeah, it's super easy to use, but it'll totally crash your program, so. There was someone else over here. I don't remember the face. Yes.
Is it complicated to mutate? You have to build up a stack of inputs to mutate and to generate from, and the issue is to trace back what the problem was and what crashed my program. Okay, thank you. Let me check.
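The core of a mutation-based fuzzer of the go-fuzz flavor fits in a few lines: start from a known-good input, flip random bytes, and watch for crashes, with a seed making any crashing run replayable. A hedged sketch in Java; the parse method is a deliberately buggy toy target invented for this example, not code from any real project:

```java
import java.util.Random;

public class MiniFuzzer {

    // Toy system under test: crashes whenever the input starts with '{'
    // but is not the exact token "{}". Stands in for a real parser.
    static void parse(byte[] input) {
        if (input.length >= 2 && input[0] == '{' && input[1] != '}') {
            throw new IllegalStateException("unbalanced brace");
        }
    }

    // Mutate a known-good input one random byte at a time until the target
    // crashes; return the iteration that crashed, or -1 if none did.
    static int fuzz(long seed, int maxRuns) {
        Random rnd = new Random(seed); // the seed makes the run replayable
        byte[] corpus = "{}".getBytes();
        for (int run = 0; run < maxRuns; run++) {
            byte[] mutated = corpus.clone();
            // flip one random byte per run, like a byte-level mutator would
            mutated[rnd.nextInt(mutated.length)] = (byte) rnd.nextInt(256);
            try {
                parse(mutated);
            } catch (RuntimeException crash) {
                return run;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int firstCrash = fuzz(42L, 1000);
        System.out.println(firstCrash >= 0
                ? "crash found at run " + firstCrash + " (replay with seed 42)"
                : "no crash found");
    }
}
```

Real fuzzers like go-fuzz add coverage guidance and keep a growing corpus of interesting inputs, but the mutate-run-observe loop above is the essential shape, and the tracing-back problem from the question is exactly why the seed and the crashing input are saved.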
One last question. When we set up our build environment in AWS, we started doing it mostly manually just to get it up and running, and then we added one more instance for that operating system, one more instance for another operating system, one more instance for another JVM.
Right now, we are at the point where doing all this manually doesn't work anymore. How many of you went through the trouble of scaling your testing infrastructure in AWS and ended up using something like Puppet, Chef, whatever have you? Anyone? No one?
Yes, that one. Yes, we work at the Bank of New Zealand. There's a team called the DevOps guys, which surely you know here as well, and they sort of take care of that kind of thing. They use Puppet to stand up environments
for us as the development teams, and I'm not entirely sure how exactly it works. We make requests and say, hey, we need another test environment, and then, I don't know, a few hours later, it's there. So yeah, it seems to work really, really well if people know how to use Puppet, and I think it didn't take them very long, maybe a few days to get used to it, but it seems to be the way to go,
comparing it to the past, where you would wait for weeks and weeks and it still wouldn't work. A few hours really sounds like a dream. Okay, if you don't have any more questions, that's it from my side, and you can go and have some coffee. There's more Zucrow here, there's more stickers here, take some.